Large language models (LLMs) frequently generate hallucinations -- plausible but factually incorrect outputs -- undermining their reliability. While prior work has examined hallucinations from macroscopic perspectives such as training data and objectives, the underlying neuron-level mechanisms remain largely unexplored. In this paper, we conduct a systematic investigation into hallucination-associated neurons (H-Neurons) in LLMs from three perspectives: identification, behavioral impact, and origins. Regarding identification, we demonstrate that a remarkably sparse subset of neurons (less than $0.1\%$ of the total) can reliably predict hallucination occurrences, with strong generalization across diverse scenarios. In terms of behavioral impact, controlled interventions reveal that these neurons are causally linked to over-compliance behaviors. Concerning origins, we trace these neurons back to the pre-trained base models and find that they remain predictive for hallucination detection, indicating they emerge during pre-training. Our findings bridge macroscopic behavioral patterns with microscopic neural mechanisms, offering insights for developing more reliable LLMs.
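The abstract's identification step can be illustrated with a toy sketch: score every neuron's activation by its correlation with a per-example hallucination label, then keep only a tiny top fraction as the predictive subset. This is not the paper's actual method; all data below is synthetic, and the neuron count, the planted signal, and the 0.1% cutoff are illustrative assumptions.

```python
# Toy sketch (not the paper's procedure): find a sparse subset of
# "hallucination-predictive" neurons from synthetic activation data.
import numpy as np

rng = np.random.default_rng(0)

n_examples, n_neurons = 500, 3000  # toy scale; real LLMs have far more neurons
# Synthetic per-example activations; label y = 1 means "output hallucinated"
X = rng.normal(size=(n_examples, n_neurons))
y = rng.integers(0, 2, size=n_examples)

# Plant a weak signal in three arbitrary "H-Neurons" so there is
# something to recover (indices are made up for this demo)
signal_idx = np.array([7, 42, 1337])
X[:, signal_idx] += 1.5 * y[:, None]

# Score each neuron by |Pearson correlation| with the hallucination label
Xc = X - X.mean(axis=0)
yc = y - y.mean()
corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

# Keep only the top ~0.1% of neurons -- the sparse predictive subset
k = max(1, int(0.001 * n_neurons))
h_neurons = np.argsort(corr)[-k:]

# Trivial detector: mean activation over the selected neurons,
# thresholded at its own mean
scores = X[:, h_neurons].mean(axis=1)
preds = (scores > scores.mean()).astype(int)
accuracy = (preds == y).mean()
print(sorted(h_neurons.tolist()), accuracy)
```

Even this crude correlation filter recovers the planted neurons, which conveys the abstract's point that a very small, fixed subset of units can carry most of the predictive signal.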
In AI, a “hallucination” is just as much “there” as a non-“hallucination.” It’s a way for scientists to stomp their foot and say that the wrong output is the computer’s fault and not a natural consequence of how LLMs work.
I don’t think anyone is confusing radiation propagation with being alive though.
The issue is that these things “communicate” with us, so granting them even more leeway to seem like they’re thinking (they’re not) only further muddies how people perceive them.
no, they have to be the nodes responsible for the creation of hallucinations
And a “hallucination” is also an inaccurate humanization of the actual meaning: “statistical relationship that we AI folks don’t like.”
“Hallucinations” even include accurate data.
It is a trash marketing buzzword.
did you know that there is no sex going on in a Breeder Reactor?
https://en.wikipedia.org/wiki/Breeder_reactor
They’re analogies to help us communicate ideas.
Nuclear energy companies aren’t trying to make people think that their reactors reproduce.
AI companies are trying to make people think that their software is intelligent.
The context matters.
A breeder reactor is creating something, which is like the outcome of breeding. That name fits.
a hallucination is seeing something that’s not there, which also fits.
Hallucination requires perception. LLMs are just statistical models and do not have perception.
It was a cute name early on, now it is used to deflect when the output is just plain wrong.