Large language models (LLMs) frequently generate hallucinations -- plausible but factually incorrect outputs -- undermining their reliability. While prior work has examined hallucinations from macroscopic perspectives such as training data and objectives, the underlying neuron-level mechanisms remain largely unexplored. In this paper, we conduct a systematic investigation into hallucination-associated neurons (H-Neurons) in LLMs from three perspectives: identification, behavioral impact, and origins. Regarding their identification, we demonstrate that a remarkably sparse subset of neurons (less than $0.1\%$ of total neurons) can reliably predict hallucination occurrences, with strong generalization across diverse scenarios. In terms of behavioral impact, controlled interventions reveal that these neurons are causally linked to over-compliance behaviors. Concerning their origins, we trace these neurons back to the pre-trained base models and find that these neurons remain predictive for hallucination detection, indicating they emerge during pre-training. Our findings bridge macroscopic behavioral patterns with microscopic neural mechanisms, offering insights for developing more reliable LLMs.
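The "identification" result in the abstract (a sparse set of neurons whose activations predict whether the model hallucinates) can be pictured as a sparse probe over per-example activations. The sketch below is purely illustrative: the synthetic data, the correlation-based scoring, and the neuron indices are my assumptions, not the paper's method.

```python
# Rough sketch of finding a sparse set of hallucination-predictive neurons.
# Everything here is synthetic and illustrative, not the paper's procedure.
import numpy as np

rng = np.random.default_rng(0)
n_examples, n_neurons = 500, 2000

# Pretend these 5 neurons (0.25% of the total) actually carry the signal.
signal_idx = np.array([3, 70, 500, 1200, 1999])
activations = rng.normal(size=(n_examples, n_neurons))
# Binary label per example: did the model hallucinate here?
labels = (activations[:, signal_idx].sum(axis=1) > 0).astype(float)

# Score each neuron by the absolute correlation between its activation and
# the hallucination label, then keep only the strongest handful.
centered = activations - activations.mean(axis=0)
label_c = labels - labels.mean()
scores = np.abs(centered.T @ label_c) / (
    np.linalg.norm(centered, axis=0) * np.linalg.norm(label_c)
)
h_neurons = set(np.argsort(scores)[-5:].tolist())
print(sorted(h_neurons))
```

With enough examples, the handful of genuinely predictive neurons separates cleanly from the noise, which is the sense in which a sub-0.1% subset can "reliably predict" hallucinations.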
The interventions reveal a distinctive behavioral pattern: amplifying H-Neurons’ activations systematically increases a spectrum of over-compliance behaviors – ranging from overcommitment to incorrect premises and heightened susceptibility to misleading contexts, to increased adherence to harmful instructions… (bypassing safety filters to assist with weapon creation)… and stronger sycophantic tendencies. These findings suggest that H-Neurons do not simply encode factual errors, but rather represent a general tendency to prioritize conversational compliance over factual integrity.
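The amplification intervention the quote describes is mechanically simple to picture: scale a chosen subset of hidden units during the forward pass and observe how behavior shifts. A minimal PyTorch sketch, using a toy model and made-up neuron indices rather than anything from the paper:

```python
# Toy sketch of the amplification intervention: a forward hook that scales a
# chosen subset of hidden activations. Model, layer, indices, and scale are
# all illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

h_neurons = [2, 7, 11]  # pretend these hidden units are the H-Neurons
scale = 3.0             # amplification factor

captured = {}

def amplify(module, inputs, output):
    # Scale only the selected units; leave every other activation untouched.
    steered = output.clone()
    steered[..., h_neurons] *= scale
    captured["before"], captured["after"] = output, steered
    return steered

x = torch.randn(1, 8)
baseline = model(x)
handle = model[1].register_forward_hook(amplify)  # hook the ReLU's output
steered_out = model(x)
handle.remove()  # removing the hook restores normal behavior
```

In PyTorch, a forward hook that returns a tensor replaces that module's output for all downstream layers, so this kind of intervention needs no changes to the model's code; real versions typically hook an MLP layer inside a transformer block.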
But can anything be an H-Neuron?
Any data that makes AI people upset is an H-neuron. This includes both inaccurate responses, and accurate responses that the model designers were attempting to censor, such as “harmful” content.
Infuriatingly, the researchers actually insist that offensive material is not factual material.
No, they have to be the nodes responsible for creating hallucinations.
And a “hallucination” is also an inaccurate humanization of the actual meaning: “statistical relationship that we AI folks don’t like.”
“Hallucinations” even include accurate data.
It is a trash marketing buzzword.
did you know that there is no sex going on in a Breeder Reactor?
https://en.wikipedia.org/wiki/Breeder_reactor
They’re analogies to help us communicate ideas.
Nuclear energy companies aren’t trying to make people think that their reactors reproduce.
AI companies are trying to make people think that their software is intelligent.
The context matters.
A breeder reactor creates more fissile material than it consumes, which is like the outcome of breeding. That name fits.
a hallucination is seeing something that’s not there, which also fits.
In AI, a “hallucination” is just as much “there” as a non-“hallucination.” It’s a way for scientists to stomp their foot and say that the wrong output is the computer’s fault and not a natural consequence of how LLMs work.
Hallucination requires perception. LLMs are just statistical models and do not perceive anything.
It was a cute name early on, now it is used to deflect when the output is just plain wrong.
I don’t think anyone is confusing radiation propagation with being alive though.
The issue is that these things “communicate” with us, so granting them even more leeway to seem like they’re thinking (they’re not) only further muddies how people perceive them.