Researchers have found the cause of hallucinations in LLMs, H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs

Allah@piefed.world · 3 months ago

Researchers have found the cause of hallucinations in LLMs, H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs

XLE@piefed.social · 3 months ago

Any data that makes AI people upset is an H-neuron. This includes both inaccurate responses, and accurate responses that the model designers were attempting to censor, such as “harmful” content.

Infuriatingly, the researchers actually insist that offensive material is not factual material.

The interventions reveal a distinctive behavioral pattern: amplifying H-Neurons’ activations systematically increases a spectrum of over-compliance behaviors – ranging from overcommitment to incorrect premises and heightened susceptibility to misleading contexts, to increased adherence to harmful instructions… (bypassing safety filters to assist with weapon creation)… and stronger sycophantic tendencies. These findings suggest that H-Neurons do not simply encode factual errors, but rather represent a general tendency to prioritize conversational compliance over factual integrity.