How Much Do LLMs Hallucinate in Document Q&A Scenarios? A 172-Billion-Token Study Across Temperatures, Context Lengths, and Hardware Platforms [TLDR: 25%]

RandAlThor@lemmy.ca · 2 months ago


Womble@piefed.world · 2 months ago

I wouldn't read too much into the lower scores; they include some absolutely tiny models. The one 70% lower than the top score, at 24% correct, is a 1B model from 2024. Honestly, that it can do any information retrieval from a 32k context is impressive.