Why Language Models Hallucinate, according to OpenAI researchers

Large language models often ‘hallucinate’ – producing fluent but unfounded claims – because training and evaluation reward confident answers over honest uncertainty. For leaders, these hallucinations are a governance problem: they mislead decisions, amplify risk and conceal blind spots.
Imagine an LLM as a new intern who has read through the entire library and answers questions with the cadence and confidence of your senior analyst. The answers appear crisp and even carry an air of authority. The intern, like the model behind this analogy, has been conditioned to score at the top of the leaderboard: performance reviews, speed targets, exam-style answer keys that award points for knowing the answer. Yet those metrics never reward something just as important: knowing what you don’t know.
When the penalty for hesitation is visible and the reward for a plausible answer is high, the rational move is to guess, and to guess plausibly. That is how modern language models are trained: their objective functions and benchmarks reward fluent, correct-looking continuations at the expense of disciplined uncertainty. The result is hallucinations: confident prose that parades as factual certainty.
During training, these models learn the statistics of language – which words tend to follow or precede which other words – and are then judged by evaluation standards that reward them for getting answers right on benchmark tasks. But if the data is fuzzy, or the fact simply is not in the training set, the model faces genuine epistemic uncertainty. Yet the grading system almost always favors bold answers over hedged bets. The rational strategy under those incentives is to guess, and to guess confidently.
Imagine you’re answering a multiple-choice question with four options. In the absence of negative marking, even if you don’t know the answer, it still pays to guess: you’d have a 25% chance of getting it right. Something very similar happens with today’s language models.
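The point is an expected-value one, and it can be made concrete in a few lines. Below is a minimal sketch of that arithmetic, assuming four options, one mark per correct answer and an optional penalty for wrong answers; the 1/3 penalty is an illustrative choice, not a figure from the paper.

```python
def expected_score(p_correct: float, reward: float, penalty: float) -> float:
    """Expected marks from guessing: gain `reward` with probability p_correct,
    lose `penalty` otherwise."""
    return p_correct * reward - (1 - p_correct) * penalty

# Four options, no idea which is right: p_correct = 0.25.
print(expected_score(0.25, reward=1.0, penalty=0.0))  # 0.25 -> guessing beats abstaining (0)
print(expected_score(0.25, reward=1.0, penalty=1/3))  # ~0.0 -> negative marking removes the edge
```

A benchmark that scores only accuracy behaves like the exam with no negative marking: a model that always guesses outscores one that abstains, which is precisely the incentive problem described above.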
Why that matters
In finance, marketing and operations, we treat data as the canvas on which decisions are made. A confident-sounding falsehood that weaves its way into a report will not be read as a hypothesis; it will be acted upon. Imagine an automated research feed that returns a fictitious earnings figure formatted like every other metric your traders use: one phantom number can shift portfolios, trigger stop-losses or cause disclosure missteps.
Hallucinations can also cripple governance. Senior managers who only skim executive summaries, presuming the model’s authority and bypassing human escalation, are likely to face the consequences of that inaction sooner rather than later. This isn’t paranoia either: organizations often delegate trust to convenient artifacts, and a smart, confident-sounding model that can bluff expertly can become the silent enabler of some very bad bets.
The knee-jerk solution is almost inevitably to throw more data at the problem. But while training on more text reduces some errors, it does not dispel the statistical reality that some facts are rare, private or contextual. No matter how many terabytes you feed the model, it will not infer the contents of a regulatory filing, a contract or your proprietary customer records from text found on the public web.
Think of it like market liquidity: adding more participants smooths prices, but the fundamental valuation uncertainty of an opaque underlying asset cannot, by definition, be eliminated. Some degree of hallucination is simply the inevitable product of generalizing from sparse signals.
There is, in fact, a strong corporate analogue for why models bluff: incentives. If salespeople are paid only for closed deals, you inadvertently reward creative accounting in the absence of ethical guardrails. If a language model is evaluated only on correctness and never penalized for being confidently wrong, the learning objective is biased strongly in favor of boldness.
Rethinking fluency
Leaderboards and public metrics have fueled incredible advances in model fluency – and have also promoted a rhetorical confidence that merely resembles knowledge. Changing this means rethinking how we evaluate: reward abstention when it is appropriate, reward well-calibrated confidence, and penalize confidently false claims.
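To make that concrete, here is a minimal sketch of a scoring rule along those lines, assuming the grader knows whether each answer was correct, wrong or an abstention, and whether the model expressed high confidence; the specific weights are illustrative assumptions, not values from the paper.

```python
from typing import Optional

def score_response(correct: Optional[bool], high_confidence: bool,
                   abstain_credit: float = 0.0,
                   confident_error_penalty: float = 2.0) -> float:
    """Toy evaluation rule: full credit for correct answers, neutral credit for
    abstaining, and the heaviest penalty for confidently wrong claims.
    `correct is None` means the model said "I don't know"."""
    if correct is None:
        return abstain_credit
    if correct:
        return 1.0
    # Wrong answers: a confident miss costs more than a hedged one.
    return -confident_error_penalty if high_confidence else -1.0
```

Under a rule like this, a model that bluffs on every uncertain question no longer dominates one that admits uncertainty.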
Hallucinations ought to be treated as an organizational design problem, not a technical oddity.
- Demand epistemic humility: Make ‘I don’t know’ a first-class option, and productize it as one. If a model does assert facts, insist on provenance; and if it can offer only a weak source, prefer a hedged answer.
- Rework the scoreboard: When buying or testing models, include abstention rates and calibration testing (see the sketch after this list). Tell vendors you want fewer, truer hits rather than lots of confident misses – in effect, a quality-over-quantity procurement clause.
- Implement human control: Route low-confidence outputs to subject-matter experts. Instrument all outputs with audit trails so you can trace hallucinations back to their source and feed every mistake into a corrective loop.
- Use retrieval wisely: Augment models with systems that pull authoritative documents (contracts, databases, filings) before answering; forcing the model to cite shifts its decision calculus from “invent” to “report” (a minimal retrieval sketch also follows this list).
- Price governance: Accept that reliable AI is more expensive to operate. Budget for review layers and for the slower-but-safer path to deployment.
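As referenced under ‘Rework the scoreboard’, here is a minimal sketch of the kind of report a procurement test could produce, assuming each graded item records the model’s self-stated confidence (or an abstention) and whether the answer was correct; the ten-bin expected calibration error is one common formulation, not a vendor requirement.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvalItem:
    confidence: Optional[float]  # None means the model abstained
    correct: Optional[bool]      # None when there is no answer to grade

def scoreboard(items: list[EvalItem], n_bins: int = 10) -> dict:
    answered = [i for i in items if i.confidence is not None]
    report = {
        "abstention_rate": 1 - len(answered) / len(items),
        "accuracy_when_answering": (
            sum(i.correct for i in answered) / len(answered) if answered else 0.0
        ),
    }
    # Expected calibration error: gap between stated confidence and hit rate, per bin.
    bins = [[] for _ in range(n_bins)]
    for item in answered:
        bins[min(int(item.confidence * n_bins), n_bins - 1)].append(item)
    report["expected_calibration_error"] = sum(
        (len(b) / len(answered))
        * abs(sum(i.correct for i in b) / len(b) - sum(i.confidence for i in b) / len(b))
        for b in bins if b
    ) if answered else 0.0
    return report
```

A vendor that answers less often but with honest confidence will look better on this report than one that racks up confident misses, which is the behaviour the procurement clause is meant to buy.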
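And as flagged under ‘Use retrieval wisely’, below is a minimal sketch of the retrieve-then-cite-or-abstain pattern. The `search_documents` and `generate` callables are hypothetical placeholders for your own document store and whichever model client you use; neither name comes from the paper.

```python
from typing import Callable, Sequence

def answer_with_citations(question: str,
                          search_documents: Callable[[str, int], Sequence[str]],
                          generate: Callable[[str], str]) -> str:
    """Pull authoritative passages first; if nothing relevant exists, abstain
    rather than letting the model improvise an answer."""
    passages = search_documents(question, 5)  # e.g. contracts, filings, internal databases
    if not passages:
        return "I don't know - no supporting document was found."
    sources = "\n\n".join(f"[{i + 1}] {text}" for i, text in enumerate(passages))
    prompt = (
        "Answer using only the numbered sources below and cite them as [n]. "
        "If the sources do not contain the answer, say you don't know.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
    return generate(prompt)
```

Requiring citations changes the model’s task from recalling facts to reporting what the retrieved documents say, which is where the shift from “invent” to “report” comes from.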
Hallucinations are a market failure – a symptom of incentives plus information gaps. The product (model output) looks like signal but may well be noise. Short-term metrics (benchmark scores, product velocity) fail to capture social costs such as regulatory exposure and reputational erosion. The levers for fixing hallucinations are the same ones you would use for any system-wide issue: realigning incentives, adding governance guardrails and designing contracts that reward restraint as much as flair.
Smarter organizations will stop treating their models as oracles and start treating them as probabilistic advisors – useful but fallible, and requiring governance. That is the strategic stance that separates a clever tool from a reliable partner.
Read the paper ‘Why Language Models Hallucinate’ here.