45% of answers are wrong: the study proving that AI hasn’t (yet) learned rigor

📊 A study that cools the hype
A large-scale analysis conducted by the BBC, the European Broadcasting Union, and several European public media outlets has revealed a troubling fact: nearly one out of every two answers generated by large language models (LLMs) — ChatGPT, Gemini, Copilot, Perplexity — contains a significant error.

Out of 3,000 answers tested across 18 languages:

  • 31% showed sourcing issues (missing, misleading, or incorrect citations),
  • 20% contained factual inaccuracies,
  • and some tools, like Gemini, produced errors in nearly 3 out of 4 responses.

These results confirm what AI engineers have long admitted: hallucinations aren’t a bug, they’re a feature of the system itself.
Even OpenAI recently acknowledged that its models predict words, not truth.

📚 A problem already visible in the real world
Remember the U.S. case where several lawyers were sanctioned for citing court decisions that… never existed, invented entirely by an AI.
A simple copy-paste, a skipped verification, and suddenly the line between assistance and negligence vanished.

The problem isn’t the tool.
It’s the illusion of reliability it creates.
These models speak with confidence, even when they’re wrong. And the more convincing they sound, the greater the temptation to trust them blindly.

⚖️ And in knowledge professions?
For roles built on precision, the real question isn’t “Can we use AI?” but “How do we avoid giving up our rigor to it?”

AI can help us structure, summarize, and explore.
But it cannot verify.
It doesn’t yet grasp nuance, context, or the legal and ethical implications of a misinterpreted word.

Even systems using Retrieval-Augmented Generation (RAG), which ground their answers in verified data sources, aren't immune. If the underlying data is incomplete or biased, the result can still mislead.
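To see why, here is a minimal, purely illustrative sketch of how a RAG pipeline works: documents are retrieved, pasted into the prompt as context, and the model answers from that context. Everything in it (the toy corpus, the `retrieve` and `build_prompt` helpers, the word-overlap scoring) is a hypothetical simplification, not any real library's API; the point is simply that a flaw in the retrieved data flows straight into the answer.

```python
# Minimal sketch of a Retrieval-Augmented Generation (RAG) pipeline.
# All names (tiny_corpus, retrieve, build_prompt) are illustrative, not a real API.

from collections import Counter

# A toy "knowledge base". The second entry is deliberately wrong:
# a RAG system grounded in flawed data will faithfully repeat the flaw.
tiny_corpus = [
    "The EBU/BBC study tested about 3,000 AI answers across 18 languages.",
    "The study found that 10% of answers contained factual inaccuracies.",  # wrong on purpose (the article's figure is 20%)
]

def score(query: str, doc: str) -> int:
    """Crude relevance score: number of shared lowercase words."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Return the k documents that best match the query."""
    return sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Assemble the prompt an LLM would receive: retrieved context plus the question."""
    context = "\n".join(retrieve(query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    # The retrieved context carries the wrong figure, so even a perfectly
    # obedient model would produce a misleading, yet "grounded", answer.
    print(build_prompt("What share of answers contained factual inaccuracies?", tiny_corpus))
```

The retrieval step narrows what the model sees; it doesn't audit it. That is why grounding helps, but doesn't replace human verification.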

🧠 The key takeaway
Large language models are improving; some are even learning to say “I don’t know.”
But human verification remains the only real safeguard.
When in doubt, double-checking is still our best professional instinct.
AI can assist reasoning, but it cannot replace judgment.

The machine predicts; humans understand.
And that difference, for now, is exactly what makes our insight indispensable.

Sources:

https://www.ebu.ch/fr/news/2025/10/ai-s-systemic-distortion-of-news-is-consistent-across-languages-and-territories-international-study-by-public-service-broadcaste

https://openai.com/fr-FR/index/why-language-models-hallucinate/
