4.4.2024

What to Do When AI Hallucinates?

Learn more about the challenge of minimizing AI hallucinations. Industry experts are looking at innovative solutions to ensure more accurate and reliable AI-generated content. An expert article by Helmut van Rinsum.

Helmut van Rinsum

Guest Author & AI Expert

Artificial Intelligence

GPTs regularly provide incorrect answers, which can be particularly embarrassing for companies. How can the risk of hallucinations be minimized?

Yann LeCun, Chief AI Scientist at Meta, recently hit the nail on the head: "Large language models have no understanding of the reality they aim to describe through language," he told the tech magazine IEEE Spectrum. "These systems simply generate texts that are grammatically and semantically coherent."

The acclaimed AI scientist was describing AI hallucinations, something anyone who uses a generative AI such as ChatGPT has encountered: the system invents facts and inaccurate information and delivers wrong answers. This is not always immediately apparent, because the Large Language Model (LLM) presents them with its usual authority and factual tone.

Studies have documented how frequently this phenomenon occurs. In a survey by the platform provider Aporia among about 1,000 ML experts, 98 percent reported that their models show signs of hallucinations. In a study by tech service provider Tidio with nearly 1,000 internet users, 86 percent said they had received incorrect answers from ChatGPT. This shows how widespread the phenomenon is and how unreliable standard systems can be, a fact also acknowledged by the disclaimer ChatGPT displays below its input field: "ChatGPT can make mistakes. Consider checking important information."

Hallucinations occur because LLMs calculate probabilities

The reasons for hallucinating are relatively mundane. LLMs calculate probabilities for their responses token by token, word fragment by word fragment: the only question is what is most likely to come next. "This means that LLMs always generate an answer, although no fact-checking by experts takes place," explains AI expert Christoph Haas from Bitsero. "They produce texts without knowing whether the facts or the underlying logic are correct," emphasizes Julien Siebert, Senior AI Expert at Fraunhofer IESE. "For this reason, they are sometimes also referred to as 'stochastic parrots.'"
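To make this concrete, the short sketch below asks a small open model (GPT-2 via the Hugging Face transformers library, used here purely as an illustrative stand-in for much larger production LLMs) for its next-token probabilities. The prompt and model choice are assumptions for illustration; the point is that the model only ranks plausible continuations, with no step that checks whether they are true.

```python
# Minimal sketch: next-token probabilities in a causal language model.
# GPT-2 is used as an illustrative stand-in; production LLMs are far
# larger but follow the same principle.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of Australia is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence, vocabulary)

# Probability distribution over the vocabulary for the *next* token only.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id)!r}: {prob.item():.3f}")
# The model ranks plausible continuations; nothing in this step
# verifies which of them is actually correct.
```

Sampling then simply picks one of these continuations. Fluency is guaranteed by the statistics; factual accuracy is not.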

Another cause can be the data used to train the models. If it contains false information, the LLM tends to reproduce it. This is evident in biased answers, which keep appearing despite all efforts by GenAI providers to prevent them. Most of the training data comes from the Western world, explains Haas, which leads to "overfitting": the result is predominantly Western responses. Haas: "Information and perspectives from emerging and developing countries are underrepresented."

False or biased answers become particularly problematic when users are communicating not with ChatGPT from OpenAI or Gemini from Google but with a company. When the sender vouches for the accuracy of the answers, expectations are different. And because a coherent customer experience also includes correct product information and correct answers to inquiries, companies must consider how to prevent, or at least minimize, hallucinations in order to avoid disappointment.

How can AI hallucinations be eliminated? A leap through quantum computing

One strategy against hallucinations is to regularly review standardized answers and feed the results back into further training. Another is to connect the language model to a knowledge database or other tools so that facts can be checked and the findings incorporated into the answer. This includes techniques such as Retrieval Augmented Generation (RAG), where text generation is enriched with information from private or proprietary sources, and Chain of Thought prompting, where the LLM is asked to spell out the intermediate steps of its reasoning, as sketched below.
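The following is a minimal sketch of how retrieval and a chain-of-thought style instruction can be combined. The toy knowledge base, the keyword-overlap retriever, and the call_llm() stub are illustrative assumptions, not the implementation of any particular vendor.

```python
# Minimal Retrieval Augmented Generation (RAG) sketch with a Chain of Thought
# style prompt. The knowledge base, retriever, and call_llm() stub are
# illustrative assumptions only.

KNOWLEDGE_BASE = [
    "Model X-200 vacuum cleaner: 850 W, 2-year warranty, bagless.",
    "Model X-300 vacuum cleaner: 600 W, 3-year warranty, HEPA filter.",
    "Returns are accepted within 30 days with the original receipt.",
]

def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Ground the model in retrieved facts and ask it to show its reasoning."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using ONLY the facts below. If the facts are insufficient, "
        "say you do not know. Explain your reasoning step by step before "
        "giving the final answer.\n\n"
        f"Facts:\n{context_block}\n\nQuestion: {question}\nAnswer:"
    )

def call_llm(prompt: str) -> str:
    """Placeholder for an actual LLM API call (e.g. a chat-completion request)."""
    raise NotImplementedError("plug in your LLM client here")

if __name__ == "__main__":
    question = "How long is the warranty on the X-300?"
    context = retrieve(question, KNOWLEDGE_BASE)
    print(build_prompt(question, context))
```

Because the prompt restricts the model to the retrieved facts and explicitly allows it to say it does not know, a fluent but invented answer becomes far less likely, though not impossible.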

Frontnow Advisor, an AI-driven virtual consultant for e-commerce customers, uses LLMs to understand language and to provide grammatically and semantically correct answers. However, it relies solely on the data stored in the shop and provided by the customer. "This can reduce the error rate to nearly zero, and our customers' ethical and legal guidelines can be taken into account," says Marc Funk, CEO and co-founder of Frontnow.

But can language models be developed to a point where hallucinating is one day ruled out entirely? Given the complexity and dynamics of language and the constantly changing landscape of information, this is a huge challenge, stresses Christoph Haas. As in human thinking, where misinformation and misunderstandings also occur, a certain susceptibility to error is unavoidable in AI. AI expert Siebert can at least imagine that the neural networks behind LLMs might one day be able to perform fact checks much faster than anticipated.

Further technical progress could also bring improvements. Computing power plays a crucial role in optimizing models; after all, advances in hardware are what made developments like ChatGPT possible in the first place. A further breakthrough could come with the leap into the era of quantum computing. Haas: "This could achieve a computing capacity that comes closer to the human brain's performance and redefines the limits of AI research." That, in turn, could reduce AI hallucinations significantly further.
