January 20, 2024

Naga Vydyanathan

Why is my AI lying to me?

From Bard to ChatGPT, hallucination in AI happens when an AI model generates false information. Explore with us some of the reasons why that happens and how to work around these hallucinations.

Have you ever been tricked into believing a false snippet of information churned out by a generative AI tool in a contextually plausible manner? Starting with Google Bard’s confident but erroneous response to the discoveries by the James Webb Space Telescope in its very first demo, there have been several cases of generative AI tools misinterpreting, misrepresenting, misquoting and sometimes even fabricating data.

This author’s colleague recounts humorously, “I asked GPT to give me the summaries of each chapter for a book I was supposed to have read. It gave me summaries for 34 chapters. The book had only 12.” He lamented, “Eventually I had to read the book.”

The growing concern about these AI-induced hallucinations has made companies nervous about adopting large language models in production. A Telus International survey reports that 61% of respondents voiced concerns about AI hallucinations and the increasing spread of inaccurate information.

While hallucinations in generative AI are indeed a reality and can be a safety concern in some applications, the panic surrounding them is primarily due to a lack of clarity. What are hallucinations? How, when and why do they occur? What is their impact? How can you detect, measure, prevent or mitigate them? These are just some of the many questions that are still not adequately answered. Let us try to answer some of them!

What are AI-induced Hallucinations? How do they impact us?

Hallucination in the general context refers to unreal perceptions that feel real. In the world of AI, hallucinations occur when the generated content, be it audio, textual, video or imagery, is factually incorrect, unrelated to the given context, nonsensical (non-existent in the real world) or unfaithful to the training data or provided source content. For example, when ChatGPT was asked whether the number 9677 was prime, it answered: “No, 9677 is not a prime number. It is divisible by 67 and 139 in addition to 1 and itself.” Anyone with a calculator can verify that this is blatantly wrong: 9677 is, in fact, prime, and 67 × 139 is 9313.
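A few lines of code are enough to expose the error (a minimal sketch using plain trial division; note that the model's claimed factors do not even multiply to 9677):

```python
# Trial division: check whether n has any divisor between 2 and sqrt(n).
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

print(is_prime(9677))  # True: 9677 is actually prime
print(67 * 139)        # 9313, not 9677, so the claimed factorization fails
```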

An example of a hallucination in ChatGPT

AI art generators have been shown to create seemingly realistic ‘party’ images that, on a closer look, reveal extra teeth, extra fingers and, in some cases, detached or deformed limbs.

While some AI hallucinations are easy to detect, many others are not apparent unless pointed out. Let us look at the two broad categories of AI hallucinations as summarized by a 2022 survey on Hallucinations in Natural Language Generation.

Intrinsic Hallucinations

These refer to hallucinations that arise when the generated content, though based on the training data, ironically contradicts or is unfaithful to it. For example, if an AI chatbot says that Neil Armstrong landed on Mars when the source/training content reports that Neil Armstrong is the first person to land on the moon and that no human has yet landed on Mars, this is an example of an intrinsic hallucination. In this case, information in the training data is misrepresented.

Extrinsic Hallucinations

Hallucinations that occur when the generated content fabricates additional information outside the scope of the source or training data are termed extrinsic hallucinations. Following up on our conversation on space, if the chatbot adds that India is planning a manned mission to Mars in 2024 when no such information is present in the training source, it is an example of an extrinsic hallucination.

Intrinsic hallucinations directly contradict the training data and hence are more easily verifiable than extrinsic hallucinations, which are very hard to detect. For instance, a mix-up of names of famous personalities (e.g. “Roger Nadal”), an intrinsic hallucination in a generative task like dialogue generation, is easier to detect than Google Bard’s infamous response about the James Webb Space Telescope being the first to take a picture of an exoplanet, an example of a fabricated, extrinsic hallucination. The latter sounds quite plausible and was widely believed, until it was contradicted by domain experts.

Bots may also give different and sometimes even inconsistent answers to the same question worded differently, as seen in the conversation with ChatGPT on prime numbers.

Inconsistency in responses if the prompt is worded differently

Implications of AI Hallucinations

The implications of AI hallucinations can vary depending on the context and severity of the hallucinations. Some potential and serious implications are misinformed decision making, loss of trust and reliability, legal liabilities, social and ethical concerns due to misrepresentation and propagation of biased, inappropriate content, and security risks.

On the brighter side, in some cases, AI hallucinations might be embraced for their artistic or creative potential, leading to new forms of digital art and expression. This author’s colleague, for example, was forced to read an entire book by a hallucinating GPT.

So, what triggers AI Hallucinations?

To understand what causes AI-induced hallucinations, we need to look at large language models (LLMs), the fundamental building blocks of any generative AI system. In simple terms, an LLM is a deep learning model that trains on massive datasets to infer and learn patterns that enable it to understand, summarize, predict and generate new content. Though LLMs primarily work with textual data, multi-modal LLMs can be applied to images, audio and video inputs.

LLMs use statistics and machine learning to generate responses to the given prompt, and this is a primary reason why they have the tendency to hallucinate. Imagine a language model that has been trained on text data, which includes information about capital cities. During its training, the model encounters the word “Berlin” very frequently as the capital of Germany. “Paris” is also encountered, but less often, as the capital of France.

When asked, “What is the capital of France?”, the model’s response is influenced by the statistical patterns it has absorbed. It recognizes “Berlin” as a highly prevalent term associated with capital cities due to its frequency in the training data. In contrast, “Paris” is acknowledged as the capital of France, but with a lesser statistical weight. As a result, the model may hallucinate and generate “The capital of France is Berlin.”, overlooking the more contextually relevant and accurate information that “Paris” is the capital of France. Thus, the reliance on statistical patterns in LLMs can lead to incorrect responses when statistical biases dominate the output. The choice of training data therefore plays a pivotal role in shaping the hallucination tendencies of LLMs. Training LLMs on outdated, low-quality, erroneous, biased (possibly due to duplicates), inconsistent or incorrectly labelled data (source-reference divergence in heuristically collected data) can greatly increase the chances of intrinsic hallucinations.
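As a toy illustration (the frequency counts below are invented, not real model internals), imagine a “model” that answers capital-city questions purely by how often it saw each city token, ignoring the country in the question:

```python
# Hypothetical corpus frequencies: "Berlin" appears far more often than "Paris".
capital_counts = {"Berlin": 900, "Paris": 120}

def frequency_only_answer(question: str) -> str:
    # Unlike a real LLM, this toy ignores the question's context entirely,
    # mimicking the failure mode where statistical bias overrides relevance.
    return max(capital_counts, key=capital_counts.get)

print(frequency_only_answer("What is the capital of France?"))  # "Berlin"
```

A real model conditions on the prompt, of course, but when the learned association is strong enough it can dominate the context in just this way.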

Apart from the quality of the data used to train these models, the training and inference techniques used, and inconsistent, unclear and contradictory prompts are two other main factors resulting in AI hallucinations. Encoding and decoding are two important steps in large language modeling. The encoder comprehends and encodes human-readable input text into meaningful machine-readable vector representations. When encoders learn wrong correlations, the encoded text can diverge significantly from the original, leading to hallucinations.

Decoders, on the other hand, can attend to the wrong part of the encoded input source, leading to erroneous generation with facts mixed up between two similar entities. For example, a mix-up of names such as “Roger Nadal” can be the result of erroneous decoding. Further, decoding strategies that aim to improve generation diversity and creativity, such as top-k sampling, have been shown to have a higher tendency to hallucinate.
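A minimal sketch of top-k sampling over a toy next-token distribution (the probabilities are invented for illustration) shows the trade-off: widening k adds diversity but admits low-probability tokens that can derail the generation:

```python
import random

def top_k_sample(probs: dict, k: int, rng: random.Random) -> str:
    # Keep only the k most probable tokens, then sample among them
    # in proportion to their probabilities.
    top = sorted(probs, key=probs.get, reverse=True)[:k]
    return rng.choices(top, weights=[probs[t] for t in top], k=1)[0]

probs = {"Paris": 0.55, "Berlin": 0.25, "Lyon": 0.15, "Mars": 0.05}
rng = random.Random(0)
print(top_k_sample(probs, k=1, rng=rng))  # greedy: always "Paris"
print(top_k_sample(probs, k=4, rng=rng))  # may occasionally pick "Mars"
```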

A third factor that makes LLMs hallucinate is exposure bias, the discrepancy in decoding between training and inference. During training, the language model predicts the next word conditioned on history words sampled from the ground-truth data distribution; during generation, it conditions on history sequences generated by the model itself. Due to excessive exposure to ground-truth data during training, the model is biased to perform well only on the ground-truth history distribution, accumulating errors, especially over long sequences.

LLMs train rigorously on a vast corpus of data, gaining parametric knowledge (knowledge memorized in their parameters). When an LLM favours generating output based on parametric knowledge over the source input, extrinsic hallucinations result.

The visual below summarizes the main factors triggering hallucinations in LLMs.

Factors that trigger AI Hallucinations in LLMs

Do all LLMs hallucinate to the same extent?

OpenAI’s GPT-4 (used in ChatGPT), Google’s PaLM 2 (used in Bard), Meta’s LLaMa, Cohere’s Command and Anthropic’s Claude-2 are some notable large language models – but do they hallucinate to the same extent? Are there certain use-cases where the possibility of hallucinations is much higher than others? Finding answers to these questions will help one develop a robust strategy to prevent and mitigate hallucinations rather than adopt a panic-induced-let-us-boycott-generative-AI reaction.

A recent study (August 2023) by researchers from Arthur AI compares the hallucination tendencies of some popular LLMs – GPT-3.5, GPT-4, Claude-2, LLaMa-2 and Command from Cohere. These LLMs were asked questions spanning three categories – combinatorial mathematics, U.S. presidents and Moroccan political leaders – and three answers to each question were recorded (to evaluate whether the LLMs were consistent in their responses). The questions were framed in a way that required the models to go through multiple steps of reasoning to arrive at a response. The responses were categorized into correct answers, hallucinations and hedges (where the LLM avoided answering the question).

Results of the “Hallucination Experiment” by Arthur AI

GPT-4 gives the most correct answers and the fewest hallucinations in the category of combinatorial mathematics and performs best overall. Cohere’s Command hallucinates the most in general, followed by GPT-3.5 and LLaMa-2, and has the strongest tendency to provide wrong answers with conviction: it does not hedge on any query.

The above experiment shows that hallucination tendencies depend on the large language model (its parameters, training methodology and training data) as well as on the domain of query or use-case. Thus, enterprises can choose the right LLM for their requirement – GPT-4 can be a good choice when mathematical reasoning and analytical abilities are needed, while Cohere can be a good choice for creativity. A September study by the Prompt Engineering Institute shows that LLaMa-2 matches GPT-4’s factual accuracy on summarization tasks and can thus be a good candidate for a cost-effective deployment for high-integrity NLP.

Given that AI hallucinations are here to stay, can we detect and mitigate them?

The good news is that, just as we have algorithms and techniques to prevent, detect and resolve deadlocks in the realm of parallel computing, AI-induced hallucinations can also be addressed reasonably effectively. The fact that GPT-4 is reportedly 40% more likely than its predecessor to produce factual responses is proof that progress is possible. We now frequently see hedged responses like the one below instead of hallucinated, fictitious ones.

A hedging answer by ChatGPT as opposed to a hallucinated response

As discussed, LLMs hallucinate due to either issues with the training data, training methodology or the input prompt. Mitigation strategies should therefore be aligned with these.

Training data – build a ‘faithful’ dataset

In the context of artificial intelligence, a faithful dataset is one that provides a truthful and reliable reflection of the underlying data distribution or the domain it represents. The quality and faithfulness of training data can be enhanced by employing effective annotators, filtering out semantic noise and hallucinated source-reference pairs, and augmenting the inputs with external information such as entity information, extracted relation triplets, retrieved external knowledge and similar training samples.

Modeling and inference methodology – architecture, training and inferencing enhancements

Dual encoders for better semantic interpretation of the input, enhanced attention mechanisms with source-conditioned bias, guided temperature sampling in decoders to reduce randomness, reinforcement learning, multi-task learning and controlled generation techniques are some methods to mitigate hallucinations during training and inference. Guardrails can be used to mark the boundaries within which the model operates. For example, topical guardrails prevent the model from veering off into unrelated and undesirable domains, safety guardrails ensure references to only credible sources, and security guardrails allow connections only to “safe” third-party apps, restricting the model to known and safe boundaries.
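Temperature sampling is the easiest of these to sketch. In this minimal illustration (the logits are invented), lowering the softmax temperature sharpens the distribution toward the model’s most confident token, reducing the randomness that can surface hallucinated continuations:

```python
import math
import random

def sample_with_temperature(logits: dict, temperature: float,
                            rng: random.Random) -> str:
    # Softmax with temperature: dividing logits by a small temperature
    # concentrates probability mass on the top token.
    scaled = [l / temperature for l in logits.values()]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(list(logits), weights=weights, k=1)[0]

logits = {"Paris": 4.0, "Berlin": 2.0, "Mars": 0.5}
rng = random.Random(0)
print(sample_with_temperature(logits, temperature=0.1, rng=rng))  # "Paris"
```

At temperature 0.1 the top token dominates overwhelmingly; at temperature 2.0 the same function would sample the low-confidence tokens far more often.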

Prompt – provide consistent, clear and contextual prompts

Not all of the work is to be done by the AI system. In fact, it is important to remember that the AI system doesn’t know that it is hallucinating. Thus, the onus to recognize and mitigate hallucinations also lies with the user.

Few-shot prompts, where the user specifies several examples within the prompt, provide good context for the model to generate appropriate and accurate responses. Context injection, where the user supplies supplemental information such as references to additional text or code, helps LLMs latch on to the right context, preventing hallucinations.
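A hypothetical few-shot prompt builder (the example Q&A pairs below are ours, not from any real system) shows the idea: the in-prompt examples give the model a concrete pattern and factual anchor to imitate:

```python
# Hypothetical few-shot examples establishing the expected format.
EXAMPLES = [
    ("What is the capital of Germany?", "Berlin"),
    ("What is the capital of Japan?", "Tokyo"),
]

def build_few_shot_prompt(question: str) -> str:
    # Prepend worked examples, then pose the real question in the same shape.
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in EXAMPLES)
    return f"{shots}\nQ: {question}\nA:"

print(build_few_shot_prompt("What is the capital of France?"))
```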

In enterprises, grounding and prompt augmentation techniques help in mitigating hallucinations. Grounding involves providing LLMs access to use-case-specific information which may extend beyond their inherent training data. By incorporating explicitly cited data, a grounded model (e.g. a retrieval-augmented generation, or RAG, model) can produce outputs that are both more precise and contextually relevant.
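A minimal RAG-style sketch (with a toy two-document “knowledge base” and naive keyword-overlap retrieval standing in for real embedding search) shows the grounding pattern: retrieve a relevant source, then instruct the model to answer from it:

```python
import re

# Toy knowledge base; a real deployment would index enterprise documents.
DOCUMENTS = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
]

def tokens(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str) -> str:
    # Naive retrieval: pick the document with the largest word overlap.
    return max(DOCUMENTS, key=lambda d: len(tokens(query) & tokens(d)))

def grounded_prompt(query: str) -> str:
    # Injecting the retrieved text steers the model toward cited source
    # material instead of its parametric memory.
    return f"Answer using only this source:\n{retrieve(query)}\n\nQuestion: {query}"

print(retrieve("What is the capital of France?"))  # the Paris document
```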

In conclusion, are AI Hallucinations really that bad?

Deadlocks have been a longstanding challenge in the world of parallel computing, but has that deterred us from accelerating our programs using multicores, GPUs and giant clusters? And unlike deadlocks, hallucinations, within limits, can even be desirable in certain use-cases in the realm of art and creativity.

Much like the high-performance computing community has devised strategies to tackle deadlocks, it’s only a matter of time before the AI community masters the art of managing hallucinations. AI hallucination is a fast-growing area of research, with new techniques for preventing, identifying and mitigating hallucinations being proposed even as you read this article.

Further, not all LLMs hallucinate the same way and not all use-cases are susceptible to high degrees of hallucinations. By deeply understanding your business scenario and requirements, selecting the right LLMs trained on faithful datasets, establishing safety boundaries through guardrails, and employing context-rich and clear prompts, you can harness the benefits of AI while effectively managing any unwelcome hallucinatory outputs.

Naga Vydyanathan