Artificial intelligence (AI) has been a game-changer across many fields, but recent research shows that AI chatbots ‘hallucinate’, or provide incorrect answers, at least 3% of the time. The problem is not limited to a single model; it appears across a range of popular language models.
A startup called Vectara, founded by former Google employees, set out to investigate the issue. It ran experiments that asked popular language models to summarize text documents, and the results were nothing short of startling.
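The general approach can be illustrated with a rough sketch: give a model a document, ask for a summary, then score the summary against the source with a natural-language-inference model, treating unsupported claims as hallucinations. The model name, label handling, and example text below are illustrative assumptions, not Vectara’s actual evaluation pipeline.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Assumption: a generic off-the-shelf NLI cross-encoder stands in for a
# purpose-built hallucination detector; this is not Vectara's evaluation model.
MODEL_NAME = "cross-encoder/nli-deberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

def entailment_score(source: str, summary: str) -> float:
    """Probability that the summary is supported (entailed) by the source.
    A low score flags content the source does not back up."""
    inputs = tokenizer(source, summary, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    # Look up the entailment label instead of assuming a fixed label order.
    entail_idx = next(i for i, label in model.config.id2label.items()
                      if label.lower().startswith("entail"))
    return probs[entail_idx].item()

source = "The company reported third-quarter revenue of $2.1 billion."
summary = "Revenue hit $3 billion last quarter."
print(f"support score: {entailment_score(source, summary):.2f}")  # low -> likely hallucinated
```

Aggregating scores like this over many documents gives a per-model error rate, which is the shape of comparison behind the figures reported below.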
OpenAI’s ChatGPT hallucinated the least, giving incorrect answers between 3% and 3.5% of the time. Other systems, including Meta’s Llama, Cohere’s models, Anthropic’s Claude 2, Mistral 7B, and Google’s PaLM, showed error rates ranging from roughly 5% to as high as 27.2%.
These hallucinations are not confined to a specific task; their frequency varies with the nature of the request. When a model is asked to answer without being given a source to work from, the likelihood of hallucination rises. In Google’s case, errors were more common because its answers tend to be longer and to add context beyond the source material.
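As a hedged illustration of that difference, the snippet below contrasts an open-ended request with one that pins the model to a supplied source document; the template wording and helper function are assumptions made for illustration, not a prescription from the study.

```python
# Illustrative prompt templates only; the exact wording is an assumption, and any
# reduction in hallucination will vary by model.

UNGROUNDED_PROMPT = "Summarize the latest findings about {topic}."  # model answers from memory

GROUNDED_PROMPT = (
    "Using ONLY the text between the <source> markers, write a three-sentence "
    "summary. If the text does not support a claim, leave it out.\n"
    "<source>\n{source_text}\n</source>"
)

def build_prompt(topic: str, source_text: str | None = None) -> str:
    """Prefer the grounded template whenever a source document is available."""
    if source_text is not None:
        return GROUNDED_PROMPT.format(source_text=source_text)
    return UNGROUNDED_PROMPT.format(topic=topic)

print(build_prompt("the James Webb telescope"))                      # ungrounded request
print(build_prompt("q3 results", source_text="Revenue was $2.1B."))  # grounded request
```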
Chatbots like Google’s and Microsoft’s Bing have not only been found to spew nonsensical or incorrect information; they can also fabricate details outright. Google’s chatbot, for instance, gave inaccurate information about the James Webb Space Telescope, while Bing’s offered bogus details on a range of topics.
The implications can be severe. In March, a lawyer used ChatGPT to research similar cases for a legal brief; the model invented several court cases, which ended up in documents filed with the court, potentially exposing the lawyer to legal consequences. Even on straightforward tasks like summarizing news articles, these chatbots persistently invent information.
While companies strive to minimize these flaws, some experts question whether they can be eliminated at all, suggesting they may be inherent to the technology itself. They advise users of AI language models to remain aware of the risk of hallucinations and to treat these models as “co-pilots” whose output must be checked, rather than as replacements for human judgment.
The study’s findings also raise broader questions about the future of AI. As the technology continues to advance, it becomes increasingly important to weigh the ethical implications of its use and to ensure that AI models are deployed responsibly.