To err is human, so GenAI errors may simply be a sign of an imperfect, almost human-like, technology. Still, whether generated by humans or AI, errors are always worth avoiding.
GenAI errors aren’t just possible, they’re common, warns Matt Aslett, director of research, analytics, and data with technology research and advisory firm ISG. “Anyone using GenAI, either personally or professionally, should be aware that GenAI models are designed to produce a realistic replication of the content on which they have been trained, rather than a factual representation,” he observes in an email interview.
Large language models (LLMs), for example, are trained to generate written content that’s grammatically valid, based on the statistical predictability of the next word in a sentence, Aslett explains. “LLMs have no semantic understanding of the words generated,” he notes. “As such, there’s no guarantee that the content generated will be factually accurate.”
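To see why fluency and accuracy can diverge, consider a deliberately tiny sketch of next-word prediction. The toy bigram model below (plain Python, nothing like a production LLM) picks whatever word most often followed the current one in its training text; grammatical plausibility, not truth, drives every choice.

```python
from collections import Counter, defaultdict

# Toy "language model": counts which word follows which in a tiny corpus.
corpus = "the capital of france is paris . the capital of spain is madrid .".split()

next_words = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_words[current][nxt] += 1

def predict_next(word: str) -> str:
    # Return the statistically most frequent continuation seen in training.
    # Nothing here checks whether the resulting sentence is factually true.
    return next_words[word].most_common(1)[0][0]

print(predict_next("capital"))  # "of": fluent, but the model knows nothing
print(predict_next("is"))       # "paris", even if the subject was Spain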
GenAI and large language models have an uncanny ability to sound very accurate, confident, and knowledgeable, says Mike Miller, a senior principal product leader at Amazon Web Services. “They can sound eloquent and converse in language that feels authentic,” he observes in an online interview. “Catching errors from GenAI can be difficult, because if you ask GenAI how it came up with an answer, it might give you a reasonable-sounding explanation that could still be made up or false.”
Embrace Verification
GenAI models should never be used in isolation, Aslett advises. “Users should always verify the factual accuracy of both the content generated by GenAI and its cited sources, which could also be a fabrication.”
Individuals must ultimately rely on their own knowledge to assess the accuracy of content produced by GenAI and identify errors, Aslett says. Enterprises, meanwhile, can apply validation models to assess a GenAI model’s output and then compare the content against approved data and information sources to identify likely errors.
GenAI mistakes can be addressed in several ways, says Satish Shenoy, global vice president, technology alliances and GenAI at business process automation firm SS&C Blue Prism. “These techniques range from logging and auditing to predictive debugging to using LLMs as a judge, or even placing a human in the loop,” he states in an email interview. “Governance and guardrail frameworks are also being used in conjunction with the LLMs to catch generative AI errors.”
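As a rough illustration of the LLM-as-a-judge and human-in-the-loop patterns Shenoy describes, the Python sketch below asks a second model pass to label a draft answer SUPPORTED or UNSUPPORTED against a source document, logs the verdict for auditing, and routes rejected drafts to a human reviewer. The `chat` and `human_review` callables are hypothetical stand-ins for whatever model API and review queue an organization actually uses.

```python
import logging

logging.basicConfig(level=logging.INFO)

JUDGE_PROMPT = """You are a strict fact-checker. Given a SOURCE document and an
ANSWER, reply with exactly one word: SUPPORTED or UNSUPPORTED.

SOURCE: {source}
ANSWER: {answer}"""

def judged_answer(question, source, chat, human_review):
    """Generate an answer, have a judge pass evaluate it against the source,
    and fall back to a human reviewer when the judge rejects it."""
    draft = chat(f"Using only this source, answer the question.\n"
                 f"Source: {source}\nQuestion: {question}")
    verdict = chat(JUDGE_PROMPT.format(source=source, answer=draft)).strip().upper()
    logging.info("judge verdict=%s draft=%r", verdict, draft)  # audit trail
    if verdict.startswith("SUPPORTED"):
        return draft
    return human_review(draft)  # human-in-the-loop fallback
```

Note that the judge is itself an LLM and can also err, which is why Shenoy pairs such techniques with governance frameworks rather than relying on any one check.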
Danger Ahead
Given GenAI’s inherent lack of accuracy, decisions should never be based solely on its output, Aslett says. “There’s a risk that could result in an organization making costly business decisions based on erroneous information.” Additionally, enterprises disseminating insights generated by GenAI run the risk of regulatory fines and reputational damage if the information proves to be inaccurate.
There are many examples of GenAI errors, Aslett observes, such as Air Canada’s chatbot providing a customer with inaccurate information. He also notes that lawyers have been fined for submitting court filings containing inaccurate information, including citations to legal cases that never existed.
Improving Accuracy
The best approach to improving GenAI accuracy is to adopt a variety of processes, Aslett advises. “This could include training a model on its own data and information, although that’s potentially costly in terms of training and maintaining the model,” he says. Another approach is prompt engineering, in which a user instructs the model to use only specific data or information when generating its response. “This is a short-term solution that only applies to the individual prompt, as the additional information is not retained by the model,” he cautions.
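In practice, prompt-level grounding can be as simple as wrapping every request in a template like the hedged sketch below, where `chat` again stands in for a hypothetical model API. Because the context is supplied per request and discarded afterward, the fix lasts only as long as the prompt, exactly as Aslett cautions.

```python
GROUNDED_PROMPT = """Answer the question using ONLY the context below.
If the context does not contain the answer, reply "I don't know."

Context:
{context}

Question: {question}"""

def grounded_answer(question: str, context: str, chat) -> str:
    # The grounding travels with each individual request; the model retains
    # nothing between calls, which is why this is a short-term measure.
    return chat(GROUNDED_PROMPT.format(context=context, question=question))
```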
Miller advises using automated reasoning, a scientific discipline that leverages mathematics and logic to prove theorems or facts. “We use automated reasoning to generate policies or procedures and guidelines,” he says. “Automated reasoning provides higher confidence in correctness than traditional testing methods, although it still depends on underlying assumptions about component behaviors and environmental models.”
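AWS has not published the code behind its tooling, but the general idea of automated reasoning can be sketched with the open-source Z3 solver (installable via `pip install z3-solver`): state a hypothetical policy as logic, then ask the solver to prove that no input can violate it, rather than merely testing a handful of cases.

```python
from z3 import And, Implies, Not, Real, Solver, unsat

purchase = Real("purchase")
refund = Real("refund")

# Hypothetical guardrail policy: refunds are 10% of the purchase price.
policy = refund == purchase * 0.1

# Property to prove: for any non-negative purchase, the refund under this
# policy can never exceed the purchase itself.
claim = Implies(And(purchase >= 0, policy), refund <= purchase)

solver = Solver()
solver.add(Not(claim))  # ask the solver to hunt for a counterexample

if solver.check() == unsat:
    print("Proved: no input can violate the refund cap.")
else:
    print("Counterexample found:", solver.model())
```

Unlike a unit test, an `unsat` result here covers every possible purchase amount at once, which is where the higher confidence relative to traditional testing comes from, subject to the assumptions encoded in the policy and the model of the environment.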
Once a GenAI error has been detected, begin tracing the problem, Shenoy suggests. Start by analyzing the error and the potential factors that led to its occurrence. “Fixing the model could involve tuning or training it,” he notes. In other cases, only minor tweaks may be needed. “It’s also important to bolster any governance and control frameworks that are in place to keep errors from slipping through the cracks.” Additionally, to avoid future errors, it may be necessary to test the data and the process involved. “If humans are involved in any part of the process, they should also be trained.”
Correctness Counts
Checking GenAI for correctness is essential because it allows enterprises across industries to use AI in applications that deliver safety, financial, or health information to customers, Shenoy says.