Artificial intelligence, for all its cognitive power, can sometimes arrive at some really stupid, even dangerous, conclusions. When this happens, it’s up to humans to correct the mistakes. But how, when, and by whom should an AI decision be overruled?
Humans should almost always possess the ability to overrule AI decisions, says Nimrod Partush, vice president of data science at cybersecurity technology firm CYE, in an email interview. “AI systems can make errors or produce flawed conclusions, sometimes referred to as hallucinations,” he notes. “Allowing human oversight fosters trust.”
Overruling AI is unwarranted only in certain extreme environments where human performance is known to be less reliable, such as when controlling an airplane traveling at Mach 5. “In those rare edge cases, we may defer to AI in real-time and then thoroughly review decisions after the fact,” Partush says.
Heather Bassett, chief medical officer with Xsolis, an AI-driven healthcare technology company, advocates for human-in-the-loop systems, particularly when working with Generative AI. “While humans must retain the ability to overrule AI decisions, they should follow structured workflows that capture the rationale behind the override,” she says in an online interview. Ad hoc decisions risk undermining the consistency and efficiency AI is meant to provide. “With clear processes, organizations can leverage AI’s strengths while preserving human judgment for nuanced or high-stakes scenarios.”
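In practice, that structure can be as simple as a record that refuses to accept an override without a stated reason. The Python sketch below is a generic illustration of the idea, not Xsolis's system; its field names and reason codes are assumptions.

```python
# Minimal sketch of a structured override record, so that human overrides are
# captured with a rationale instead of being made ad hoc.
# All field names and reason codes here are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone

ALLOWED_REASONS = {"model_error", "missing_context", "policy_exception", "other"}

@dataclass
class OverrideRecord:
    case_id: str
    ai_decision: str
    human_decision: str
    reason_code: str          # must be one of ALLOWED_REASONS
    rationale: str            # free-text explanation from the reviewer
    reviewer_id: str
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def record_override(log: list, record: OverrideRecord) -> None:
    """Validate and store an override so it can be audited and fed back later."""
    if record.reason_code not in ALLOWED_REASONS:
        raise ValueError(f"Unknown reason code: {record.reason_code}")
    if not record.rationale.strip():
        raise ValueError("A rationale is required for every override.")
    log.append(record)

# Example usage
audit_log: list[OverrideRecord] = []
record_override(audit_log, OverrideRecord(
    case_id="case-123",
    ai_decision="deny",
    human_decision="approve",
    reason_code="missing_context",
    rationale="Recent lab results were not available to the model.",
    reviewer_id="reviewer-42",
))
```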
Decision Detection
Detecting a bad AI decision requires a strong monitoring system to ensure that the model aligns with expected performance metrics. “This includes implementing performance evaluation pipelines to detect anomalies, such as model drift or degradation in key metrics, such as accuracy, precision, or recall,” Bassett says. “For example, a defined change in performance thresholds should trigger alerts and mitigation protocols.” Proactive monitoring ensures that deviations are identified and addressed before they can degrade output quality or affect end users. “This approach safeguards system reliability and maintains alignment with operational goals.”
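Translated into code, such a check can be a short gate at the end of an evaluation pipeline. The sketch below is a generic illustration rather than any particular vendor's pipeline; the metric names and threshold values are assumptions.

```python
# Minimal sketch of a performance-threshold check that could sit at the end of
# an evaluation pipeline. Metric names and thresholds are illustrative assumptions.

THRESHOLDS = {"accuracy": 0.95, "precision": 0.90, "recall": 0.90}

def check_model_health(current_metrics: dict[str, float]) -> list[str]:
    """Return a list of alert messages for any metric below its threshold."""
    alerts = []
    for metric, minimum in THRESHOLDS.items():
        value = current_metrics.get(metric)
        if value is None:
            alerts.append(f"ALERT: {metric} was not reported by the evaluation run")
        elif value < minimum:
            alerts.append(f"ALERT: {metric} dropped to {value:.3f} (threshold {minimum:.2f})")
    return alerts

# Example: a nightly evaluation run reports degraded recall, which should
# trigger the mitigation protocol before end users are affected.
print(check_model_health({"accuracy": 0.96, "precision": 0.93, "recall": 0.81}))
```

In a production setting, the alerts would typically feed a dashboard or paging system rather than standard output, but the gate itself stays this simple.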
Experts and AI designers are typically well-equipped to spot technical errors, but everyday users can help, too. “If many users express concern or confusion — even in cases where the AI is technically correct — it flags a disconnect between the system’s output and its presentation,” Partush says. “This feedback is critical for improving not just the model, but also how AI results are communicated.”
Decision Makers
It’s always appropriate for humans to overrule AI decisions, observes Melissa Ruzzi, director of artificial intelligence at SaaS security company AppOmni, via email. “The key is that the human should have enough knowledge of the topic to be able to know why the decision has to be overruled.”
Partush concurs: the end user is best positioned to make the final judgment call. “In most circumstances, you don’t want to remove human authority — doing so can undermine trust in the system.” Better yet, he says, is to combine user insights with feedback from experts and AI designers, which is especially valuable in high-stakes scenarios.
The decision to override an AI output depends on the type of output, the model’s performance metrics, and the risk associated with the decision. “For highly accurate models — say, over 98% — you might require supervisor approval before an override,” Bassett says. Additionally, in high-stakes areas like healthcare, where a wrong decision could result in harm or death, it’s essential to create an environment that allows users to raise concerns or override the AI without fear of repercussions, she advises. “Prioritizing safety fosters a culture of trust and accountability.”
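One way to encode that kind of escalation rule is a small policy function: overrides of a highly accurate model require supervisor sign-off, while high-stakes settings let the user act immediately and review afterward. The 98% bar echoes Bassett's example; everything else in the sketch below is an assumption.

```python
# Hypothetical override-escalation policy. The 98% accuracy bar is taken from
# Bassett's example above; the rest is an illustrative assumption.

HIGH_ACCURACY_BAR = 0.98

def override_requires_supervisor(model_accuracy: float, high_stakes: bool) -> bool:
    """Decide whether a human override needs supervisor sign-off first."""
    if high_stakes:
        # In high-stakes settings such as healthcare, the user must be able to
        # override immediately and without fear of repercussions; review happens afterward.
        return False
    return model_accuracy > HIGH_ACCURACY_BAR  # very accurate models get extra scrutiny

print(override_requires_supervisor(0.99, high_stakes=False))  # True: escalate to a supervisor
print(override_requires_supervisor(0.99, high_stakes=True))   # False: override stands, review later
```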
Once a decision has been overruled, it’s important to document the incident, investigate it, and then feed the findings back to the AI during retraining, Partush says. “If the AI repeatedly demonstrates poor judgment, it may be necessary to suspend its use and initiate a deep redesign or reengineering process.”
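A rough way to operationalize that advice is to track how often a model is being overruled and escalate when overrides become frequent. The window and the 20% cutoff in the sketch below are made-up values, not a recommended standard.

```python
# Simplified sketch: flag a model for redesign when it is being overruled too
# often. The window size and 20% cutoff are illustrative assumptions.

def assess_override_rate(recent_decisions: int, recent_overrides: int,
                         max_override_rate: float = 0.20) -> str:
    """Return a recommendation based on how frequently humans overruled the model."""
    if recent_decisions == 0:
        return "no data"
    rate = recent_overrides / recent_decisions
    if rate > max_override_rate:
        return f"suspend and redesign (override rate {rate:.0%})"
    if rate > 0:
        return f"feed overrides into retraining (override rate {rate:.0%})"
    return "operating normally"

print(assess_override_rate(recent_decisions=500, recent_overrides=130))  # suspend and redesign
print(assess_override_rate(recent_decisions=500, recent_overrides=15))   # feed into retraining
```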
Depending on a topic’s complexity, it may be necessary to run the answer through other AIs, so-called “AI judges,” Ruzzi says. When data is involved, there are also other approaches, such as a data check in the prompt. Ultimately, experts can be called upon to review the answer and then use techniques such as prompt engineering or reinforcement learning to adjust the model.
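The judge pattern can be approximated by asking a second model to grade the first model's answer before it reaches a user or an expert reviewer. The sketch below assumes a hypothetical call_llm helper standing in for whatever model API is in use; the PASS/FAIL convention is likewise an illustrative choice.

```python
# Sketch of the "AI judge" pattern: a second model reviews the first model's answer.
# `call_llm` is a hypothetical stand-in for whatever model API is in use.

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g., via an LLM provider's SDK)."""
    raise NotImplementedError("Wire this up to your model API of choice.")

JUDGE_PROMPT = """You are reviewing another AI's answer.
Question: {question}
Answer under review: {answer}
Reply with PASS if the answer is accurate and well supported, otherwise reply
with FAIL followed by a one-sentence explanation."""

def judge_answer(question: str, answer: str) -> tuple[bool, str]:
    """Ask a judge model whether an answer should stand or be escalated to a human expert."""
    verdict = call_llm(JUDGE_PROMPT.format(question=question, answer=answer)).strip()
    return verdict.upper().startswith("PASS"), verdict

# Example flow: a FAIL verdict routes the answer to an expert, who may then
# adjust the system through prompt engineering or further training.
# passed, verdict = judge_answer(user_question, model_answer)
# if not passed:
#     escalate_to_expert(verdict)
```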
Building Trust
Building AI trust requires transparency and continuous feedback loops. “An AI that’s regularly challenged and improved upon in collaboration with humans will ultimately be more reliable, trustworthy, and effective,” Partush says. “Keeping humans in control — and informed — creates the best path forward for both innovation and safety.”