Today I want to write about one of our current hot-button topics: artificial intelligence, better known as AI. First, I want to pose the question: “Is AI bad?” I suspect it is not inherently bad. But AI seems to be the most recent example of flawed implementation or misuse of technology by payers (and probably providers as well).
By now, many have no doubt read the ProPublica article about Cigna and are familiar with the 60,000 claims denied in a single month by Dr. Cheryl Dopke (that’s a little over six claims per minute, assuming eight-hour workdays over a roughly 20-workday month). We’ve read the quote from an unnamed Cigna medical director: “[W]e literally click and submit … it takes all of 10 seconds to do 50 at a time.” Many are no doubt also familiar with the Cigna lawsuit resulting from these behaviors.
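As a rough check on that pace, assume the month contained about 20 eight-hour workdays (the workday count is an assumption; the reporting gives only the monthly total). The arithmetic, alongside the medical director's quoted pace of 50 claims in 10 seconds, works out as follows:

```latex
\[
\begin{aligned}
\text{Dopke: } & \frac{60{,}000 \text{ claims}}{20 \text{ days} \times 8 \tfrac{\text{h}}{\text{day}} \times 60 \tfrac{\text{min}}{\text{h}}}
  = \frac{60{,}000 \text{ claims}}{9{,}600 \text{ min}} = 6.25 \text{ claims/min} \\
& \Rightarrow \frac{60 \text{ s}}{6.25 \text{ claims}} = 9.6 \text{ s per claim} \\
\text{Quoted batch: } & \frac{10 \text{ s}}{50 \text{ claims}} = 0.2 \text{ s per claim}
\end{aligned}
\]
```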
Similarly, many will be familiar with the suit against UnitedHealthcare (UHC) over its use of an AI algorithm that allegedly has a 90-percent error rate. Many are also familiar with UHC’s emergency department (ED) coding denials, which rely on an Optum product to assign the ED level of care.
But I have to ask: are these really different from the practice Dr. Jay Iinuma described in testimony in an Aetna lawsuit, where he indicated that he denied most claims without ever opening a medical record? In the case at issue, Iinuma admitted he never read the plaintiff’s medical records and knew next to nothing about his disorder. Iinuma didn’t use AI; he relied on non-physician reviewers to abstract the record and make recommendations. It’s not high-tech, but it is a physician shortcut.
The differences between modern algorithms and Iinuma’s corporate-driven denial practices are twofold:
- The AI algorithms are based on unknown training data sets with inherent built-in biases, and their validation against human experts is undisclosed. These algorithms lack the nursing-level decision-making upon which Iinuma claimed to rely.
- Volume. The AI never sleeps. It doesn’t eat. It doesn’t collect overtime. In short, the AI is a full-time, automated edit of every claim. It can flag or deny claims faster than any medical director. Once a claim is flagged or denied, overriding the AI would require a significant degree of certainty or professional integrity.
But would medical directors actually ignore good medical practice or established protocols to make adverse decisions? We need only look at the LinkedIn profile of a former UHC medical director. Frank Baumann’s profile notes: “I am a board-certified general surgeon who spent 10 years with the nation’s largest healthcare insurance company, denying level of care cases to hospitals.” He goes on to ask:
- Why are we denying good care?
- Why did we tell everyone that we were using national, evidence-based guidelines – but then we didn’t?
Medical directors like Iinuma and Baumann make it clear that such flawed decision-making exists and doesn’t require technology. I suspect it will persist. After all, technology will now make it easier to render denials, and some medical directors lack either the knowledge or professional integrity to do the right thing.
We should look at the history of some other technologies in medicine. We can start with Index Medicus. This bibliographic index originated in 1879, and over the years morphed into MEDLINE. In the early days, using the index was laborious. It required identification of potentially useful articles, then finding or requesting the articles at the library. Researchers would scrutinize the articles for both relevance and scientific validity.
Digital conversion and computers allowed access to huge numbers of articles in multiple languages. Concomitantly, the number of journals exploded, creating additional opportunities to publish.
The number of references in journal articles grew from several to, in some cases, several hundred. What was missing, however, was an index of flawed articles. Only recently has a database of retracted articles been developed, and it is incomplete. These articles are typically retracted for one or more of three reasons:
- Flawed method or analyses;
- Ethical lapses; and
- Fabricated or fraudulent data.
Our bibliographic systems are an excellent example of how technology has enabled errors to persist – or worse, propagate.
The next technology to consider is dictation and transcription. These were initially viewed as time-savers for busy clinicians. But there have been unexpected results, regardless of whether the transcription is done by a human or by dictation software. These include record entries such as the following:
- “Both breasts are equal and reactive to light and accommodation.”
- “Remnants of a soldier can be seen in the vagina.”
- “Patient has chest pain if she lies on her left side for over a year.”
- “The patient has left his white blood cells at another hospital.”
- “The patient refused an autopsy.”
These may be humorous, but they add little to the medical record and do nothing to clarify the patient’s condition(s). To account for these errors, some providers add a “disclaimer” such as “This note was created with (insert dictation service or software). Despite careful review, some errors may persist.” Yet providers rarely review or correct these notes. In retrospect, such transcription errors may be easy to recognize, but few providers can actually recall what the correct entry should have been. In essence, the technology, combined with a lack of immediate review, allows misuse that may harm patients and carry adverse financial consequences for institutions.
The large language models (LLMs) upon which much of today’s AI is based are “trained” on large volumes of electronic data.
Without careful curation for quality and ongoing updates, LLMs inevitably inherit bias and are susceptible to errors. Despite these limitations, there is good evidence that AI-generated diagnoses and documentation are comparable to their human counterparts, and in some cases better.
You may be aware of lawyers who were sanctioned for submitting AI-generated briefs to a court. In many cases, the briefs were otherwise reasonably good. The problem arose when the AI “hallucinated” court citations, which opposing counsel flagged because the cited cases could not be found. In response, a federal judge in Texas began requiring lawyers appearing before him to certify either that they did not use generative AI to draft their filings or that a human checked any AI-drafted language for accuracy. While it would be comforting to believe that a similar requirement for providers would improve documentation, the disappointing truth is that providers are unlikely to check the accuracy of AI-generated documentation. The dictation errors, as well as the behaviors of insurance company medical directors like Dopke, Baumann, and Iinuma, serve as painful examples.
So, what should organizations do, right now, to manage the use of AI? The first consideration is the internal use of AI. Institutions should:
- First, develop a responsible policy for using AI in the medical record. This will be very hard to police, since providers could simply copy an AI-generated document into the medical record, and it would probably go undetected. But an annual pledge by medical staff, paired with clear expectations, would be an excellent start.
- Second, leverage AI to detect AI-generated documentation.
- Third, use AI to detect repetitive or non-contributory medical record entries as well as to flag high-risk diagnoses. These analytical algorithms already exist in many clinical documentation integrity (CDI) and coding software programs.
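The repetitive-entry check mentioned above can be illustrated with plain text matching. Commercial CDI and coding products are proprietary, so this is not how any of them works; it is a minimal sketch, assuming notes are available as plain text, that flags sentences in a new note which nearly duplicate sentences in earlier notes (a common sign of copy-forward). The function names, the naive sentence splitter, and the 0.9 similarity threshold are hypothetical choices, and the technique is basic string similarity from Python's standard `difflib` module, not AI.

```python
import re
from difflib import SequenceMatcher


def split_sentences(note: str) -> list[str]:
    """Naively split a note into sentences on ., !, or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", note.strip())
    return [p.strip() for p in parts if p.strip()]


def flag_copied_sentences(new_note: str, prior_notes: list[str],
                          threshold: float = 0.9) -> list[tuple[str, float]]:
    """Return (sentence, score) pairs for sentences in new_note that closely
    match some sentence in any prior note.

    The score is difflib's SequenceMatcher ratio (0.0 to 1.0) on lowercased text.
    The 0.9 default threshold is an illustrative assumption, not a clinical standard.
    """
    prior = [s.lower() for note in prior_notes for s in split_sentences(note)]
    flagged = []
    for sentence in split_sentences(new_note):
        # Best similarity against any earlier sentence; 0.0 if there are none.
        best = max((SequenceMatcher(None, sentence.lower(), p).ratio() for p in prior),
                   default=0.0)
        if best >= threshold:
            flagged.append((sentence, round(best, 2)))
    return flagged


if __name__ == "__main__":
    # Hypothetical progress notes from two consecutive days.
    yesterday = "Patient resting comfortably. No acute distress. Plan: continue IV antibiotics."
    today = "Patient resting comfortably. No acute distress. Fever resolved overnight."
    for sentence, score in flag_copied_sentences(today, [yesterday]):
        print(f"{score:.2f}  {sentence}")
```

A real tool would also need to handle templated text, such as standard exam headings, which would otherwise produce many false positives.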
Institutions should also leverage AI to respond to payers:
- Contracting is an ideal starting point. AI can review very large documents and flag problem areas for review by counsel. It can detect inconsistencies and contradictions that may later become the subject of disputes. It can also help analyze contractual differences between payers.
- Denials management is another high-gain area. AI categorization of denials may be more accurate and consistent than human categorization. AI may also be able to detect subtle changes in denial patterns or wording that portend nascent denial programs by payers.
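Spotting a shift in denial patterns can start with something much simpler than AI: comparing how the mix of denial categories changes from one period to the next. The sketch below assumes denials have already been categorized (by staff or by software); the category labels, sample counts, and 10-percentage-point threshold are hypothetical. A production system would likely add statistical testing and analysis of the denial wording itself.

```python
from collections import Counter


def category_shares(categories: list[str]) -> dict[str, float]:
    """Return each denial category's share of all denials (0.0 to 1.0)."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


def flag_shifts(baseline: list[str], current: list[str],
                min_change: float = 0.10) -> dict[str, tuple[float, float]]:
    """Flag categories whose share moved by at least min_change between periods.

    min_change=0.10 (10 percentage points) is an illustrative assumption.
    Returns {category: (baseline_share, current_share)}, sorted by category name.
    """
    before, after = category_shares(baseline), category_shares(current)
    flagged = {}
    for cat in sorted(set(before) | set(after)):
        # A category absent in one period counts as a 0.0 share there.
        b, a = before.get(cat, 0.0), after.get(cat, 0.0)
        if abs(a - b) >= min_change:
            flagged[cat] = (round(b, 2), round(a, 2))
    return flagged


if __name__ == "__main__":
    # Hypothetical denial categories for two months (not real payer data).
    march = ["medical necessity"] * 40 + ["authorization"] * 40 + ["coding/level of care"] * 20
    april = ["medical necessity"] * 38 + ["authorization"] * 22 + ["coding/level of care"] * 40
    print(flag_shifts(march, april))
    # → {'authorization': (0.4, 0.22), 'coding/level of care': (0.2, 0.4)}
```

Here a jump in level-of-care coding denials, with a matching drop elsewhere, is the kind of change that might warrant a closer look at whether a payer has started a new review program.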
Now is the time to develop responsible in-house uses for AI.