Taking a look at whether approaches that yield results in other fields could do so in healthcare, too.
Sometime around August of last year, I wrote an article about risk-based auditing: whether it was something real and tangible, or just another buzzword that was being tossed about and abused.
Defining “risk-based” seems to be a bit of a challenge for some folks, but for me, it is crystal clear: it is a method that, while maybe not invented by the Centers for Medicare & Medicaid Services (CMS), has been embraced by CMS and in fact pretty much every other payer on the planet. It is not a fad, but rather a very solid approach to creating checks and balances on our compliance strategies.
Back in 2011, CMS introduced the Fraud Prevention System (FPS), a new and sophisticated model for detecting claims that likely had been coded and/or billed in error. The FPS is a composite of predictive algorithms designed primarily by Verizon; it became effective on July 1, 2011, and today 100 percent of all Medicare fee-for-service (FFS) claims are passed through these algorithms prior to payment.
In its 2014 report to Congress, CMS boasted that it had prevented nearly a billion dollars from being paid out on improperly billed claims. And that is only a drop in the bucket compared to how much was recouped through its pay-and-chase process. At least in the eyes of CMS (and most other payers), risk-based target selection was here to stay.
As discussed in my prior article, not everything that says “risk-based” is truly risk-based. For example, if one were to line up every procedure billed over some period of time and then sort that list by frequency, it would be logical in an internal review to start with the procedures billed most often. But that’s not risk; that’s baselining. One could also compare the utilization for a given procedure code against the utilization for some other group, like the Medicare utilization data set. But that’s not risk; that’s benchmarking. I will agree that each of those methods plays a part in risk, but neither alone defines it. Risk is more of an actuarial model that requires a non-linear approach. Let me provide an example.
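To make the distinction concrete, here is a minimal sketch in Python using a handful of hypothetical CPT codes and a made-up reference share; it illustrates the two calculations only and is not how any payer actually computes them.

```python
from collections import Counter

# Baselining: rank the practice's own procedure codes by how often they are billed.
claims = ["99213", "99214", "99213", "99215", "99213", "99214"]  # hypothetical codes
baseline = Counter(claims).most_common()
print(baseline)  # [('99213', 3), ('99214', 2), ('99215', 1)]

# Benchmarking: compare the practice's utilization share for one code against a
# reference distribution, e.g., national Medicare utilization (value invented here).
practice_share = Counter(claims)["99214"] / len(claims)  # ~0.33
medicare_share = 0.25                                     # hypothetical benchmark share
print(f"99214 is billed at {practice_share / medicare_share:.1f}x the benchmark rate")
```

Neither calculation, by itself, says anything about the probability that a given claim would fail an audit; that is the piece the actuarial approach adds.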
Let’s say that I wanted to get a life insurance policy. I fill out the forms, and some underwriter takes a look at the application to determine just how risky an investment I might be for the company, and how that would impact the premium. I am a Caucasian male born in the United States in 1955. According to Social Security data, my life expectancy is 74.2 years. So, from this point, I likely have around 12 years left. But this does not assess risk; it only benchmarks against some established baseline, just like comparing to Medicare.
To risk-adjust my data, the underwriter would look at some cohort data. For example, let’s say I responded on my application that I have been a smoker for the past 40 years. Well, as you can imagine, that is going to change my life expectancy. And let’s say that I have a really high BMI. That will impact my expected age at death as well. So, when all is said and done, using sophisticated statistical modeling (such as we find in predictive analytics) along with adjustments based on cohort data, the insurance company can predict my age at death and subsequently determine whether I am an insurable risk – and if so, what the premiums would be over a given period of time.
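To show the arithmetic of what such an adjustment might look like, here is a toy Python sketch; the current age, the multipliers for smoking and BMI, and the way they are applied are all invented for illustration and are not actuarial values.

```python
# Toy cohort adjustment: every value below except the 74.2-year baseline is hypothetical.
baseline_life_expectancy = 74.2          # years, the Social Security figure cited above
current_age = 62                         # assumed current age

remaining_years = baseline_life_expectancy - current_age   # ~12 years at the baseline
cohort_factors = {
    "smoker_40_years": 0.70,             # hypothetical multiplier on remaining years
    "high_bmi": 0.85,                    # hypothetical multiplier on remaining years
}

for factor in cohort_factors.values():
    remaining_years *= factor

adjusted_age_at_death = current_age + remaining_years
print(f"Risk-adjusted expected age at death: {adjusted_age_at_death:.1f}")  # ~69.3 in this toy model
```

A real underwriting model would, of course, use far more variables and their interactions, which is exactly the non-linear character of risk referred to above.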
The real question is, does it work? In the case of insurance companies, heck yes, or they would have found a different and better method. They are, after all, about money. It’s the same with CMS – another hearty “heck yes,” because if it weren’t working so well, you wouldn’t see CMS spending tens of millions of dollars to renew a contract, and you wouldn’t see nearly all other payers adopting these methodologies. So yes, risk adjustment works. But if it works for “those guys,” will it also work for us? That is the question I want to answer here.
In predicting risk, I took two approaches. The first was to attempt to determine, for any given physician, which procedures and modifiers might be most at risk in the event of an audit. The second was to see how well I could predict whether a claim was billed in error without even opening the chart. I have definitive results for the former, but not yet for the latter, so I am going to discuss how effective predicting specific areas of audit risk can be in mitigating financial damage. To do this, one has to take a look at the Comprehensive Error Rate Testing (CERT) study.
CERT is an annual study conducted by CMS. In general, CMS pulls a statistically valid random sample of claims from the Medicare claims warehouse and then audits the charts associated with those claims. The study is oriented more toward auditing the payers than the providers; however, as the payers go, so go the providers. What I mean is, if some Medicare carrier is found to be paying lots of claims it shouldn’t be, it won’t hesitate to go after the providers that report the specific codes or code groups most subject to billing error.
Overall, for 2017, the CERT study concluded that some 9.5 percent of all Medicare Part B FFS claims are billed in error, and since most practices code more or less the same way regardless of payer, it is reasonable to presume that the error rate for non-Medicare claims also falls around this 9.5 percent figure. The 95 percent confidence interval is 8.9 percent to 10.1 percent. This means that if I were to pull a statistically valid random sample of claims from each of 100 randomly selected medical practices, I could expect that in 95 of those 100, the true error rate would fall somewhere between 8.9 and 10.1 percent. For the practice, this means that if I were to conduct a random probe audit, I would likely find somewhere in the neighborhood of 9.5 percent of claims in error. But I would also likely be missing over 80 percent of the risk opportunities, because in a random probe audit, I could not afford to pull enough charts for each provider to cover the range of unique procedures those providers report.
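For a rough sense of what those figures mean in practice, here is a back-of-the-envelope calculation in Python; the 30-chart probe sample size is an assumption for illustration.

```python
# CERT 2017 Part B figures cited above, applied to a hypothetical 30-chart probe audit.
error_rate = 0.095                # reported improper payment rate
ci_low, ci_high = 0.089, 0.101    # 95 percent confidence interval

probe_sample = 30                 # assumed number of charts pulled
expected_errors = error_rate * probe_sample

print(f"Expected errors in a {probe_sample}-chart probe: about {expected_errors:.1f}")
print(f"95% CI on the true error rate: {ci_low:.1%} to {ci_high:.1%}")
```

In other words, a small random probe would be expected to turn up only two or three errors while leaving most of a provider’s unique procedures unexamined.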
Risk-based auditing, when done correctly, should increase that error rate, for the simple reason that the model will have identified those procedures that carry a higher risk of audit and are statistically more likely to have billing errors, as defined by the algorithms and supported by CERT. Our system uses true predictive analytics to predict high-value targets, whether they are associated with the provider in general or with specific codes and modifiers for that provider.
While this article is not about our system, suffice it to say that we use supervised learning techniques and a significant database of claims that have already been selected for an audit, thereby training our algorithms to classify risk by learning what is unique about those particular claims. How do we measure accuracy using these models? In predictive analytics, the benchmark is whether your prediction is better than chance. At the most basic level, this means that when using a predictive model to select charts, the observed error rate should be higher than 9.5 percent (or, more conservatively, the upper boundary of the confidence interval).
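As a hedged illustration of the general supervised-learning idea (this is a generic sketch, not the author’s system), the Python below assumes a hypothetical claims file with a label indicating whether each claim was selected for audit, plus a few made-up feature names; any model that scores above chance on held-out data is doing better than a random probe.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical training data: one row per claim, with a label for audit selection.
claims = pd.read_csv("claims_with_audit_labels.csv")   # assumed file name
features = ["allowed_amount", "units", "modifier_25_flag", "util_vs_benchmark"]  # assumed columns
X, y = claims[features], claims["selected_for_audit"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)

# "Better than chance" check: an AUC above 0.5 means the model ranks risky claims
# better than random selection would.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Held-out AUC: {auc:.2f}")
```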
Try it yourself. See what your error rate is using a random probe audit, and then try to risk-adjust using simple methods like baselining or benchmarking. If those methods were really effective, meaning more accurate than the probe audit (chance), we would expect to see a higher error rate. Remember, this is not always true, because we are predicting the likelihood of an audit and not the likelihood of an error, but in my opinion it serves as a reasonable proxy.
Recently, we analyzed the results of nearly 3,500 audits conducted by providers using our system to identify risk. While there was a pretty wide range of findings, the average error rate was nearly 18 percent, or just about twice that of chance. I was excited, because this both supports what CMS is doing to identify potential billing issues and gives practices a new model they can use to help level the playing field.
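As a rough check, using the rounded figures above and assuming exactly 3,500 audits, an 18 percent error rate is far too large a departure from 9.5 percent to be explained by sampling noise alone:

```python
from scipy.stats import binomtest

n_audits = 3500                          # approximate audit count cited above
n_errors = round(0.18 * n_audits)        # ~630 errors at an 18 percent error rate

# One-sided test: could a true 9.5 percent error rate plausibly produce this result?
result = binomtest(n_errors, n_audits, p=0.095, alternative="greater")
print(f"p-value: {result.pvalue:.1e}")   # effectively zero
```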
Remember, we are not predicting billing errors (at least not yet), but rather the likelihood of an audit. But we should remember that many of the variables used to predict the likelihood of an audit are the same ones used to predict billing error. One might conclude that risk-based auditing is simply missing nearly 80 percent of the billing errors, but to make that assumption, you would have to assume an error rate of 100 percent, which is simply improbable. Rather, we benchmark against the average error rate of 9.5 percent, and the result is a huge mitigation of financial risk, because each practice has the opportunity to self-audit those high-risk events and correct them moving forward through education and remediation.
I guess the moral of the story is this: some assume that just because the government does something, it must not work, but that’s just not the case here. Granted, the FPS was created by a private company (Verizon), but the fact that CMS saw its potential and adopted the model is a wonder in and of itself. While the jury may still be out as to where CMS plans to go with these types of models, the agency has made it very clear that medical practices and healthcare providers need to up their game and get on board with methods that provide greater support for overall compliance.
The bottom line? Probe audits and utilization studies are out, and advanced statistics is in.
And that’s the world according to Frank.
Program Note:
Listen to Frank Cohen report this story during Monitor Mondays, March 19, 10-10:30 a.m. EST.