HCC scores also don’t predict costs (or payments).
EDITOR’S NOTE: This is the second and final article in a two-part series about Hierarchical Condition Category (HCC) risk scores. Part I can be found here.
In my last article, I wrote about how one could use HCC risk adjustment scores to identify providers that may be coding inappropriately, that is, over-coding or under-coding. The article also focused on how benchmarking against calculated averages would help to prioritize clinical documentation improvement (CDI) efforts.
But there’s a bigger story here, as far as I am concerned, and that is whether there is really any value in using those risk scores to establish and adjust capitation rates for Medicare Advantage Organizations (MAOs).
As discussed in the last article, HCC risk scores are, in the most general terms, used to assess the relative health of a given patient. They are calculated by considering the diseases (ICD codes), services, and demographics of that patient. The higher the score, the sicker the patient, and the greater the likelihood that their care will be more expensive. This translates into a higher capitation rate for the MAO. So, it’s not much of a stretch to think that the MAO would want the provider to include as many disease diagnoses as possible.
This model led to the creation of the niche industry of CDI. CDI subject-matter experts are often also experts in inpatient coding and auditing, and their job is to ensure that the documentation in the chart is as complete as possible.
Under Section 20 of Chapter 7 in the Medicare Managed Care Manual (MMCM), we read the following: “Risk adjustment allows CMS (the Centers for Medicare & Medicaid Services) to pay plans for the risk of the beneficiaries they enroll, instead of an average amount for Medicare beneficiaries. By risk-adjusting plan payments, CMS is able to make appropriate and accurate payments for enrollees with differences in expected costs. Risk adjustment is used to adjust bidding and payment based on the health status and demographic characteristics of an enrollee. Risk scores measure individual beneficiaries’ relative risk, and risk scores are used to adjust payments for each beneficiary’s expected expenditures. By risk-adjusting plan bids, CMS is able to use standardized bids as base payments to plans.”
What surprised me was how much of the raw data used to calculate the risk scores comes from Medicare fee-for-service (FFS) data. Granted, the methodological statement includes a collection of “MA-reported diagnosis data,” but I am not quite sure what that means.
One of the reasons that CMS does this is to discourage MAOs from avoiding the enrollment of sicker patients. The idea is that sicker patients bring higher capitation payments, and if the MAO can provide care in an efficient manner, then sicker patients at least theoretically should be as profitable as healthier patients who generate less capitation revenue. The point to note is that the risk scores are relative, and are compared against an average score of 1.0. In essence, this represents the “average” cost for the “average” Medicare beneficiary.
This brings me back to my surprise that CMS uses Medicare FFS data to calculate the scores, since capitation does not apply to Medicare FFS beneficiaries. In an article by Stephanie L. Shimada et al., titled “Market and Beneficiary Characteristics Associated with Enrollment in Medicare Managed Care Plans and Fee-for-Service,” the authors state that “traditional Medicare and Medicare Advantage enrollees have historically had different characteristics, with Medicare Advantage enrollees somewhat healthier.” One can then logically assume that Medicare FFS enrollees are somewhat sicker. And if that is the case, then CMS is using data from sicker beneficiaries to calculate capitated payments for healthier beneficiaries. And that is off model, to be sure.
You would think that this might impact access to care, and rumors abound that MAOs are great if you are healthy, but not so great if you are sick. But in one study, “a similar share of beneficiaries in traditional Medicare and Medicare Advantage plans report problems in obtaining needed healthcare.”
To me, then, the burning question is whether the HCC risk scores are, in plain language, worth the cost of the effort to calculate them. And if not, then what alternative is there? One reason this is important is that some 45 percent of Medicare-eligible people are enrolled in MAOs, so there is a lot on the line here, to the tune of hundreds of billions of dollars.
If it is true that sicker patients cost more (resulting in higher capitation rates), then one would expect a strong correlation between the average risk score for a provider’s patient population and the average payment per unique beneficiary for that same period. I tested this by using the Physician and Other Practitioners Public Use File, which contains over a million lines, with each line representing a unique National Provider Identifier (NPI). After filtering out entities (as opposed to individual providers), specialties that were not relevant to the study, outliers, and other anomalies, I ended up with just over 780,000 lines, meaning that I had data on over 780,000 physicians and other providers, such as nurse practitioners (NPs) and physician assistants (PAs). The file includes a field that reports the average HCC score for each provider. It is calculated by taking the HCC score for each unique beneficiary, totaling them, and dividing by the number of unique beneficiaries. Another field reports the total payments from Medicare to each provider during the data period. Dividing that by the same number of unique beneficiaries gives the average paid amount per unique beneficiary. So now, I have two critical metrics: the average risk score and the average paid amount.
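A minimal sketch of how those two metrics could be derived for each provider. The field names and the sample rows are hypothetical, chosen only for illustration; they are not the actual column names or values in the Public Use File.

```python
from dataclasses import dataclass


@dataclass
class ProviderRow:
    # All field names here are hypothetical, not the file's real column names.
    npi: str
    unique_beneficiaries: int
    total_medicare_payment: float
    avg_hcc_score: float


def mean_hcc_score(beneficiary_scores):
    """Average HCC score: sum of per-beneficiary scores / number of beneficiaries."""
    return sum(beneficiary_scores) / len(beneficiary_scores)


def avg_paid_per_beneficiary(row):
    """Total Medicare payments divided by the count of unique beneficiaries."""
    if row.unique_beneficiaries <= 0:
        raise ValueError("provider has no unique beneficiaries")
    return row.total_medicare_payment / row.unique_beneficiaries


# Made-up rows for illustration only; these are not values from the file.
rows = [
    ProviderRow("A", 200, 50_000.0, 1.10),
    ProviderRow("B", 400, 60_000.0, 0.90),
]

# One (average risk score, average paid amount) pair per provider.
metrics = [(r.avg_hcc_score, avg_paid_per_beneficiary(r)) for r in rows]
```

Each pair in `metrics` is one point in the comparison of risk score against payment.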
The assumption is that the average paid amount will track in a positive direction with the average HCC risk score, and that makes sense; higher risk score, higher payment. At least that’s the idea with MAOs. To make sure that the sheer size of the data set was not distorting the result, in addition to correlating all of the data, I also took random samples of 100 and 1,000 and plotted those as well. The average correlation coefficient was 0.083, meaning that there was almost no correlation at all. In fact, the coefficient of determination, which measures how much of the variation in an outcome a statistical model accounts for, was 0.00689. This means that less than 1 percent of the variation in average payments can be explained by the model. Put another way, more than 99 percent of that variation is left unexplained by the average risk score.
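The relationship between the two statistics can be sketched as follows. This assumes a Pearson correlation and a simple one-predictor linear fit, for which the coefficient of determination is the square of the correlation coefficient; the function names are hypothetical and the data are synthetic, generated only to make the example runnable.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def r_squared(r):
    """For a one-predictor linear fit, R^2 is the square of Pearson's r."""
    return r * r


def sampled_r(pairs, sample_size, seed=0):
    """Correlation on a random sample of (score, payment) pairs."""
    sample = random.Random(seed).sample(pairs, sample_size)
    scores, payments = zip(*sample)
    return pearson_r(scores, payments)


# Synthetic, made-up data: payments are generated independently of the scores.
rng = random.Random(42)
pairs = [(rng.uniform(0.5, 3.0), rng.uniform(50.0, 500.0)) for _ in range(5000)]

r_all = pearson_r(*zip(*pairs))   # correlation on the full data set
r_100 = sampled_r(pairs, 100)     # correlation on a random sample of 100
r_1000 = sampled_r(pairs, 1000)   # correlation on a random sample of 1,000

# A correlation coefficient of 0.083 squares to roughly 0.00689.
reported_r2 = r_squared(0.083)
```

Squaring the coefficient is what turns a small correlation into an even smaller share of explained variation.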
In my opinion, whether HCC scores predict costs (or payments) is not even open for debate; they don’t. I repeated the analysis with 12 specialties chosen at random to see whether pooling all specialties was masking a relationship. Some of those specialties had negative correlations, meaning that as the risk score rose, the payments declined, while others had slightly higher correlations. The highest among the dozen was 0.242, giving a coefficient of determination of 0.058, meaning that 5.8 percent of the variation in payments can be explained by the HCC score.
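A per-specialty check of this kind might look like the sketch below. The specialty labels, the record layout, and the numbers are all made up for illustration, and the grouping function is a hypothetical helper rather than the exact procedure used for the figures above.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def r_by_specialty(records):
    """Map each specialty to the correlation of its (score, payment) pairs."""
    groups = defaultdict(list)
    for specialty, score, payment in records:
        groups[specialty].append((score, payment))
    result = {}
    for specialty, pairs in groups.items():
        scores, payments = zip(*pairs)
        result[specialty] = pearson_r(scores, payments)
    return result


# Made-up records: (specialty, average HCC score, average paid per beneficiary).
records = [
    ("Specialty X", 0.8, 100.0),
    ("Specialty X", 1.2, 140.0),
    ("Specialty X", 1.6, 150.0),
    ("Specialty Y", 0.9, 300.0),
    ("Specialty Y", 1.4, 260.0),
    ("Specialty Y", 2.0, 250.0),
]

by_specialty = r_by_specialty(records)
```

In this toy data, one specialty comes out with a positive coefficient and the other with a negative one, which is the pattern that pooling all specialties together could hide.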
My conclusion is that, at least when using the Medicare FFS data, there is no relationship between HCC scores and payments. And if payments are truly tied to costs, then the implication is that there is no relationship between cost and HCC score. If this is the case, then what value is there in using HCC risk scores to negotiate capitation rates?
I am the first to admit that my method does not produce the most accurate results; maybe averaging the data does something to remove the correlation, but I am unconvinced that is the case. And if HCC risk scores are not an accurate measure of costs (and therefore capitation rates), then what is?
In the book “The Bell Jar” by Sylvia Plath, the character Esther Greenwood says, “’I don’t really know,’ I heard myself say. I felt a deep shock hearing myself say that, because the minute I said it, I knew it was true.”
My response to that? Like Esther Greenwood, “I don’t know.”
And that’s the world according to Frank.