Here’s the uncomfortable truth: many hospitals and state Medicaid plans, under pressure from HR1 data-analysis needs, are paying eye-watering markups to for-profit CPA and consulting firms for “data extracts” that are federally managed, transparently priced, or even available for free in aggregate form.
Start with Medicare and Medicaid data routed through the Centers for Medicare & Medicaid Services’ (CMS’s) Research Data Assistance Center (ResDAC). CMS posts an official fee schedule for Limited Data Sets (LDS) and Research Identifiable Files (RIFs). For example, a full-year Medicare Inpatient LDS (the Standard Analytic File, or SAF) is listed at $3,000 per year (or $1,875 for a quarterly extract at 100-percent sample), while the Outpatient SAF is $7,000 (quarterly, $4,375).
Smaller files cost far less: the Master Beneficiary Summary File is $1,000 per year (or $625 quarterly).
On the RIF side – what many consultants cite when selling “custom cohorts” – CMS tiers its fees by beneficiary count. A year of Fee-For-Service inpatient claims costs $2,000 for studies of up to 1 million beneficiaries, $3,000 for 1-5 million, $6,000 for 5-20 million, and $12,000 for more than 20 million.
Outpatient claims run $2,000, $5,000, $10,000, and $15,000, respectively, across those same tiers. CMS also spells out that quarterly files are discounted after the first pull: “each subsequent quarter of data will be 50 percent of the fee for those files.”
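To make the tiering concrete, here is a minimal Python sketch of a fee lookup built only from the RIF figures quoted above. The function names are hypothetical, the handling of exact tier boundaries (for instance, whether a study of exactly 20 million beneficiaries falls in the third or fourth tier) is an assumption, and the first-quarter fee in the discount helper is a caller-supplied placeholder because the text does not state it. Treat it as an illustration, not a reproduction of the CMS fee schedule.

```python
from bisect import bisect_left

# Upper bounds of the first three beneficiary-count tiers, from the text above.
# ASSUMPTION: a count exactly on a boundary belongs to the lower tier.
TIER_BOUNDS = [1_000_000, 5_000_000, 20_000_000]

# Annual RIF fees per tier (USD), as quoted in the text above.
ANNUAL_FEES = {
    "inpatient": [2_000, 3_000, 6_000, 12_000],
    "outpatient": [2_000, 5_000, 10_000, 15_000],
}


def rif_annual_fee(file_type: str, beneficiaries: int) -> int:
    """Return the quoted annual fee for a file type and cohort size."""
    if beneficiaries < 0:
        raise ValueError("beneficiaries must be non-negative")
    tier = bisect_left(TIER_BOUNDS, beneficiaries)
    return ANNUAL_FEES[file_type][tier]


def quarterly_series_cost(first_quarter_fee: float, quarters: int) -> float:
    """Total for a run of quarterly pulls: the first at the given fee,
    each subsequent quarter at 50 percent of it (per the quoted CMS rule).

    ASSUMPTION: first_quarter_fee is supplied by the caller; it is not
    taken from any published schedule.
    """
    if quarters < 1:
        raise ValueError("quarters must be at least 1")
    return first_quarter_fee * (1 + 0.5 * (quarters - 1))


inpatient_small = rif_annual_fee("inpatient", 800_000)
outpatient_mid = rif_annual_fee("outpatient", 3_000_000)
inpatient_large = rif_annual_fee("inpatient", 25_000_000)
four_quarters = quarterly_series_cost(1_000, 4)  # hypothetical $1,000 first pull
```

A hospital can run the same lookup against the cohort size named in a consultant's proposal and compare the result with the "data" line on the invoice.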
The Medicaid T-MSIS Analytic Files (TAF) are likewise published with explicit prices. A year of TAF Inpatient is $3,000-$6,300, depending on cohort size; the Demographic & Eligibility (DE) file ranges from $3,500 to $14,000.
Even high-complexity components like RX and Other Services have list prices ($4,000-$20,000), not blank checks.
So, where does the “price gouging” come in? Not from the federal sources. It creeps in when consulting firms repackage public data, often charging each client for data they already hold and have already billed to others.
You’ll see invoices that tack on “extract fees,” “data engineering,” and “cohort construction” for tasks that, in many engagements, amount to a) filling out a standardized request (often with minimal cohort logic), b) downloading data from the Central Distributor or CMS’s CCW/EPPE pathway, and c) running generic transformations that have already been automated internally. When those steps get resold line by line to every hospital without shared savings or transparent pass-through of the government list price, clients can end up paying many times the underlying cost for little incremental value.
To be clear, there are times when paying for expert help is worth it. True value-add looks like: a) robust study design; b) defensible cohort derivation with audited code; c) advanced linkage (e.g., T-MSIS + Medicare or facility cost report ties), where allowed; d) validated analytics with reproducible notebooks; and e) deliverables that translate findings into operational choices (service line planning, network strategy, reimbursement optimization). But hospitals should insist that:
- Government data fees are passed through at cost. There is no reason a consultant should mark up data costs from government sources;
- Any recurring “extract” charge reflects real marginal work, because CMS already halves the price after the first quarterly pull for eligible files;
- The proposal clearly distinguishes data acquisition (at the posted rates), setup/ETL (amortized over clients when the same reusable pipelines are employed), and analysis/consulting (the part that should command premium fees); and
- When aggregate answers suffice, consultants steer you to public documentation instead of selling you a custom microdata engagement you don’t need.
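One way to apply this checklist is to tag each proposal line by category and compare the data-acquisition lines against the posted fees. The Python sketch below does that. The `ProposalLine` class, the category labels, and every billed amount are hypothetical illustrations. Only the two list prices ($3,000 for the Inpatient LDS and $1,000 for the Master Beneficiary Summary File) come from the figures quoted earlier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProposalLine:
    description: str
    category: str  # hypothetical labels: "data_acquisition", "setup_etl", "analysis"
    billed: float  # amount on the consultant's proposal (illustrative)
    list_price: Optional[float] = None  # posted government fee, if applicable


def data_pass_through_check(lines):
    """Compare billed data-acquisition charges with the posted list prices."""
    data_lines = [ln for ln in lines if ln.category == "data_acquisition"]
    if any(ln.list_price is None for ln in data_lines):
        raise ValueError("every data_acquisition line needs a list_price")
    billed = sum(ln.billed for ln in data_lines)
    posted = sum(ln.list_price for ln in data_lines)
    return {
        "billed": billed,
        "posted": posted,
        "excess": billed - posted,  # zero means true pass-through
        "multiple": billed / posted if posted else None,
    }


# Illustrative proposal: billed amounts are invented for the example.
proposal = [
    ProposalLine("Medicare Inpatient LDS, full year", "data_acquisition", 12_000, 3_000),
    ProposalLine("Master Beneficiary Summary File, full year", "data_acquisition", 1_000, 1_000),
    ProposalLine("Reusable ETL pipeline setup", "setup_etl", 8_000),
    ProposalLine("Service line analysis", "analysis", 40_000),
]

result = data_pass_through_check(proposal)
```

In this made-up proposal the data lines are billed at $13,000 against $4,000 in posted fees, so the check surfaces a $9,000 excess that the consultant should be asked to explain or remove.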
In short, the public sector has already done the hard, expensive work of collecting, curating, and pricing these datasets. CMS publishes detailed fee schedules for LDS and RIF files, even down to per-file tiers and quarterly discounts. Demand pass-through pricing, transparency on reusable pipelines, and proposals that put the premium where it belongs: on insight, not on downloads.