
Tackling Healthcare’s Data Challenges: The Case for Addressing Both Access and Quality

Authored by: Ofer Mendelevitch (Syntegra), Aaron Neiderhiser (Tuva Health), Coco Zuloaga (Tuva Health)

For more than a decade, there has been broad industry consensus that advanced uses of healthcare data will lead to innovations in care delivery, the identification of more effective treatments and improvements in the overall patient experience. Unfortunately, we’ve collectively fallen short of this goal. While some organizations have seen gains from analyzing healthcare data, the vast majority struggle to make any sense of it at all.

This is largely for two reasons: (1) lack of data access and (2) limited data quality.

The role of data

Data is at the center of decision-making across the healthcare ecosystem.

Health systems need to benchmark themselves against other systems. Payers need to conduct extensive scenario planning. Research centers need to share data easily with other research organizations to extract new insights.

And now, in the health tech era, builders need access to a lot of data to build their products. That data must be high-fidelity and actually representative of the diverse populations they are trying to help.

So, why is data being so underutilized?

Data access

Whether you’re a researcher at an academic medical center, a data scientist in health tech, or a software engineer working at a virtual-first care delivery organization, getting access to healthcare data is hard and, at times, impossible. And getting access to data on the right kind of patients is even harder. Take, for example, a data scientist who wants to build a predictive model. Often they either don’t have enough patient data on the specific outcome they are trying to predict, or they don’t have enough representation of a specific population, which leads to issues around algorithmic fairness.

Healthcare data is considered one of the most sensitive types of data on the planet. As a result, strict access controls limit how organizations make data available to their employees and how they share data with other organizations. This means great ideas die on the vine before data practitioners can even access data to begin the project.

Data quality

None of this is to say that no one can access healthcare data. But once you gain access, there’s a second hill to climb: data quality. Healthcare data notoriously suffers from quality issues, which fall into two main categories: (1) raw data quality issues and (2) lack of high-level concepts.

Raw data quality issues are exactly what you would expect — think completeness, validity and plausibility, among others. They are inherent in raw clinical (e.g. electronic health record) and claims data and are so common that academics have developed frameworks for studying them. For example, a common completeness issue occurs when a patient with end-stage renal disease hasn’t had a dialysis visit in several weeks. Either this patient has died and we don’t know it, or they are missing dialysis encounters in the data set.
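
To make the dialysis example concrete, here is a minimal sketch of a completeness check, written in Python. It is an illustration only: the function name, the (patient_id, encounter_date) input shape and the 14-day threshold are all assumptions made for this example, not part of any published framework or of either company's tooling.

```python
from datetime import date, timedelta

# Hypothetical threshold: treat more than 14 days without dialysis as suspicious.
MAX_GAP = timedelta(days=14)


def find_dialysis_gaps(encounters, as_of, max_gap=MAX_GAP):
    """Return {patient_id: [(gap_start, gap_end), ...]} for gaps longer than max_gap.

    encounters: iterable of (patient_id, encounter_date) tuples (assumed shape)
    as_of: last date covered by the data extract
    """
    by_patient = {}
    for patient_id, encounter_date in encounters:
        by_patient.setdefault(patient_id, []).append(encounter_date)

    gaps = {}
    for patient_id, dates in by_patient.items():
        # Compare consecutive visits, then the last visit against the extract end.
        checkpoints = sorted(dates) + [as_of]
        for earlier, later in zip(checkpoints, checkpoints[1:]):
            if later - earlier > max_gap:
                gaps.setdefault(patient_id, []).append((earlier, later))
    return gaps


# Toy data: p1 dialyzes regularly, p2 has a four-week hole.
encounters = [
    ("p1", date(2023, 1, 20)),
    ("p1", date(2023, 1, 25)),
    ("p1", date(2023, 1, 30)),
    ("p2", date(2023, 1, 2)),
    ("p2", date(2023, 1, 30)),
]
gaps = find_dialysis_gaps(encounters, as_of=date(2023, 2, 1))
```

A flagged gap does not say which explanation is true. It only surfaces the patient so someone can decide whether encounters are missing or the patient's status has changed.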

High-level concepts are created by transforming raw (i.e. low-level) data into new data that is useful for answering specific questions. This is similar to feature engineering from a machine learning perspective. Examples include groupers (e.g. condition groups, procedure groups, drug classes, episodes), measures (e.g. readmissions) and other events (e.g. acute complications from underlying conditions). Working with healthcare data almost always requires developing these high-level concepts, which in turn demands significant subject matter expertise. To effectively work with healthcare data, a data practitioner must have a strong working knowledge of every clinical specialty, care delivery protocols and the administration of healthcare.
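
As a rough illustration of turning raw data into a high-level concept, the Python sketch below derives a simplified readmission flag from raw inpatient stays. The function name, the tuple layout and the 30-day window are assumptions made for this example. Production readmission measures apply many additional rules (for example, excluding planned readmissions and transfers), which is exactly where the subject matter expertise comes in.

```python
from datetime import date


def flag_readmissions(stays, window_days=30):
    """Return the set of (patient_id, admit_date) stays that count as readmissions.

    stays: list of (patient_id, admit_date, discharge_date) tuples (assumed shape)
    A stay is flagged when it starts within window_days after the same
    patient's previous discharge. This is a deliberately simplified rule.
    """
    readmissions = set()
    last_discharge = {}
    # Process each patient's stays in chronological order of admission.
    for patient_id, admit, discharge in sorted(stays, key=lambda s: (s[0], s[1])):
        previous = last_discharge.get(patient_id)
        if previous is not None and 0 <= (admit - previous).days <= window_days:
            readmissions.add((patient_id, admit))
        last_discharge[patient_id] = discharge
    return readmissions


# Toy data: p1 returns 15 days after discharge, p2 returns after about three months.
stays = [
    ("p1", date(2023, 3, 1), date(2023, 3, 5)),
    ("p1", date(2023, 3, 20), date(2023, 3, 22)),
    ("p2", date(2023, 1, 1), date(2023, 1, 3)),
    ("p2", date(2023, 4, 1), date(2023, 4, 2)),
]
readmitted = flag_readmissions(stays)
```

The raw rows only say when someone was admitted and discharged. The derived flag is what an analyst would actually use to answer a question about readmission rates.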

Addressing these challenges

Syntegra and Tuva Health are leveraging their expertise to address these challenges head on.

We’re excited to be partnering to make access to high-fidelity healthcare data that much easier. Without access to enough data and the right kind of data, new ideas will be stifled before they even begin to take shape. We need not only more data but better data to accelerate the impact that research and data science efforts can and should have on bringing better care to patients.

In case you missed it, check out the datasets we just launched. Synthetic patient-level EHR and claims datasets can be downloaded for free immediately in FHIR and CCLF formats, as well as in Tuva Health’s analytics-ready format.


Ofer Mendelevitch is the co-founder and CTO of Syntegra. Aaron Neiderhiser and Coco Zuloaga are the co-founders of Tuva Health.