We spoke with Jamie Blackport, CEO of Mirador Analytics, about the importance of guaranteeing patient privacy while ensuring maximum utility of healthcare data, ongoing challenges to data privacy and where synthetic data fits into the healthcare data landscape.
Q: What does the current landscape around data privacy in healthcare look like?
The healthcare system generates approximately a trillion gigabytes of data annually, and this amount is doubling every two years. Therefore, the amount of data created over the next three years is likely to be more than the data created over the past 30 years. The ecosystem around this data is complex and is constantly working to build the infrastructure needed to sustain this growth.
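The growth claim above can be checked with a short calculation. The sketch below is an illustration rather than part of the interview: it assumes the annual rate of data creation grows smoothly and doubles every two years, and it compares the total created over the next three years with the total created over the previous 30. The function name `cumulative_data` and the normalized current rate of 1.0 are assumptions made for the example.

```python
import math

DOUBLING_PERIOD_YEARS = 2.0  # from the interview: volume doubles every two years


def cumulative_data(start_year: float, end_year: float, rate_now: float = 1.0) -> float:
    """Total data created between start_year and end_year (years relative to now).

    Assumes the annual creation rate is rate_now * 2 ** (t / 2), i.e. smooth
    exponential growth. This is a modelling assumption, not a stated fact.
    """
    k = math.log(2) / DOUBLING_PERIOD_YEARS  # continuous growth constant
    # Closed-form integral of rate_now * exp(k * t) from start_year to end_year.
    return rate_now * (math.exp(k * end_year) - math.exp(k * start_year)) / k


next_three = cumulative_data(0, 3)     # the coming three years
past_thirty = cumulative_data(-30, 0)  # the previous thirty years
ratio = next_three / past_thirty
print(f"next 3 years / past 30 years = {ratio:.2f}")
```

Under this assumption the next three years produce roughly 1.8 times the data of the past 30, which is consistent with the statement above. A different growth model would give a different ratio.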
As the data universe grows in healthcare, so do the opportunities for insight. There have been big leaps in the technology and techniques available for better data analysis. Increased accessibility, volume and granularity of data are great for innovation. However, they can also cause nervousness among lawmakers, regulators, privacy experts and the individuals whose data is involved in these healthcare systems, and we’re seeing increased discussion and proposed regulation from stakeholders. As a company, we’ve also seen an increase in awareness of privacy risk among industry members, with more communication from global privacy offices and client legal teams looking to better understand what they can do proactively to minimize privacy risk.
Firms that have traditionally used the Safe Harbor method of de-identification are now looking toward the Expert Determination method for de-identified data and alternative insights. Safe Harbor has limitations in utility and doesn’t involve the same in-depth consideration of risk that Expert Determination does. Companies typically take a cautious approach, as the shift to Expert Determination can be a big change to current processes, and education and support are needed along the way. From our perspective, attitudes to re-identification are maturing, and people are more aware of both the risks and the implications. This is a good thing, as it makes people consider privacy risk mitigation in the development of products.
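To make the utility trade-off concrete, here is a minimal sketch of three of the HIPAA Safe Harbor generalizations: ZIP codes truncated to three digits (or set to 000 for sparsely populated areas), ages over 89 pooled together, and dates reduced to the year. It is not Mirador’s method and covers only a fraction of the 18 Safe Harbor identifier categories. The record layout, the `safe_harbor_view` function and the contents of `LOW_POPULATION_ZIP3` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical set of 3-digit ZIP prefixes covering 20,000 or fewer people.
# A real pipeline would derive this set from current Census data.
LOW_POPULATION_ZIP3 = {"036", "059", "102"}


@dataclass
class PatientRecord:
    name: str
    zip_code: str
    age: int
    admission_date: str  # ISO format, e.g. "2021-03-14"
    diagnosis: str


def safe_harbor_view(record: PatientRecord) -> dict:
    """Return a generalized copy of a record, applying a few Safe Harbor rules."""
    zip3 = record.zip_code[:3]
    return {
        # Direct identifiers such as the name are dropped entirely.
        "zip3": "000" if zip3 in LOW_POPULATION_ZIP3 else zip3,
        # Ages over 89 are pooled into a single "90+" category.
        "age": "90+" if record.age > 89 else str(record.age),
        # Dates are reduced to the year.
        "admission_year": record.admission_date[:4],
        "diagnosis": record.diagnosis,
    }


example = PatientRecord("Jane Doe", "10234", 93, "2021-03-14", "J45.909")
print(safe_harbor_view(example))
```

The detail lost here (full ZIP code, exact dates, exact age) is the kind of utility limitation described above. Expert Determination instead relies on a qualified expert’s assessment of re-identification risk for the specific dataset and its context.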
Q: Why is guaranteeing patient privacy important now more than ever? How is Mirador working with industry stakeholders to ensure privacy is protected while also allowing for maximum utility of data?
Data is the fuel, the glue and the product of the new healthcare ecosystem. Data connectivity is a driving force behind innovation and healthcare transformation. Unfortunately, as connectivity increases, so does risk. As more data is connected, clearer profiles of individuals’ lives start to be built. If this isn’t done with consideration of an individual’s right to privacy, it could erode trust with stakeholders and, were things to go wrong, have damaging implications for individuals.
Our support involves advice, education and validation in relation to disclosure risk. Many assume our interaction starts once data has been de-identified and that we focus on validation. However, we often start much earlier than that. For example, we support the development stages of new products, planning for dataset joins and analytics planning to ensure risk mitigation throughout. Our upfront input on disclosure risk helps rollout go more smoothly and better prepares teams for discussions on re-identification risk with vendors and potential customers. Thinking about privacy first enables the integration of automated solutions that reduce time to delivery. It also gives teams more time to think about data utility, which makes it easier to maximize relevant granularity.
Q: What are some of the ongoing challenges with current approaches to data privacy/de-identification in healthcare?
Regulation misalignment. Privacy laws and regulations can differ region by region, state by state and industry by industry. This can make navigation tricky and alignment challenging to achieve. We see the same in definitions of de-identified data and anonymized data, which can make data alignment more difficult. With differing standards across regulations, organizations connecting data sources can find it difficult to build standardized, combined datasets.
Data cleanliness and standardization. Health data can be messy, records fragmented, and streams siloed by the institutions that control them. The lack of standardization, and of communication on standards, leads to inconsistency in risk-reducing methods. Inconsistency makes monitoring risk harder, and it can also reduce utility. Working together not only to connect data but also to create consistent risk-mitigating approaches may help the industry evolve and grow together.
Attitude to risk monitoring. We are advocates of ongoing risk monitoring. The frequency of data feeds, combined with the volume and granularity of the data, makes ongoing monitoring a must in any future data risk mitigation strategy. As we mentioned, we are seeing a shift in focus toward privacy, which is great, but we’re still working on changing the view of Expert Determination from a once-a-year tick-box exercise to an ongoing effort.
Q: How does Mirador’s partnership with Syntegra help to solve some of these challenges?
Despite the range of anonymization methods and human expert contributions available today, no technique can guarantee zero risk. However, synthetic data can help healthcare researchers create relatively low-risk data and overcome many of the privacy challenges associated with protected health information. Moreover, this technique can further advance AI / deep learning model development, which is already driving data governance and privacy management systems.
Our partnership with Syntegra aims to consider any residual risk in synthetic data generated from protected health information. We’re hoping that our analysis work will lead to the creation of best practice risk-reducing methods of creating synthetic datasets.
Q: What excites you the most about the promise of synthetic healthcare data?
Synthetic data has the potential to create a new set of standards in health data exchange in a way that focuses on reducing re-identification risk. The most exciting thing about this technology is its adoption. Synthetic data has the potential to improve existing AI algorithms, where protected health information isn’t needed, to make decisions influencing human lives in the future.
Q: How do you envision synthetic data integrating into the healthcare data landscape in the next 3–5 years?
Synthetic data has high potential in the healthcare industry. As privacy regulations increase, a consistent approach to reducing privacy risk, such as synthetic data creation, is likely to be more widely adopted where data use allows it. We see synthetic data being used alongside de-identified health information; both will likely have their places. Already we’re seeing partners utilize both data types in their organizations. What greater adoption looks like will depend on technological advances and regulatory changes.