Yours, Mine & Ours: What’s the Deal with Healthcare Data & Privacy?

Alexander Kerman

When you think of privacy in healthcare, what do you think of? I picture confidential conversations with doctors as they type furiously, exam tables covered in paper, silly gowns that somehow cover nothing, and maybe some frosted windows.

But what happens when you leave your appointment? Where does everything your doctor typed go? Who gets to see that information, and what can they do with it? Presumably that data will be used to help in your care, but what if someone with the same issue shows up next year — can they look at what happened to you as a guide? 

It’s easy to think of privacy as obvious or intuitive, but ultimately healthcare data privacy lies at the heart of a web of tricky technical, legal and even moral questions. 

Generally speaking, privacy is all about freedom from external consequences: when you do things in private, nobody will know, so they cannot act on this information to judge you, laugh at you or tailor advertisements to you. A closely related idea is anonymity, where the crucial difference is that your identity is undisclosed, but your information is not. For example, you come across a footprint on a muddy path. You can learn a lot from the footprint — foot size, type of shoe, which way they’re headed — but you have no idea who (or what) left it, so that person is anonymous. If we want our healthcare data to help other people, it has to be usable without revealing our identities — but is that really feasible?

Sticking with the footprint metaphor, imagine you happen to be hiking with two friends: one is the world’s best hunter, who’s spent her whole life tracking things (hopefully animals!) through the woods; the other is the world’s biggest sneakerhead, who knows the tread of every pair of shoes ever made. With their help, you can start learning more about who (or what) left that footprint, adding more dimensions to your nascent footprint dataset. Maybe the hunter notices how deep the impression is and infers that whoever left it must be pretty heavy, then sees that the next print is quite far away, so they’re probably very tall; meanwhile, the sneakerhead recognizes the sole pattern as a rare size 17 model that was only sold in northwestern Oregon. Just like that, you’re already starting to build a better picture of who could have made the footprint. This gets at a crucial insight about healthcare data: the more dimensions in your dataset, the harder it is to anonymize.
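To make that concrete, here’s a minimal Python sketch (the records, fields and values are all invented for illustration) showing how each extra dimension shrinks the “anonymity set”, i.e. the pool of records our mystery walker could be hiding in:

```python
# Invented toy data: each record is one set of footprint observations.
records = [
    {"weight": "heavy", "height": "tall",  "shoe_size": 11, "region": "Oregon"},
    {"weight": "heavy", "height": "tall",  "shoe_size": 17, "region": "Oregon"},
    {"weight": "heavy", "height": "short", "shoe_size": 9,  "region": "Texas"},
    {"weight": "light", "height": "tall",  "shoe_size": 17, "region": "Oregon"},
    {"weight": "heavy", "height": "tall",  "shoe_size": 17, "region": "Texas"},
]

# What the hunter and the sneakerhead figured out about our footprint.
target = {"weight": "heavy", "height": "tall", "shoe_size": 17, "region": "Oregon"}

# Filter on one dimension at a time and watch the candidate pool shrink.
matches = records
for dim, value in target.items():
    matches = [r for r in matches if r[dim] == value]
    print(f"after matching on {dim!r}: {len(matches)} candidate(s) remain")
```

One dimension leaves four plausible candidates; by the fourth, exactly one record remains. Swap “shoe size” for attributes like ZIP code, birth date and sex, and you have the classic recipe for re-identifying supposedly anonymous health records.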

Next, you decide to gather other types of data, so you install a camera that catches a glimpse of a huge, shaggy, bipedal animal loping along the trail. Between the footprint dataset and the photo, you can be pretty certain that you’ve re-identified the elusive Sasquatch. This illustrates another key point: triangulating with other datasets increases dimensionality and makes identification (or re-identification) easier.
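In code, that triangulation is just a join. Here’s a toy “linkage attack” sketch in Python (again, all data invented): neither dataset names the culprit on its own, but matching them on their shared columns does:

```python
# Invented data. The footprint dataset has no identities...
footprints = [
    {"shoe_size": 17, "gait": "long stride",  "trail": "ridge"},
    {"shoe_size": 9,  "gait": "short stride", "trail": "creek"},
]

# ...and the camera log has no shoe sizes, so each looks harmless alone.
camera_log = [
    {"subject": "Sasquatch", "gait": "long stride",  "trail": "ridge"},
    {"subject": "day hiker", "gait": "short stride", "trail": "creek"},
]

# Joining on the overlapping columns (the "quasi-identifiers") links
# the supposedly anonymous print to a named subject.
for fp in footprints:
    for cam in camera_log:
        if fp["gait"] == cam["gait"] and fp["trail"] == cam["trail"]:
            print(f"size-{fp['shoe_size']} print belongs to: {cam['subject']}")
```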

So does that mean data can never be truly “anonymized”? Can healthcare data ever be used safely without the risk of leaking our identities? Short answer: YES, but only once we add an extra step that makes a “pretend” footprint… AKA synthetic data. We’ll dive deeper into this in the next privacy post, which covers de-identification standards such as those defined by HIPAA.
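As a teaser, here’s one deliberately naive flavor of the idea in Python (invented data; real synthetic-data methods are far more careful): sample each column independently from the real values, so the fake rows echo the dataset’s overall shape while being assembled from mixed-and-matched values rather than copied records:

```python
import random

# Invented example records standing in for real patient data.
real = [
    {"age": 34, "diagnosis": "flu"},
    {"age": 61, "diagnosis": "diabetes"},
    {"age": 45, "diagnosis": "flu"},
]

def synthesize(rows, n):
    """Draw each column's value independently from the observed values."""
    columns = rows[0].keys()
    return [
        {col: random.choice([row[col] for row in rows]) for col in columns}
        for _ in range(n)
    ]

for fake in synthesize(real, 5):
    print(fake)  # plausible-looking rows built by mixing values across patients
```

Even this toy version shows the trade-off: sampling columns independently hides individuals, but it also scrambles the correlations between columns that make health data useful in the first place, which is exactly why serious synthetic data generation is a discipline of its own.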

Until then, check out our truly private sample datasets & reach out with any questions.