Setting the standard for synthetic healthcare data quality and privacy
Demonstrating the privacy and statistical fidelity of synthetic data is critical to realizing its widespread adoption, which is why we’ve developed industry-leading metrics for ensuring the validity of our data.
Across healthcare today, there is often a tradeoff between maintaining data quality and protecting privacy. Our metrics demonstrate that both privacy and statistical accuracy can be preserved through Syntegra’s unique approach to synthetic data generation.
Expressed in easily understood terms of disclosure risk, our privacy metrics verify that no patient can be re-identified via the synthetic data. While synthetic data should be fully private, we believe in putting it to the test to guarantee complete protection of patient data.
Third-party certification of our privacy metrics is provided by Mirador Analytics.
Our privacy evaluation includes:
- Simulating a membership inference attack and measuring its success rate in terms of disclosure risk
- Simulating an attribute inference attack and measuring its success rate in terms of disclosure risk
- Proving that no data was copied
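To make two of these privacy checks concrete, here is a minimal Python sketch of a copy check and a simulated membership inference attack. It is an illustration under stated assumptions, not Syntegra's actual method: the function names, the Euclidean nearest-neighbor attack, and the `threshold` parameter are all hypothetical, and records are assumed to be numeric arrays. Attribute inference is not covered here.

```python
import numpy as np

def exact_copy_rate(real, synthetic):
    """Fraction of synthetic rows that exactly duplicate a real row."""
    real_rows = {tuple(row) for row in np.asarray(real).tolist()}
    synth_rows = [tuple(row) for row in np.asarray(synthetic).tolist()]
    return sum(row in real_rows for row in synth_rows) / len(synth_rows)

def nearest_distance(candidates, synthetic):
    """Euclidean distance from each candidate to its closest synthetic row."""
    diffs = candidates[:, None, :] - synthetic[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)

def membership_attack_accuracy(members, non_members, synthetic, threshold):
    """Simulated attack (assumed design): the attacker guesses 'member'
    whenever some synthetic row lies within `threshold` of the candidate."""
    guessed_for_members = nearest_distance(members, synthetic) <= threshold
    guessed_for_others = nearest_distance(non_members, synthetic) <= threshold
    correct = guessed_for_members.sum() + (~guessed_for_others).sum()
    return correct / (len(members) + len(non_members))
```

With equally sized member and non-member sets, an attack accuracy close to 0.5 suggests the attacker does no better than guessing, while values approaching 1.0 indicate that the synthetic data leaks who was in the training set.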
We understand the importance of using high-quality data. That’s why we use the following approaches to evaluate the fidelity of our synthetic data against the original data. With these metrics, you can feel confident in the use of synthetic data.
- Univariate Distributions – Do distributions of key variables (age, drugs, etc.) match?
- Pairwise Correlations – Are pairwise correlations between features in the data maintained?
- Temporal Metrics – Do the Kaplan-Meier curves and p-values match?
- Discriminator AUC – Can a machine learning model discriminate between real and synthetic data?
- Predictive Models – Is predictive model performance maintained if trained on synthetic data?
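Three of the fidelity checks above can be sketched in a few lines of Python. This is a hedged illustration rather than Syntegra's implementation: the function names are hypothetical, the data is assumed to be numeric arrays, the univariate check uses a two-sample Kolmogorov-Smirnov statistic, and the discriminator is a simple least-squares linear classifier standing in for whatever machine learning model is used in practice.

```python
import numpy as np

def ks_statistic(a, b):
    """Univariate check: largest gap between the two empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.abs(cdf_a - cdf_b).max()

def max_correlation_gap(real, synthetic):
    """Pairwise check: largest absolute difference between the two
    Pearson correlation matrices (columns are features)."""
    gap = np.corrcoef(real, rowvar=False) - np.corrcoef(synthetic, rowvar=False)
    return np.abs(gap).max()

def discriminator_auc(real, synthetic, seed=0):
    """Discriminator check: fit a linear classifier (assumed stand-in) on
    half of the pooled rows and report its AUC on the held-out half."""
    X = np.vstack([real, synthetic])
    y = np.r_[np.ones(len(real)), np.zeros(len(synthetic))]
    order = np.random.default_rng(seed).permutation(len(X))
    train, test = order[: len(X) // 2], order[len(X) // 2:]
    A = np.c_[X, np.ones(len(X))]  # add an intercept column
    w, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
    scores = A[test] @ w
    pos, neg = scores[y[test] == 1], scores[y[test] == 0]
    # AUC = chance that a real row outscores a synthetic row (ties count half)
    wins = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return wins + 0.5 * ties
```

A KS statistic and correlation gap near 0, together with a discriminator AUC near 0.5, would indicate that the synthetic data is hard to tell apart from the original on these measures. A stronger discriminator may detect differences that a linear model misses.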