I recently had the privilege of presenting at the Artificial Intelligence, Law and Society Conference at Macquarie University on February 14, 2025. My presentation, “Generating Synthetic Data with Locally Estimated Distributions for Disclosure Control,” was based on my PhD thesis.
In this work, I conceptualized and developed a framework that parameterizes both data utility and disclosure risk, enabling custodians of sensitive datasets to statistically control their privacy risks when creating synthetic datasets. A key innovation of this approach is its ability to handle outlier observations in the real data, allowing users to adjust their risk levels as needed. Because the approach is semi-parametric, it is not only computationally fast but also capable of accurately replicating even small datasets.
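To give a rough flavor of what "locally estimated distributions" can mean, here is a minimal sketch in plain NumPy. It is an illustration under simplifying assumptions, not the implementation from the thesis: the function name `synthesize_local_gaussian`, the choice of a Gaussian as the local distribution, and the use of the neighborhood size `k` as the single tuning knob are all hypothetical choices made for this example. For each real observation, it fits a distribution to that observation's `k` nearest neighbors and draws one synthetic observation from it.

```python
import numpy as np


def synthesize_local_gaussian(data, k=10, seed=0):
    """Draw one synthetic row per real row from a Gaussian fitted
    to that row's k nearest neighbors (the row itself included).

    Hypothetical illustration only; names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_rows, _ = data.shape
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")

    synthetic = np.empty_like(data)
    for i in range(n_rows):
        # Euclidean distance from row i to every row in the real data.
        distances = np.linalg.norm(data - data[i], axis=1)
        # The k closest rows form the local neighborhood.
        neighbors = data[np.argsort(distances)[:k]]
        # Estimate the local distribution: mean vector and covariance matrix.
        local_mean = neighbors.mean(axis=0)
        local_cov = np.atleast_2d(np.cov(neighbors, rowvar=False))
        # Draw a single synthetic observation from the local estimate.
        synthetic[i] = rng.multivariate_normal(local_mean, local_cov)
    return synthetic


# Example usage on simulated "real" data with two columns.
real = np.random.default_rng(42).normal(size=(200, 2))
fake = synthesize_local_gaussian(real, k=20)
print(fake.shape)  # (200, 2)
```

In this toy version, setting `k` equal to the number of rows makes every local estimate identical to a single global Gaussian, while smaller values of `k` follow the local structure of the real data more closely. How the neighborhood size maps onto utility and disclosure risk in practice is beyond what this sketch shows.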
I’m excited to share that all of this work is now available as a Python package: synloc.
I am extremely grateful for the invitation to present and for the insightful discussions that followed at the conference.