I’m thrilled to announce that my paper “Generating Synthetic Data with Locally Estimated Distributions for Disclosure Control” has been officially published in the Australian & New Zealand Journal of Statistics!
Kalay, A. F. (2025). “Generating Synthetic Data With Locally Estimated Distributions for Disclosure Control.” Australian & New Zealand Journal of Statistics, 1–24. https://doi.org/10.1111/anzs.70032
This paper has been on arXiv since October 2022, but it underwent major revisions thanks to thorough feedback from anonymous referees. The final version is significantly improved in both clarity and scope.
This publication marks the first chapter of my PhD thesis to appear in a journal. The work introduces a framework for generating synthetic data that balances data utility with disclosure risk, enabling data custodians to statistically control privacy when releasing sensitive datasets.
Related Blog Posts
Over the years, I’ve written several posts documenting the journey of this research and the accompanying software:
- A Fast Method to Create Synthetic Data with Python (October 2022): The original announcement of synloc, my Python package for synthetic data generation.
- synloc Surpasses 2k Downloads (July 2023): A milestone update celebrating the package crossing 2,000 downloads on PyPI.
- Presentation at AI, Law and Society Conference (February 2025)
- Upgrading Synloc with Gemini CLI (June 2025): The story of updating synloc to version 0.2, adding parallelization and reducing dependencies. This update was completed in just three hours with the help of AI coding tools.
Resources
- Paper: DOI, arXiv preprint
- Software: synloc on PyPI, GitHub