Replication materials
The replication folder contains the notebook, scripts, and intermediate
datasets used for the paper:
Kalay, A. F. (2026). Generating Synthetic Data With Locally Estimated Distributions for Disclosure Control. Australian & New Zealand Journal of Statistics. https://doi.org/10.1111/anzs.70032
The main file is replication/Replication Notebook.ipynb.
Reproducibility notes
Some comparison packages, especially SDV, changed their APIs after the paper analysis was completed. The replication notebook prints exact package versions; use those versions when reproducing paper outputs.
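One way to confirm that a local environment matches the versions printed by the notebook is sketched below. This is not part of the replication folder: the helper names `installed_version` and `compare_versions` are hypothetical, and the entry in `expected` is a placeholder to be filled in from the notebook output.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string for a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def compare_versions(expected, lookup=installed_version):
    """Return {package: (wanted, found)} for every package that does not match."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # Placeholder value: copy the real version string from the notebook output.
    expected = {"sdv": "<version printed by the notebook>"}
    for package, (wanted, found) in compare_versions(expected).items():
        print(f"{package}: expected {wanted}, found {found}")
```

An empty result from `compare_versions` means every listed package matches the expected version exactly.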
The folder also includes a modified copy of anonymeter. It is used for the
singling-out risk calculation and is included to avoid a dependency conflict
with the historical replication environment.
Intermediate CSV files are retained because parts of the comparison workflow involve nested randomness. Keeping these files makes it possible to check paper outputs without rerunning every stochastic comparison model.
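If a comparison step is rerun, its output can be checked against the corresponding retained CSV. The sketch below is an illustration using only the standard library, not a script from the replication folder; the function names, file paths, and tolerance are assumptions. Because of the nested randomness, a rerun may legitimately differ from the retained file, so a mismatch does not by itself indicate an error.

```python
import csv
import math


def _cells_equal(a, b, rel_tol):
    """Compare two CSV cells: exact text match, else numeric match within rel_tol."""
    if a == b:
        return True
    try:
        return math.isclose(float(a), float(b), rel_tol=rel_tol)
    except ValueError:
        # At least one cell is not numeric and the text differs.
        return False


def csv_files_match(path_a, path_b, rel_tol=1e-9):
    """Return True if both files have the same shape and cell-wise equal values."""
    with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
        rows_a = list(csv.reader(fa))
        rows_b = list(csv.reader(fb))
    if len(rows_a) != len(rows_b):
        return False
    for row_a, row_b in zip(rows_a, rows_b):
        if len(row_a) != len(row_b):
            return False
        if not all(_cells_equal(x, y, rel_tol) for x, y in zip(row_a, row_b)):
            return False
    return True
```

A call such as `csv_files_match("retained.csv", "rerun.csv")` (both paths hypothetical) returns a single boolean; the header row is compared as text along with the data rows.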