Quality metrics
Version 1.0 adds utility diagnostics that can be used immediately after fitting. These metrics are descriptive checks, not formal privacy guarantees.
Per-variable statistics
Call compareStats on a fitted resampler to compare each original column with its synthetic counterpart:
stats = resampler.compareStats()
print(stats[["original_mean", "synthetic_mean", "ks_statistic"]])
The returned DataFrame includes:
original and synthetic means
mean difference
original and synthetic standard deviations
standard-deviation difference
original and synthetic minimum and maximum
Kolmogorov-Smirnov statistic and p-value
Wasserstein distance
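To make the quantities in this table concrete, the following is a minimal sketch that computes them for each column with pandas and SciPy. It is not synloc's implementation: the helper column_stats, the toy data, and every column name other than original_mean, synthetic_mean, and ks_statistic are assumptions made for illustration, as is the sign convention for the differences (synthetic minus original).

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp, wasserstein_distance


def column_stats(original: pd.Series, synthetic: pd.Series) -> dict:
    """Hypothetical helper: descriptive comparison of one numeric column."""
    ks = ks_2samp(original, synthetic)  # two-sample Kolmogorov-Smirnov test
    return {
        "original_mean": original.mean(),
        "synthetic_mean": synthetic.mean(),
        "mean_diff": synthetic.mean() - original.mean(),
        # pandas uses the sample standard deviation (ddof=1) by default.
        "original_std": original.std(),
        "synthetic_std": synthetic.std(),
        "std_diff": synthetic.std() - original.std(),
        "original_min": original.min(),
        "synthetic_min": synthetic.min(),
        "original_max": original.max(),
        "synthetic_max": synthetic.max(),
        "ks_statistic": ks.statistic,
        "ks_pvalue": ks.pvalue,
        "wasserstein": wasserstein_distance(original, synthetic),
    }


# Toy data standing in for an original sample and a synthetic sample.
rng = np.random.default_rng(0)
original = pd.DataFrame({"x": rng.normal(0, 1, 500), "y": rng.exponential(2, 500)})
synthetic = pd.DataFrame({"x": rng.normal(0.1, 1, 500), "y": rng.exponential(2, 500)})

# One row per variable, one column per statistic.
table = pd.DataFrame.from_dict(
    {col: column_stats(original[col], synthetic[col]) for col in original.columns},
    orient="index",
)
print(table[["original_mean", "synthetic_mean", "ks_statistic"]])
```

The layout of one row per variable mirrors the column selection shown earlier, but the actual table returned by the library may order or name its columns differently.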
Overall report
Use qualityReport on a fitted resampler:
report = resampler.qualityReport()
print(report["overall"])
The report contains:
per_variable: the same table returned by compareStats
overall: mean and maximum Kolmogorov-Smirnov statistic, mean Wasserstein distance, and correlation-difference summaries
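The overall summaries can be sketched along the same lines. The example below is an illustration under stated assumptions rather than the library's code: the helper overall_summary and its key names are hypothetical, and the correlation-difference summary is assumed here to be the mean and maximum absolute gap between the two Pearson correlation matrices, with each variable pair counted once.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp, wasserstein_distance


def overall_summary(original: pd.DataFrame, synthetic: pd.DataFrame) -> dict:
    """Hypothetical helper: aggregate per-column distances and correlation gaps."""
    cols = list(original.columns)
    ks = [ks_2samp(original[c], synthetic[c]).statistic for c in cols]
    wd = [wasserstein_distance(original[c], synthetic[c]) for c in cols]
    # Absolute element-wise gap between the two Pearson correlation matrices.
    corr_gap = (original[cols].corr() - synthetic[cols].corr()).abs().to_numpy()
    # Keep the strict upper triangle so each variable pair is counted once.
    pair_gaps = corr_gap[np.triu_indices(len(cols), k=1)]
    return {
        "mean_ks": float(np.mean(ks)),
        "max_ks": float(np.max(ks)),
        "mean_wasserstein": float(np.mean(wd)),
        "mean_corr_diff": float(pair_gaps.mean()),
        "max_corr_diff": float(pair_gaps.max()),
    }


def make_sample(rng: np.random.Generator, n: int) -> pd.DataFrame:
    """Toy generator: two correlated numeric variables."""
    x = rng.normal(size=n)
    y = 0.8 * x + rng.normal(scale=0.5, size=n)
    return pd.DataFrame({"x": x, "y": y})


rng = np.random.default_rng(1)
original = make_sample(rng, 400)
synthetic = make_sample(rng, 400)  # stand-in for a synthetic sample

summary = overall_summary(original, synthetic)
print(summary)
```

Reporting the maximum alongside the mean is useful because a single badly reproduced variable or pair can be hidden by an average over many well-reproduced ones.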
Function API
The same metrics are available as standalone functions that take the original and synthetic data as arguments:
from synloc import compareStats, quality_report, kolmogorov_distances
stats = compareStats(original_data, synthetic_data)
ks = kolmogorov_distances(original_data, synthetic_data)
report = quality_report(original_data, synthetic_data)
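Interpreting metrics
Lower values of the distance and difference metrics (Kolmogorov-Smirnov statistic, Wasserstein distance, and the absolute mean, standard-deviation, and correlation differences) generally indicate closer agreement between original and synthetic data. The Kolmogorov-Smirnov p-value runs the other way: a small p-value signals a detectable difference between the two distributions. The right threshold depends on the application, sample size, and privacy requirements. Use these diagnostics alongside domain checks and disclosure-risk assessment when synthetic data will be shared.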
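The dependence on sample size can be seen with a small experiment that uses SciPy directly and does not involve synloc. In this sketch, two normal samples differ by a fixed, small shift in the mean (the shift of 0.05 and the sample sizes are arbitrary choices for illustration). The Kolmogorov-Smirnov statistic settles near the same small value as the sample grows, while the p-value tends toward zero, so a fixed p-value cutoff would flag large datasets for differences that may not matter in practice.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
results = {}

for n in (100, 1_000, 100_000):
    # Same underlying discrepancy at every sample size: a 0.05 shift in the mean.
    original = rng.normal(loc=0.0, scale=1.0, size=n)
    synthetic = rng.normal(loc=0.05, scale=1.0, size=n)
    results[n] = ks_2samp(original, synthetic)
    print(f"n={n:>7}  ks_statistic={results[n].statistic:.4f}  p_value={results[n].pvalue:.3g}")
```

For this reason it is often more informative to judge the size of the statistic against what the application can tolerate than to rely on the p-value alone.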