KNN Resampler

This notebook shows how KNN Resampler is used to create synthetic data. There are three methods available in synloc package. This notebook demonstrates two methods: LocalCov and LocalGaussianCopula. The first one draw synthetic values from multivariate normal distribution and the second one draws from the gaussian copula.

[ ]:
from synloc import LocalCov, sample_trivariate_xyz

Data

[2]:
df = sample_trivariate_xyz(1000)
df.head()
[2]:
x y z
0 0.001029 1.443241e-02 1.030596
1 0.000010 7.651150e-08 -0.402560
2 0.002199 8.689394e-01 9.819810
3 0.999379 1.780679e-01 1.473825
4 0.064769 9.160882e-01 9.113435

Using Multivariate Normal Distribution

We use LocalCov method to create synthetic data. The method uses k-nearest neighbors to create subsamples from nearest neighbors. Then, it estimates the covariance matrix of each subsample and draw synthetic values from the multivariate normal distribution.

[3]:
syn = LocalCov(df, K = 20) # K is the subsample size.
df_syn = syn.fit()
100%|██████████| 1000/1000 [00:01<00:00, 684.34it/s]

After the synthesis complete, you can use the class method comparePlots to visualize the synthetic data and the original data.

[4]:
syn.comparePlots(['x', 'y', 'z'])
../_images/Notebooks_nearest_neighbor_8_0.png