KNN Resampler
This notebook shows how KNN Resampler is used to create synthetic data. There are three methods available in synloc
package. This notebook demonstrates two methods: LocalCov
and LocalGaussianCopula
. The first one draw synthetic values from multivariate normal distribution and the second one draws from the gaussian copula.
[1]:
from synloc import LocalCov, LocalGaussianCopula, sample_trivariate_xyz
Data
[2]:
df = sample_trivariate_xyz(1000)
df.head()
[2]:
x | y | z | |
---|---|---|---|
0 | 0.001029 | 1.443241e-02 | 1.030596 |
1 | 0.000010 | 7.651150e-08 | -0.402560 |
2 | 0.002199 | 8.689394e-01 | 9.819810 |
3 | 0.999379 | 1.780679e-01 | 1.473825 |
4 | 0.064769 | 9.160882e-01 | 9.113435 |
Using Multivariate Normal Distribution
We use LocalCov
method to create synthetic data. The method uses k-nearest neighbors to create subsamples from nearest neighbors. Then, it estimates the covariance matrix of each subsample and draw synthetic values from the multivariate normal distribution.
[3]:
syn = LocalCov(df, K = 20) # K is the subsample size.
df_syn = syn.fit()
100%|██████████| 1000/1000 [00:01<00:00, 684.34it/s]
After the synthesis complete, you can use the class method comparePlots
to visualize the synthetic data and the original data.
[4]:
syn.comparePlots(['x', 'y', 'z'])

Using Gaussian Copula
[5]:
syn_g = LocalGaussianCopula(df, K = 20)
df_syn = syn_g.fit()
100%|██████████| 1000/1000 [00:05<00:00, 173.02it/s]
[6]:
syn_g.comparePlots(['x', 'y', 'z'])
