Usage
Nearest-neighbor resampling
LocalCov estimates a local covariance matrix from the K nearest neighbors of
each observation and draws one synthetic value from the multivariate normal
distribution fitted to that neighborhood.
import synloc as s
data = s.sample_trivariate_xyz(1000)
resampler = s.LocalCov(data=data, K=30, n_jobs=1)
synthetic = resampler.fit()
synthetic is a pandas.DataFrame with the same columns as data.
If a variable is constant inside a local neighborhood, that variable is copied
exactly for the synthetic value drawn from that neighborhood.
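The idea described above can be sketched with NumPy alone. This is an illustration under stated assumptions, not synloc's implementation: the function name local_cov_draw is hypothetical, neighbors are found by brute-force Euclidean distance, each observation counts as its own nearest neighbor, and the neighborhood mean is used as the center of the normal distribution.

```python
import numpy as np


def local_cov_draw(data, k, rng):
    """Draw one synthetic row per observation from a local Gaussian fit.

    Hypothetical helper for illustration; not part of synloc.
    """
    data = np.asarray(data, dtype=float)
    synthetic = np.empty_like(data)
    for i in range(data.shape[0]):
        # Brute-force Euclidean distances from row i to every row
        # (row i itself is included, at distance 0).
        dist = np.linalg.norm(data - data[i], axis=1)
        neighbors = data[np.argsort(dist)[:k]]

        # Start from an actual neighbor row so that columns that are
        # constant in this neighborhood are copied exactly.
        row = neighbors[0].copy()
        varying = neighbors.max(axis=0) > neighbors.min(axis=0)
        if varying.any():
            mean = neighbors[:, varying].mean(axis=0)
            # atleast_2d handles the case of a single varying column,
            # where np.cov returns a scalar.
            cov = np.atleast_2d(np.cov(neighbors[:, varying], rowvar=False))
            row[varying] = rng.multivariate_normal(mean, cov)
        synthetic[i] = row
    return synthetic


# Toy input (assumed for the example): two noisy columns, one constant column.
rng = np.random.default_rng(0)
toy = np.column_stack([rng.normal(size=(50, 2)), np.full(50, 3.0)])
toy_synthetic = local_cov_draw(toy, k=10, rng=rng)
```

In the toy input the third column never varies, so every synthetic row carries the value 3.0 in that column unchanged, while the first two columns are drawn from the local normal fit.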
Use sample_size to request a different synthetic sample size:
synthetic = resampler.fit(sample_size=500)
Cluster resampling
clusterCov clusters the data with KMeans, estimates a covariance matrix
inside each cluster, and draws synthetic observations cluster by cluster.
If a variable is constant inside a cluster, that variable is copied exactly for
synthetic rows generated from that cluster.
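A minimal sketch of the per-cluster step, assuming the cluster labels have already been computed (for example by KMeans). The function name cluster_cov_draw is hypothetical, the sketch draws as many synthetic rows as each cluster contains, and it does not model size_min; none of this is confirmed to match synloc's internals.

```python
import numpy as np


def cluster_cov_draw(data, labels, rng):
    """Draw synthetic rows cluster by cluster from per-cluster Gaussian fits.

    Hypothetical helper for illustration; not part of synloc.
    """
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    synthetic = np.empty_like(data)
    for label in np.unique(labels):
        members = labels == label
        rows = data[members]
        n_rows = rows.shape[0]

        # Begin with copies of one member row so that columns that are
        # constant inside the cluster are reproduced exactly.
        block = np.tile(rows[0], (n_rows, 1))
        varying = rows.max(axis=0) > rows.min(axis=0)
        if varying.any():
            mean = rows[:, varying].mean(axis=0)
            cov = np.atleast_2d(np.cov(rows[:, varying], rowvar=False))
            block[:, varying] = rng.multivariate_normal(mean, cov, size=n_rows)
        synthetic[members] = block
    return synthetic


# Toy input (assumed for the example): the second column is constant in
# cluster 0 and varies in cluster 1.
rng = np.random.default_rng(1)
cluster_a = np.column_stack([rng.normal(size=20), np.full(20, 5.0)])
cluster_b = rng.normal(loc=10.0, size=(30, 2))
toy = np.vstack([cluster_a, cluster_b])
toy_labels = np.array([0] * 20 + [1] * 30)
toy_synthetic = cluster_cov_draw(toy, toy_labels, rng)
```

Because the constant check is made per cluster, the same variable can be copied exactly in one cluster and sampled in another, as the second column is here.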
import synloc as s
data = s.sample_circulars_xy(1000)
resampler = s.clusterCov(data=data, n_clusters=20, size_min=8)
synthetic = resampler.fit()
Visualization
After fitting, compare the original and synthetic data visually for one, two, or three variables:
resampler.comparePlots(["x", "y"])
resampler.comparePlots(["x", "y", "z"])
The plotting helper is intended for quick diagnostics. For publication figures, use the returned synthetic data with your preferred plotting workflow.
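As one possible starting point for a custom figure, the sketch below draws the original and synthetic data side by side with matplotlib. The function name plot_original_vs_synthetic and the toy values are assumptions made for the example; any object that can be indexed by column name, such as a pandas.DataFrame or a dict of lists, should work as input.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt


def plot_original_vs_synthetic(original, synthetic, x, y):
    """Scatter two variables for the original and synthetic data on shared axes.

    Hypothetical helper for illustration; not part of synloc.
    """
    fig, axes = plt.subplots(1, 2, figsize=(8, 4), sharex=True, sharey=True)
    panels = zip(axes, (original, synthetic), ("Original", "Synthetic"))
    for ax, frame, title in panels:
        ax.scatter(frame[x], frame[y], s=8, alpha=0.5)
        ax.set_title(title)
        ax.set_xlabel(x)
    axes[0].set_ylabel(y)
    fig.tight_layout()
    return fig


# Toy values (assumed for the example); replace with data and synthetic.
toy_original = {"x": [0.0, 1.0, 2.0], "y": [1.0, 2.0, 3.0]}
toy_synthetic = {"x": [0.1, 1.1, 1.9], "y": [0.9, 2.2, 3.1]}
fig = plot_original_vs_synthetic(toy_original, toy_synthetic, "x", "y")
```

The returned figure can then be styled further or written to disk with fig.savefig.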