Methods
This page provides an auto-generated summary of synloc’s API.
- class synloc.LocalCov(data: DataFrame, K: int = 30, normalize: bool = True, clipping: bool = True, n_jobs: int = -1, Args_NearestNeighbors: dict = {})
LocalCov is a method for the kNNResampler class that creates synthetic samples from a multivariate normal distribution with the estimated covariance matrix.
- Parameters:
data (pandas.DataFrame) – Original data set to be synthesized
K (int, optional) – The number of nearest neighbors used to create synthetic samples, defaults to 30
normalize (bool, optional) – Normalize sample before defining clusters, defaults to True
clipping (bool, optional) – Trim values above the maximum (or below the minimum) of each variable, defaults to True
n_jobs (int, optional) – The number of jobs to run in parallel, defaults to -1
Args_NearestNeighbors (dict, optional) – Additional arguments for the scikit-learn NearestNeighbors function, if needed, defaults to {}. See scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html
- static method(subsample: DataFrame)
Estimates the covariance matrix and draws samples from the estimated multivariate normal distribution.
- Parameters:
subsample (pandas.DataFrame) – A subsample defined by the kNNResampler class.
- Returns:
Synthetic values.
- Return type:
numpy.ndarray
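The step described for method above can be sketched with NumPy and pandas alone. This is a minimal illustration, not synloc's actual implementation: the helper name draw_from_local_cov, the toy data, and the brute-force neighbor search standing in for the kNNResampler subsample are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def draw_from_local_cov(subsample: pd.DataFrame, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: draw one synthetic row from a multivariate normal
    fitted to the subsample (sample mean and sample covariance)."""
    mean = subsample.mean().to_numpy()
    cov = subsample.cov().to_numpy()  # sample covariance matrix of the subsample
    return rng.multivariate_normal(mean, cov, size=1)


rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])

# Stand-in for a kNN subsample (assumption): the 30 rows closest to the first row.
distances = np.linalg.norm(data.to_numpy() - data.iloc[0].to_numpy(), axis=1)
subsample = data.iloc[np.argsort(distances)[:30]]

synthetic_row = draw_from_local_cov(subsample, rng)
```

Here synthetic_row is a NumPy array with one row and one column per variable, which matches the documented return type.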
- class synloc.clusterCov(data: DataFrame, n_clusters=8, size_min: int = None, normalize: bool = True, clipping: bool = True)
clusterCov is a method for the clusterResampler class that creates synthetic values from a multivariate normal distribution with the covariance matrix estimated from each cluster.
- Parameters:
data (pandas.DataFrame) – Original data set to be synthesized
n_clusters (int, optional) – The number of clusters, defaults to 8
size_min (int, optional) – Required minimum cluster size, defaults to None
normalize (bool, optional) – Normalize sample before defining clusters, defaults to True
clipping (bool, optional) – Trim values above the maximum (or below the minimum) of each variable, defaults to True
- method(cluster: DataFrame, size: int)
Creates synthetic values from the estimated multivariate normal distribution.
- Parameters:
cluster (pandas.DataFrame) – Cluster data
size (int) – Required number of synthetic observations. If not specified, it equals the number of observations in the cluster.
- Returns:
Synthetic values
- Return type:
pandas.DataFrame
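The cluster-level step above can be sketched in the same way. This is an illustrative sketch under stated assumptions, not the library's code: the function synthesize_cluster, its lower and upper arguments, and the way the cluster is picked are hypothetical. In particular, clipping to the per-variable range of the full data set is one possible reading of the clipping parameter.

```python
import numpy as np
import pandas as pd


def synthesize_cluster(cluster, size=None, lower=None, upper=None, seed=0):
    """Hypothetical helper: draw `size` rows from a multivariate normal fitted
    to one cluster, optionally clipping each variable to [lower, upper]."""
    rng = np.random.default_rng(seed)
    if size is None:
        size = len(cluster)  # default: as many rows as the cluster has
    values = rng.multivariate_normal(
        cluster.mean().to_numpy(), cluster.cov().to_numpy(), size=size
    )
    synthetic = pd.DataFrame(values, columns=cluster.columns)
    if lower is not None and upper is not None:
        # Assumed reading of `clipping`: trim to per-variable bounds.
        synthetic = synthetic.clip(lower=lower, upper=upper, axis=1)
    return synthetic


rng = np.random.default_rng(1)
data = pd.DataFrame(rng.normal(size=(100, 2)), columns=["a", "b"])
cluster = data.iloc[:40]  # stand-in for one cluster (assumption)

synthetic = synthesize_cluster(cluster, lower=data.min(), upper=data.max())
small = synthesize_cluster(cluster, size=5)
```

With size left unspecified, synthetic has as many rows as the cluster. Passing size=5 returns five rows instead.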