Methods

These convenience classes implement the covariance-based Local Resampler methods used in the README examples.

class synloc.LocalCov(data, K=30, normalize=True, clipping=True, n_jobs=-1, Args_NearestNeighbors={})[source]

LocalCov is a method for the kNNResampler class that creates synthetic samples from a multivariate normal distribution whose covariance matrix is estimated from the nearest neighbors.

Parameters:
  • data (pandas.DataFrame) – Original data set to be synthesized

  • K (int, optional) – The number of nearest neighbors used to create synthetic samples, defaults to 30

  • normalize (bool, optional) – Normalize sample before defining clusters, defaults to True

  • clipping (bool, optional) – Trim values greater (smaller) than the maximum (minimum) of each variable, defaults to True

  • n_jobs (int, optional) – The number of jobs to run in parallel, defaults to -1

  • Args_NearestNeighbors (dict, optional) – Additional arguments for the NearestNeighbors function, specified if needed (see scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html), defaults to {}
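
The clipping option described above can be illustrated with a minimal sketch. It assumes that clipping is applied to the synthetic values, column by column, against the range observed in the original data; the helper name clip_to_observed_range is hypothetical and not part of synloc.

```python
import numpy as np


def clip_to_observed_range(synthetic, original):
    """Trim each synthetic column to the [min, max] range of the original column.

    Hypothetical helper for illustration only; not part of the synloc API.
    """
    lower = original.min(axis=0)  # per-variable minimum of the original data
    upper = original.max(axis=0)  # per-variable maximum of the original data
    return np.clip(synthetic, lower, upper)


original = np.array([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]])
synthetic = np.array([[-0.5, 15.0], [1.5, 35.0]])

# -0.5 is raised to the observed minimum 0.0; 35.0 is lowered to the maximum 30.0.
clipped = clip_to_observed_range(synthetic, original)
```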

static method(subsample)[source]

Estimates the covariance matrix and draws samples from the estimated multivariate normal distribution.

Parameters:

subsample (pandas.DataFrame) – A subsample defined by the kNNResampler class.

Returns:

Synthetic values.

Return type:

numpy.ndarray
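
As a rough sketch of the idea behind this method, the snippet below estimates a mean vector and covariance matrix from a subsample of neighbors and draws one synthetic observation from the fitted multivariate normal. It uses plain NumPy arrays rather than a DataFrame, and the function name draw_from_neighbors and the explicit rng argument are assumptions made for illustration; the library's actual implementation may differ.

```python
import numpy as np


def draw_from_neighbors(subsample, rng):
    """Draw one synthetic row from a normal distribution fitted to the subsample.

    Hypothetical helper for illustration only; not part of the synloc API.
    """
    mean = subsample.mean(axis=0)          # per-variable mean of the neighbors
    cov = np.cov(subsample, rowvar=False)  # rows are observations, columns are variables
    return rng.multivariate_normal(mean, cov)


rng = np.random.default_rng(0)
neighbors = rng.normal(size=(30, 3))  # stand-in for K=30 nearest neighbors with 3 variables
synthetic_row = draw_from_neighbors(neighbors, rng)
```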

class synloc.clusterCov(data, n_clusters=8, size_min=None, normalize=True, clipping=True)[source]

clusterCov is a method for the clusterResampler class that creates synthetic values from a multivariate normal distribution whose covariance matrix is estimated from the clusters.

Parameters:
  • data (pandas.DataFrame) – Original data set to be synthesized

  • n_clusters (int, optional) – The number of clusters, defaults to 8

  • size_min (int, optional) – Required minimum cluster size, defaults to None

  • normalize (bool, optional) – Normalize sample before defining clusters, defaults to True

  • clipping (bool, optional) – Trim values greater (smaller) than the maximum (minimum) of each variable, defaults to True

method(cluster, size)[source]

Creates synthetic values from the estimated multivariate normal distribution.

Parameters:
  • cluster (pandas.DataFrame) – Cluster data

  • size (int) – Required number of synthetic observations; equal to the number of observations in the cluster if not specified.

Returns:

Synthetic values

Return type:

pandas.DataFrame
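
The cluster-level draw can be sketched in the same spirit. The example below fits a multivariate normal to one cluster and returns the requested number of synthetic rows as a DataFrame, falling back to the cluster size when no size is given. The function name draw_from_cluster, the size=None default, and the rng argument are assumptions for illustration and are not taken from the synloc source.

```python
import numpy as np
import pandas as pd


def draw_from_cluster(cluster, size=None, rng=None):
    """Draw synthetic rows from a normal distribution fitted to one cluster.

    Hypothetical helper for illustration only; not part of the synloc API.
    """
    rng = np.random.default_rng() if rng is None else rng
    if size is None:
        size = len(cluster)  # default: as many rows as the cluster contains
    mean = cluster.mean().to_numpy()
    cov = cluster.cov().to_numpy()
    values = rng.multivariate_normal(mean, cov, size=size)
    return pd.DataFrame(values, columns=cluster.columns)


rng = np.random.default_rng(1)
cluster = pd.DataFrame(rng.normal(size=(50, 2)), columns=["x", "y"])

synthetic = draw_from_cluster(cluster, rng=rng)        # 50 rows, matching the cluster
smaller = draw_from_cluster(cluster, size=5, rng=rng)  # explicit size
```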