Input requirements

synloc expects a numeric pandas.DataFrame.

Before calling a resampler, prepare the data as follows:

  • Encode categorical variables as numeric columns, for example with pandas.get_dummies.

  • Use one row per observation and one column per variable.

  • Avoid duplicate column names.

  • Avoid positive or negative infinite values.

  • Do not pass columns that are entirely missing.
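The checklist above can be applied in one pass with plain pandas. The sketch below is a hypothetical helper, prepare_for_synloc, and is not part of synloc. It rejects duplicate column names, one-hot encodes object and category columns with pandas.get_dummies, and drops all-missing columns. Turning infinities into NaN is an assumption on our part: it lets the median imputation described under "Missing values" handle them, but dropping those rows may suit your data better.

```python
import numpy as np
import pandas as pd


def prepare_for_synloc(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not part of synloc) applying the input checklist."""
    # Refuse duplicate column names rather than guessing which one to keep.
    dupes = df.columns[df.columns.duplicated()].unique().tolist()
    if dupes:
        raise ValueError(f"duplicate column names: {dupes}")

    # One-hot encode object/category columns as 0.0/1.0 floats.
    # Boolean columns are left untouched because synloc accepts them.
    out = pd.get_dummies(df, dtype=float)

    # Assumption: treat +inf and -inf as missing so median filling can handle them.
    out = out.replace([np.inf, -np.inf], np.nan)

    # Drop columns that contain no observed values at all.
    return out.dropna(axis=1, how="all")
```

For example, a frame with an "age" column containing np.inf, a string "sex" column, and an all-NaN column comes back with the columns age, sex_f and sex_m.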

Missing values

Numeric missing values are filled with column medians during fit. This is intended as a convenience, not as a substitute for thoughtful preprocessing. Columns with only missing values cannot be imputed and raise a ValueError.
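To preview what median filling will do, or to impute explicitly before fitting, the behaviour described above can be approximated in pandas. This is an illustration only, with a hypothetical function name (median_fill); it is not synloc's own implementation.

```python
import numpy as np
import pandas as pd


def median_fill(df: pd.DataFrame) -> pd.DataFrame:
    """Illustration only: fill NaNs with column medians, as described above."""
    # A column with no observed values has no median to fill with.
    all_missing = df.columns[df.isna().all()].tolist()
    if all_missing:
        raise ValueError(f"columns contain only missing values: {all_missing}")
    # DataFrame.median skips NaN by default; fillna aligns the medians by column.
    return df.fillna(df.median())
```

For instance, a column [1.0, NaN, 3.0, 10.0] has median 3.0, so the gap becomes 3.0. Medians are less sensitive than means to outliers such as the 10.0, but every filled row receives the same value, which shrinks the column's variance; that is one reason the text above calls it a convenience.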

Dummy variables

Boolean dummy columns are accepted and converted to numeric 0 and 1 values. Other categorical columns, such as string or object columns, raise a TypeError whose message names the columns that need preprocessing.
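A pre-flight check along the same lines can catch unencoded columns before fitting. The sketch below, check_dtypes, is hypothetical and only modelled on the rules above; synloc performs its own checks.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def check_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pre-flight check modelled on the dtype rules above."""
    # pandas counts bool as numeric, so only genuinely non-numeric columns
    # (object, string, category, datetime, ...) are flagged here.
    bad = [c for c in df.columns if not is_numeric_dtype(df[c])]
    if bad:
        raise TypeError(f"encode these columns before fitting: {bad}")
    # Convert boolean dummies to explicit 0/1 integers.
    out = df.copy()
    for c in out.columns:
        if is_bool_dtype(out[c]):
            out[c] = out[c].astype("int64")
    return out
```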

Integer-like variables

Synthetic samples are generated as continuous numeric values. If a variable should be integer-like, call round_integers after fitting:

resampler.round_integers(["age", "children"], stochastic=True)
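The text does not define what stochastic=True does. A common meaning of stochastic rounding is to round up with probability equal to the fractional part; we assume that meaning here without having confirmed it in synloc's source. The sketch below, stochastic_round, is a hypothetical standalone function written with NumPy, not synloc's method.

```python
import numpy as np


def stochastic_round(values, rng=None):
    """Round each value up with probability equal to its fractional part.

    Assumed semantics for stochastic=True; not taken from synloc's source.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(values, dtype=float)
    floor = np.floor(x)
    frac = x - floor
    # Round up when a uniform draw in [0, 1) falls below the fractional part,
    # so the expected rounded value equals the original value.
    return (floor + (rng.random(x.shape) < frac)).astype(np.int64)
```

The design choice matters for column means: rounding every value of 2.3 to the nearest integer gives 2 each time and shifts the mean down, while stochastic rounding gives 3 about 30% of the time and preserves the mean in expectation.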