Preprocessing Methods
cleaning
'remove_col'
Removes the specified column.
'remove_nans'
Removes rows with NaN values.
'remove_outliers'
For continuous variables only. Removes data points classified as outliers using the IQR method.
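
As an illustration of the IQR method behind 'remove_outliers', here is a minimal standard-library sketch. The function name `remove_outliers_iqr`, the fence multiplier `k=1.5`, and the choice of quantile interpolation are all assumptions made for this example. The text above says only that the IQR method is used, so the actual thresholds may differ.

```python
from statistics import quantiles


def remove_outliers_iqr(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional multiplier and is an assumption here.
    """
    # quantiles() sorts internally; n=4 returns the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


data = [10, 12, 11, 13, 12, 100]
cleaned = remove_outliers_iqr(data)  # 100 falls outside the upper fence
```

Because the fences are built from quartiles rather than the mean, a single extreme value such as 100 barely shifts them, which is why this rule suits continuous columns with a few stray points.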
replace_nans
'median'
For continuous variables only. Replace NaN values with the median of the column.
'mean'
For continuous variables only. Replace NaN values with the mean of the column.
'most_frequent'
Replace NaN values with the most frequent value.
{'value': VALUE}
Replace NaN values with the given VALUE.
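
The four replace_nans options can be sketched in plain Python as follows. The helper names `replace_nans` and `_is_nan` are hypothetical and only mirror the option names listed above. How the real implementation detects missing values or breaks ties for 'most_frequent' is not stated in the source.

```python
import math
from statistics import mean, median, mode


def _is_nan(v):
    # Only float NaN is treated as missing in this sketch (an assumption).
    return isinstance(v, float) and math.isnan(v)


def replace_nans(values, strategy):
    """Fill NaN entries using 'median', 'mean', 'most_frequent', or {'value': VALUE}."""
    present = [v for v in values if not _is_nan(v)]
    if isinstance(strategy, dict):
        fill = strategy["value"]
    elif strategy == "median":
        fill = median(present)
    elif strategy == "mean":
        fill = mean(present)
    elif strategy == "most_frequent":
        fill = mode(present)  # first-seen value wins on ties (Python 3.8+)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if _is_nan(v) else v for v in values]


nan = float("nan")
col = [1.0, nan, 2.0, 2.0, nan, 7.0]
filled_median = replace_nans(col, "median")
filled_mean = replace_nans(col, "mean")
filled_const = replace_nans(col, {"value": 0.0})
```

The statistic is computed from the non-missing entries only, so the fill value does not depend on how many NaNs the column contains. 'most_frequent' and `{'value': VALUE}` also work for categorical columns, since neither requires arithmetic on the values.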
scaling
'min_max'
For continuous variables only. Subtract the min value from the column and divide by (max - min).
'abs_max'
For continuous variables only. Divide the column by the absolute value of the max.
'standard'
For continuous variables only. Standardize the column (mean=0, std=1).
'robust'
For continuous variables only. Subtract the median from the column and divide by the IQR.
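
The four scaling formulas can be written out as a short sketch. The function name `scale` is hypothetical. The use of the population standard deviation (`pstdev`) and of inclusive quartile interpolation are assumptions, since the descriptions above do not specify either. 'abs_max' follows the wording above literally, dividing by the absolute value of the column maximum.

```python
from statistics import mean, median, pstdev, quantiles


def scale(values, method):
    """Scale a numeric column with one of the four methods listed above.

    No guard against constant columns: a zero denominator raises ZeroDivisionError.
    """
    if method == "min_max":
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]
    if method == "abs_max":
        m = abs(max(values))
        return [v / m for v in values]
    if method == "standard":
        mu, sigma = mean(values), pstdev(values)  # population std (assumption)
        return [(v - mu) / sigma for v in values]
    if method == "robust":
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        med = median(values)
        return [(v - med) / (q3 - q1) for v in values]
    raise ValueError(f"unknown method: {method!r}")


col = [1.0, 2.0, 3.0, 4.0, 5.0]
min_max_scaled = scale(col, "min_max")    # values land in [0, 1]
abs_max_scaled = scale(col, "abs_max")    # largest value becomes 1
standard_scaled = scale(col, "standard")  # mean 0, std 1
robust_scaled = scale(col, "robust")      # median becomes 0
```

'min_max' and 'abs_max' are sensitive to extreme values because a single large entry sets the denominator. 'robust' uses the median and IQR, so it is the usual choice when outliers have not been removed first.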
encoding
'binary'
For categorical variables only. Encodes categorical variables using binary encoding.
'one_hot'
For categorical variables only. Encodes categorical variables using one-hot encoding.
'ordinal'
For categorical variables only. Encodes categorical variables using ordinal encoding.
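
To show how the three encodings differ in output shape, here is a small plain-Python sketch. All three function names are hypothetical. Ordering categories alphabetically and starting the integer codes at 0 are assumptions, and the actual implementation may order or number categories differently.

```python
def ordinal_encode(values):
    """Map each category to an integer code (alphabetical order, assumed)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [index[v] for v in values]


def one_hot_encode(values):
    """One 0/1 column per category; exactly one 1 per row."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def binary_encode(values):
    """Write each ordinal code in base 2, one column per bit (MSB first)."""
    codes = ordinal_encode(values)
    width = max(1, max(codes).bit_length())
    return [[(code >> bit) & 1 for bit in reversed(range(width))] for code in codes]


colors = ["red", "green", "blue", "green"]
ordinal = ordinal_encode(colors)  # one integer per row
one_hot = one_hot_encode(colors)  # 3 columns for 3 categories
binary = binary_encode(colors)    # 2 columns are enough for 3 categories
```

One-hot encoding needs one column per category, while binary encoding needs only about log2 of the category count, which matters for high-cardinality columns. Ordinal encoding is the most compact but imposes an order on the categories that downstream models may treat as meaningful.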