Preprocessing Methods
cleaning
'remove_col': Removes the specified column.
'remove_nans': Removes rows with NaN values.
'remove_outliers' (continuous variables only): Removes data points classified as outliers using the interquartile range (IQR) method.
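
The three cleaning options can be sketched in plain Python as follows. This is only an illustration, not the library's implementation: the function names, the list-of-dicts table layout, and the 1.5 x IQR fence (the common Tukey convention) are assumptions, since the text above does not state which multiplier or quantile method is used.

```python
import math
import statistics

def is_nan(value):
    """True only for float NaN; strings and ints are never NaN."""
    return isinstance(value, float) and math.isnan(value)

def remove_col(rows, col):
    """Return the rows without the key `col`."""
    return [{k: v for k, v in row.items() if k != col} for row in rows]

def remove_nans(rows):
    """Drop every row that contains at least one NaN."""
    return [row for row in rows if not any(is_nan(v) for v in row.values())]

def remove_outliers(rows, col, k=1.5):
    """Keep rows whose `col` value lies within [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: k = 1.5 and linear-interpolation ("inclusive") quartiles.
    Expects `col` to be free of NaN values.
    """
    values = [row[col] for row in rows]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if lower <= row[col] <= upper]

rows = [
    {"id": 1, "x": 1.0},
    {"id": 2, "x": 2.0},
    {"id": 3, "x": float("nan")},
    {"id": 4, "x": 3.0},
    {"id": 5, "x": 4.0},
    {"id": 6, "x": 100.0},
]

no_nans = remove_nans(rows)                   # drops the row with id 3
no_outliers = remove_outliers(no_nans, "x")   # Q1 = 2, Q3 = 4, fence = [-1, 7]; drops id 6
no_id = remove_col(no_outliers, "id")         # only the "x" key remains
```

In this sketch NaN rows are removed before the outlier step, because quartiles are not meaningful when the column still contains NaN values.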
replace_nans
'median' (continuous variables only): Replaces NaN values with the median of the column.
'mean' (continuous variables only): Replaces NaN values with the mean of the column.
'most_frequent': Replaces NaN values with the most frequent value.
{'value': VALUE}: Replaces NaN values with the given VALUE.
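
As a rough illustration of what each replace_nans option computes, the sketch below fills a single column held in a Python list. The function name `replace_nans` and its signature are hypothetical. It also assumes that the fill statistic is computed from the non-NaN values only, which the text above does not state explicitly.

```python
import math
import statistics

def replace_nans(values, strategy):
    """Return a copy of `values` with NaN entries filled according to `strategy`.

    `strategy` is 'median', 'mean', 'most_frequent', or {'value': VALUE}.
    """
    def is_nan(v):
        return isinstance(v, float) and math.isnan(v)

    # Assumption: statistics are taken over the observed (non-NaN) values.
    observed = [v for v in values if not is_nan(v)]

    if isinstance(strategy, dict):
        fill = strategy["value"]
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "most_frequent":
        # On ties, statistics.mode returns the first value encountered.
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    return [fill if is_nan(v) else v for v in values]

nan = float("nan")
col = [1.0, 2.0, 2.0, nan, 7.0]

by_median = replace_nans(col, "median")          # fills with 2.0
by_mean = replace_nans(col, "mean")              # fills with 3.0
by_mode = replace_nans(col, "most_frequent")     # fills with 2.0
by_value = replace_nans(col, {"value": 0.0})     # fills with 0.0

# 'most_frequent' and {'value': ...} also make sense for categorical columns.
labels = replace_nans(["a", nan, "b", "a"], "most_frequent")
```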
scaling
'min_max' (continuous variables only): Subtracts the column minimum, then divides by (max - min).
'abs_max' (continuous variables only): Divides the column by its maximum absolute value.
'standard' (continuous variables only): Standardizes the column to mean 0 and standard deviation 1.
'robust' (continuous variables only): Subtracts the column median, then divides by the IQR.
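
The four scaling formulas can be written out on a plain list of numbers, as in the sketch below. The function names are hypothetical, and two details are assumptions rather than facts from the text above: 'standard' uses the population standard deviation, and 'robust' uses linear-interpolation quartiles.

```python
import statistics

def min_max(xs):
    """(x - min) / (max - min): maps the column onto [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def abs_max(xs):
    """x / max(|x|): maps the column onto [-1, 1] without shifting it."""
    m = max(abs(x) for x in xs)
    return [x / m for x in xs]

def standard(xs):
    """(x - mean) / std, using the population standard deviation (assumption)."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]

def robust(xs):
    """(x - median) / IQR, with "inclusive" quartiles (assumption)."""
    med = statistics.median(xs)
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return [(x - med) / (q3 - q1) for x in xs]

col = [-4.0, 0.0, 2.0, 4.0, 8.0]

scaled_min_max = min_max(col)     # min = -4, max = 8
scaled_abs_max = abs_max(col)     # max |x| = 8
scaled_standard = standard(col)   # mean = 2, std = 4
scaled_robust = robust(col)       # median = 2, Q1 = 0, Q3 = 4
```

Each function divides by a spread measure, so a constant column (or, for 'abs_max', an all-zero column) would raise ZeroDivisionError in this sketch; a real implementation needs to decide how to handle that case.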
encoding
'binary' (categorical variables only): Encodes categorical variables using binary encoding.
'one_hot' (categorical variables only): Encodes categorical variables using one-hot encoding.
'ordinal' (categorical variables only): Encodes categorical variables using ordinal encoding.
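
To make the difference between the three encodings concrete, the sketch below applies each one to a small list of labels. The function names are hypothetical. Assigning integer codes in sorted category order, starting from 0, is an assumption; the actual ordering and starting code are not specified in the text above.

```python
def ordinal(values):
    """Map each category to an integer code (sorted order, starting at 0)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [index[v] for v in values]

def one_hot(values):
    """One 0/1 indicator column per category, in sorted category order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def binary(values):
    """Write each ordinal code in base 2, one column per bit (most significant first)."""
    codes = ordinal(values)
    width = max(1, max(codes).bit_length())
    return [[(code >> shift) & 1 for shift in range(width - 1, -1, -1)]
            for code in codes]

colors = ["red", "green", "blue", "green", "yellow", "red"]

as_ordinal = ordinal(colors)   # blue=0, green=1, red=2, yellow=3
as_one_hot = one_hot(colors)   # 4 columns, exactly one 1 per row
as_binary = binary(colors)     # 2 columns, enough for 4 categories
```

With four categories, one-hot encoding produces four columns while binary encoding needs only two. The gap widens as the number of categories grows, at the cost of columns that are harder to interpret individually.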