Preprocessing Methods

cleaning
  • 'remove_col': Removes the specified column.
  • 'remove_nans': Removes rows with NaN values.
  • 'remove_outliers' (continuous variables only): Removes data points classified as outliers using the IQR method.
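
The two row-level cleaning options above can be sketched in plain Python. This is only an illustration, not the library's implementation: the function names, the row-as-dict layout, the fence multiplier k=1.5 (the conventional Tukey value), and the linear-interpolation quartiles are all assumptions, since the page does not specify them.

```python
import math
from statistics import quantiles


def remove_nans(rows):
    """Drop every row (a dict) that contains at least one NaN value."""
    return [
        row for row in rows
        if not any(isinstance(v, float) and math.isnan(v) for v in row.values())
    ]


def remove_outliers(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is an assumed default; the page only says "IQR method".
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


# Hypothetical data for illustration only.
rows = [{"age": 31.0, "city": "Oslo"}, {"age": float("nan"), "city": "Rome"}]
clean_rows = remove_nans(rows)   # the "Rome" row is dropped

ages = [10, 12, 11, 13, 12, 11, 100]
kept = remove_outliers(ages)     # 100 falls outside the upper fence
```

With this sample, Q1 = 11 and Q3 = 12.5, so the fences are 8.75 and 14.75 and only the value 100 is discarded.
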
replace_nans
  • 'median' (continuous variables only): Replaces NaN values with the median of the column.
  • 'mean' (continuous variables only): Replaces NaN values with the mean of the column.
  • 'most_frequent': Replaces NaN values with the most frequent value.
  • {'value': VALUE}: Replaces NaN values with the given VALUE.
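
The four replacement strategies above might look like the following when applied to a single column. This is a sketch under assumptions: the function name `replace_nans` and its signature are hypothetical, and the statistics are assumed to be computed over the non-NaN entries only.

```python
import math
from statistics import mean, median, mode


def is_nan(v):
    """True only for float NaN; strings and other types are never NaN."""
    return isinstance(v, float) and math.isnan(v)


def replace_nans(values, strategy):
    """Fill NaN entries of one column according to the chosen strategy."""
    present = [v for v in values if not is_nan(v)]
    if isinstance(strategy, dict):
        fill = strategy["value"]       # {'value': VALUE}
    elif strategy == "median":
        fill = median(present)
    elif strategy == "mean":
        fill = mean(present)
    elif strategy == "most_frequent":
        fill = mode(present)           # first mode if there are ties
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if is_nan(v) else v for v in values]


# Hypothetical columns for illustration only.
nan = float("nan")
numbers = [1.0, nan, 2.0, 2.0, 7.0]
colors = ["a", nan, "b", "a"]

by_median = replace_nans(numbers, "median")
by_mean = replace_nans(numbers, "mean")
by_mode = replace_nans(colors, "most_frequent")
by_value = replace_nans(numbers, {"value": 0.0})
```

Note that 'most_frequent' and {'value': VALUE} work on categorical columns as well, which is consistent with only 'median' and 'mean' being marked as continuous-only.
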
scaling
  • 'min_max' (continuous variables only): Subtracts the column minimum from each value and divides by (max - min).
  • 'abs_max' (continuous variables only): Divides each value by the absolute value of the column maximum.
  • 'standard' (continuous variables only): Standardizes the column (mean=0, std=1).
  • 'robust' (continuous variables only): Subtracts the column median from each value and divides by the IQR.
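
The scaling formulas above can be written out directly. The sketch below is illustrative rather than the library's code: the function name `scale` is hypothetical, 'standard' is assumed to use the population standard deviation, 'robust' is assumed to use linear-interpolation quartiles, and 'abs_max' follows the description literally (absolute value of the maximum).

```python
from statistics import mean, median, pstdev, quantiles


def scale(values, method):
    """Scale one continuous column with the named method."""
    if method == "min_max":
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]
    if method == "abs_max":
        # Literal reading of the description: |max(column)|.
        denom = abs(max(values))
        return [v / denom for v in values]
    if method == "standard":
        # Assumption: population standard deviation (divide by n).
        mu, sigma = mean(values), pstdev(values)
        return [(v - mu) / sigma for v in values]
    if method == "robust":
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        med = median(values)
        return [(v - med) / (q3 - q1) for v in values]
    raise ValueError(f"unknown method: {method!r}")


# Hypothetical column for illustration only.
column = [1.0, 2.0, 3.0, 4.0, 5.0]
min_max_scaled = scale(column, "min_max")
abs_max_scaled = scale(column, "abs_max")
standard_scaled = scale(column, "standard")
robust_scaled = scale(column, "robust")
```

A constant column makes the denominator zero for 'min_max', 'standard', and 'robust', so a real implementation would need to guard against that case; the sketch leaves it out for brevity.
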
encoding
  • 'binary' (categorical variables only): Encodes categorical variables using binary encoding.
  • 'one_hot' (categorical variables only): Encodes categorical variables using one-hot encoding.
  • 'ordinal' (categorical variables only): Encodes categorical variables using ordinal encoding.
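
To show how the three encodings above differ, here is a minimal sketch on one categorical column. The function names are hypothetical, and two details are assumptions the page does not settle: categories are ordered alphabetically, and ordinal codes start at 0 (some libraries start binary encoding from 1 instead).

```python
def ordinal_encode(values):
    """Map each category to an integer code (alphabetical order, from 0)."""
    categories = sorted(set(values))
    index = {category: code for code, category in enumerate(categories)}
    return [index[v] for v in values]


def one_hot_encode(values):
    """One 0/1 column per category; exactly one 1 in every row."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def binary_encode(values):
    """Write each ordinal code in base 2, one column per bit."""
    codes = ordinal_encode(values)
    width = max(1, max(codes).bit_length())
    return [
        [(code >> bit) & 1 for bit in reversed(range(width))]
        for code in codes
    ]


# Hypothetical column for illustration only.
colors = ["red", "green", "blue", "green"]
ordinal = ordinal_encode(colors)   # blue=0, green=1, red=2
one_hot = one_hot_encode(colors)   # columns: blue, green, red
binary = binary_encode(colors)     # two bit columns cover three categories
```

The trade-off is in width: one-hot needs one column per category, while binary encoding needs only about log2 of the category count, at the cost of columns that are harder to interpret individually.
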