Dealing with imbalanced data in Python

Oftentimes, a categorical target variable is severely imbalanced: one class makes up most of the records while the others are rare. This makes a mess of modeling.

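With an imbalanced dataset, a model that simply predicts the majority class every time achieves high accuracy, yet it never identifies a single minority-class record. For instance, if 95% of records belong to one class, this trivial model scores 95% accuracy while being useless for the remaining 5%.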
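To make the accuracy trap concrete, here is a minimal sketch using a hypothetical toy label vector with a 95/5 split. It uses scikit-learn's `DummyClassifier` with `strategy="most_frequent"` as the "always predict the majority class" baseline, then compares overall accuracy with recall on the minority class.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical toy labels: 95 majority-class (0) rows, 5 minority-class (1) rows
y = np.array([0] * 95 + [1] * 5)
X = np.zeros((len(y), 1))  # features are irrelevant to this baseline

# Baseline that always predicts the most frequent class seen during fit
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X, y)
y_pred = baseline.predict(X)

accuracy = accuracy_score(y, y_pred)                     # fraction of correct predictions
minority_recall = recall_score(y, y_pred, pos_label=1)   # share of minority rows found
print(f"accuracy: {accuracy:.2f}")
print(f"minority recall: {minority_recall:.2f}")
```

The accuracy comes out at 0.95, but minority recall is 0.00. This is why per-class metrics such as recall are worth checking on imbalanced data.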

There are a number of different ways to get around this.

Up-Sample the Minority Class

Randomly duplicate records from the minority class until it is roughly as large as the majority class. This can be done by sampling the minority class with replacement.

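```python
# resample draws random samples from arrays or DataFrames, with or without replacement
from sklearn.utils import resample
```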
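A minimal sketch of up-sampling with `resample`, assuming a hypothetical pandas DataFrame with a binary `target` column (8 majority rows, 2 minority rows). The minority rows are drawn with replacement (`replace=True`) until there are as many as majority rows, then the two parts are recombined and shuffled.

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical toy dataset: 8 majority-class rows (0) and 2 minority-class rows (1)
df = pd.DataFrame({
    "feature": range(10),
    "target": [0] * 8 + [1] * 2,
})

majority = df[df["target"] == 0]
minority = df[df["target"] == 1]

# Sample the minority class WITH replacement until it matches the majority size
minority_upsampled = resample(
    minority,
    replace=True,
    n_samples=len(majority),
    random_state=42,  # fixed seed so the result is reproducible
)

# Recombine and shuffle so the two classes are interleaved
df_upsampled = pd.concat([majority, minority_upsampled]).sample(frac=1, random_state=42)
print(df_upsampled["target"].value_counts())
```

Both classes now have 8 rows, but the minority rows are copies of the same 2 originals. Up-sampling is usually applied only to the training split, after the train/test split. Otherwise duplicated records can end up in both sets and inflate evaluation scores.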

Down-Sample the Majority Class
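Down-sampling is the mirror image of up-sampling: randomly remove records from the majority class until it is about the size of the minority class. Below is a minimal sketch using the same `resample` function, assuming the same hypothetical toy DataFrame as above. This time the majority class is sampled without replacement (`replace=False`), so every kept row is a distinct original record.

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical toy dataset: 8 majority-class rows (0) and 2 minority-class rows (1)
df = pd.DataFrame({
    "feature": range(10),
    "target": [0] * 8 + [1] * 2,
})

majority = df[df["target"] == 0]
minority = df[df["target"] == 1]

# Sample the majority class WITHOUT replacement down to the minority size
majority_downsampled = resample(
    majority,
    replace=False,
    n_samples=len(minority),
    random_state=42,  # fixed seed so the result is reproducible
)

# Recombine the reduced majority class with the untouched minority class
df_downsampled = pd.concat([majority_downsampled, minority])
print(df_downsampled["target"].value_counts())
```

The trade-off is that the discarded majority rows are lost to the model entirely. Down-sampling therefore tends to suit cases where the majority class is large enough to spare the data.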

Resources