Mann-Whitney U Test in Python

Let’s say you have a dataset stored as a Pandas dataframe, df, with a numerical column and a categorical column, and you want to determine whether the numerical values differ significantly between two of the categories.

A Mann-Whitney U test might be appropriate, especially if the fundamental assumptions of the more conventional t-test (e.g., similar variance across the groups, approximately normal distributions) are not met.

A minimum viable code snippet to perform this test would look like:

import pandas as pd
import numpy as np
import scipy.stats as stats


# Select the numerical values for each category as NumPy arrays
group1_array = df.loc[df['group'] == "<group_1>", "<numerical_values>"].to_numpy()
group2_array = df.loc[df['group'] == "<group_2>", "<numerical_values>"].to_numpy()

# Two-sided test: do the two groups' distributions differ?
mwtest_results = stats.mannwhitneyu(x=group1_array, y=group2_array, alternative='two-sided')

This returns a test statistic (U) and a p-value, available as mwtest_results.statistic and mwtest_results.pvalue.
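To make the returned values concrete, here is a minimal sketch on toy data. The dataframe contents and the column names "group" and "value" are made up for illustration, and the 0.05 threshold is only a common convention, not something the test prescribes.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data; replace with your own dataframe and column names.
df = pd.DataFrame({
    "group": ["a"] * 5 + ["b"] * 5,
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
})

# Pull out the numerical values for each category.
group_a = df.loc[df["group"] == "a", "value"].to_numpy()
group_b = df.loc[df["group"] == "b", "value"].to_numpy()

result = stats.mannwhitneyu(x=group_a, y=group_b, alternative="two-sided")

# The result exposes the U statistic and the p-value as attributes.
u_statistic = result.statistic
p_value = result.pvalue
print(f"U = {u_statistic}, p = {p_value:.4f}")

# Illustrative decision rule; 0.05 is an assumed significance level.
if p_value < 0.05:
    print("The two groups appear to differ.")
else:
    print("No significant difference detected.")
```

Because every value in group "a" is smaller than every value in group "b", the U statistic here is 0, the most extreme value possible for these sample sizes.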

If there are NA values, you may want to add a .dropna() before .to_numpy() in the array preparation.
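As a rough sketch of that NA handling, the example below drops missing values from each group before the test. The toy dataframe and the column names "group" and "value" are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy data with one missing value in each group.
df = pd.DataFrame({
    "group": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "value": [1.0, np.nan, 3.0, 4.0, 6.0, 7.0, np.nan, 9.0],
})

# Drop NA values from the selected Series, then convert to NumPy arrays.
group_a = df.loc[df["group"] == "a", "value"].dropna().to_numpy()
group_b = df.loc[df["group"] == "b", "value"].dropna().to_numpy()

result = stats.mannwhitneyu(x=group_a, y=group_b, alternative="two-sided")
print(f"n_a = {len(group_a)}, n_b = {len(group_b)}")
print(f"U = {result.statistic}, p = {result.pvalue:.4f}")
```

Dropping NA values per group keeps every non-missing observation. Note that the sample sizes shrink accordingly, which affects the power of the test.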