https://github.com/jabulente/kruskall-wallis-test

This repository contain project that provides a reusable Python function to perform the Kruskal-Wallis H-test across multiple continuous variables, grouped by a categorical feature
https://github.com/jabulente/kruskall-wallis-test

data-analysis data-science eda hypothesis-tests kruskal-wallis kruskals-algorithm scipy-stats statistics

Last synced: 3 months ago
JSON representation

This repository contain project that provides a reusable Python function to perform the Kruskal-Wallis H-test across multiple continuous variables, grouped by a categorical feature

Host: GitHub
URL: https://github.com/jabulente/kruskall-wallis-test
Owner: Jabulente
License: mit
Created: 2025-07-16T09:42:44.000Z (3 months ago)
Default Branch: main
Last Pushed: 2025-07-16T13:08:14.000Z (3 months ago)
Last Synced: 2025-07-17T13:32:08.757Z (3 months ago)
Topics: data-analysis, data-science, eda, hypothesis-tests, kruskal-wallis, kruskals-algorithm, scipy-stats, statistics
Language: Jupyter Notebook
Homepage:
Size: 36.1 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          
 Kruskal-Wallis Test for Multiple Variables and Group Comparisons


This project provides a reusable Python function to perform the **Kruskal-Wallis H-test** across **multiple continuous variables**, grouped by a categorical feature. It returns a **clean summary DataFrame** with test statistics, p-values, and significance indicators, making it easy to evaluate whether medians differ significantly between groups for each variable.

## 1. Purpose

The Kruskal-Wallis test is a **non-parametric alternative to one-way ANOVA**. It is used when:

- You have **3 or more independent groups**

- Your data **violates ANOVA assumptions** (e.g., normality, homogeneity of variance)

- Your data is **ordinal or continuous but not normally distributed**

This script simplifies applying this test across **many variables at once**, saving time and boosting productivity in exploratory and inferential analysis.

## 2. Features

- Accepts a pandas DataFrame and a grouping column

- Automatically applies the Kruskal-Wallis test to all other numeric variables

- Outputs a summary table including:

  - Variable name

  - Kruskal-Wallis test statistic

  - p-value

  - Significance status (`p < 0.05`)

- Ready for integration in statistical reports or dashboards

##  3. How to Use

```python

from scipy.stats import kruskal

import pandas as pd

import numpy as np

def kruskall_wallis(df, group_columns: str, numerical_columns: list = None):

    if numerical_columns is None:

        numerical_columns = df.select_dtypes(include=[np.number]).columns.tolist()

        for g in group_columns:

            if g in numerical_columns:

                numerical_columns.remove(g)

    results = []

    for group_column in group_columns:

        for column in numerical_columns:

            # Create a list of samples grouped by group_column

            groups = [group[column].dropna().values for name, group in df.groupby(group_column)]

            stats, p_value = kruskal(*groups)

            interpretation = '✔' if p_value < 0.05 else '✖'

            results.append({

                'Group': group_column,

                'Variables': column,

                'Kruskal-Wallis Statistic': stats,

                'P-value': p_value,

                'Significant (α<0.05)': interpretation

            })

    return pd.DataFrame(results)

```

## 4. 📂 Example Dataset

```

df = pd.DataFrame({

    'Group 1': ['Ashura', 'Ashura', 'Ashura', 'Barack', 'Barack', 'Barack', 'Colins', 'Colins', 'Colins'],

    'Group 2': ['Orenge', 'Orenge', 'Orenge', 'Banana', 'Banana', 'Banana', 'Carott', 'Carott', 'Carott'],

    'Group 3': ['Alpha', 'Alpha', 'Alpha', 'Bravo', 'Bravo', 'Bravo', 'Eagle', 'Eagle', 'Eagle'],

    'Variable 1': [12, 14, 13, 15, 16, 14, 10, 9, 11],

    'Variable 2': [7, 6, 7, 8, 9, 10, 5, 6, 5],

    'Variable 3': [20, 21, 19, 23, 22, 21, 18, 17, 19],

    'Variable 4': [124, 145, 137, 150, 163, 148, 180, 90, 111],

    'Variable 5': [70, 66, 75, 80, 92, 100, 56, 64, 56],

    'Variable 6': [2, 2, 1, 2, 2, 2, 1, 1, 1]

})

groups_column = ['Group 1', 'Group 2', 'Group 3']

results = kruskal_test_all_variables(df, 'Group')

print(results)

```

##  5. Sample Output

Variable	Kruskal-Wallis Statistic	p-value	Significant (p<0.05)

|    | Group   | Variables 
|---:|:--------|:------------| 
|  0 | Group 1 | Variable 1  | 
|  1 | Group 1 | Variable 2  | 
|  2 | Group 1 | Variable 3  | 
|  3 | Group 1 | Variable 4  | 
|  4 | Group 1 | Variable 5  | 
|  5 | Group 1 | Variable 6  | 
|  6 | Group 2 | Variable 1  | 
|  7 | Group 2 | Variable 2  | 
|  8 | Group 2 | Variable 3  | 
|  9 | Group 2 | Variable 4  | 
| 10 | Group 2 | Variable 5  | 
| 11 | Group 2 | Variable 6  | 
| 12 | Group 3 | Variable 1  | 
| 13 | Group 3 | Variable 2  | 
| 14 | Group 3 | Variable 3  | 
| 15 | Group 3 | Variable 4  | 
| 16 | Group 3 | Variable 5  | 
| 17 | Group 3 | Variable 6  |

|   Kruskal-Wallis Statistic |   P-value | Significant (α<0.05)   | ---------------------------:|----------:|:-----------------------| 6.88  |     0.032 | ✔                      | 6.997 |     0.03  | ✔                      | 6.531 |     0.038 | ✔                      | 2.4   |     0.301 | ✖                      | 7.261 |     0.027 | ✔                      | 5.6   |     0.061 | ✖                      | 6.88  |     0.032 | ✔                      | 6.997 |     0.03  | ✔                      | 6.531 |     0.038 | ✔                      | 2.4   |     0.301 | ✖                      | 7.261 |     0.027 | ✔                      | 5.6   |     0.061 | ✖                      | 6.88  |     0.032 | ✔                      | 6.997 |     0.03  | ✔                      | 6.531 |     0.038 | ✔                      | 2.4   |     0.301 | ✖                      | 7.261 |     0.027 | ✔                      | 5.6   |     0.061 | ✖                      |

# 6. 📈 Applications

- Agricultural and biological experiments

- Social science and behavioral research

- Market and product group analysis

- Education and clinical trial comparisons

## 7. 📥 Requirements

- Python 3.x

- pandas

- scipy

```

pip install pandas scipy

```

## 8. 🤝 Contributing

Feel free to fork this repo, contribute improvements, or suggest additional features such as post-hoc Dunn tests, visualization tools, or effect size calculation.

⭐️ If You Found This Useful

- Leave a star 🌟

- Share feedback

- Fork and adapt for your project

- Mention the project in your work!

## 9. License

MIT License

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/jabulente/kruskall-wallis-test

Awesome Lists containing this project

README

Kruskal-Wallis Test for Multiple Variables and Group Comparisons