https://github.com/framebuffers/mindhunter
Wrappers for Pandas DataFrames to add quicker access for common statistical values, utilities and functionality.
data-analysis data-science numpy pandas python utilities-python
- Host: GitHub
- Owner: Framebuffers
- License: agpl-3.0
- Created: 2025-09-20T18:05:04.000Z
- Default Branch: master
- Last Pushed: 2025-09-29T23:08:48.000Z
- Last Synced: 2025-09-30T01:10:00.859Z
- Language: Python
- Size: 34.2 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# 🎯 mindhunter
Extensions for DataFrames that make statistical and analytical operations much, *much* more comfortable and convenient. Mindhunter turns your `DataFrame` into a `StatFrame`, composing its new features *over* it to supercharge its capabilities without sacrificing compatibility.
Example:
```python
import pandas as pd
from mindhunter import StatFrame
from mindhunter.visualization import StatPlotter
dataset = pd.read_csv('Fish.csv') # load your data
data = StatFrame(dataset) # create a StatFrame
data.clean_df() # clean your data
plottable = StatPlotter(data) # turn your StatFrame into a StatPlotter
plottable.plot_normal_distr(data_to_test=data.df['width']) # create a set of normal distribution validation graphs
```

---
## 📦 Installation
### From the repo:
You need `uv` to build the module.
- Clone the repository
- `chmod +x ./build.sh`
- `./build.sh`
  - It clears the cache, then builds, installs and tests the module.
## 🧪 Testing
Mindhunter has a fairly rudimentary test setup. `pytest` collects fixtures and tests from files starting with `test_` inside the `tests` directory, and `faker` generates a randomised dataset to test against.
So far, coverage only checks that a `StatFrame` can be created and that data can be retrieved from it. More tests are being developed and are coming soon.
## Features
### Meet `StatFrame` and the crew
- Your new `StatFrame` can now be used with Mindhunter's **Analyzers, Plotters and Toolkits:**
  - `DistributionAnalyzer`: adds normal distribution utilities directly on top of the `DataFrame`.
  - `HypothesisAnalyzer`: adds hypothesis testing, binomial and related functionality.
  - `AnalyticalTools`: provides access to `scipy.stats` methods to generate and convert several values over a given `StatFrame`.
  - `StatPlotter`: adds ready-to-go plots for many common values, such as z-scores, the coefficient of variation and the normal distribution, using `seaborn` and `matplotlib.pyplot`.
  - `StatVisualizer`: provides easy access to common graphs and visualizations, returning ready-to-go graphs from plain lists or a `StatFrame`.
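As a concrete example of one value these tools work with, the z-scores that `StatPlotter` plots can be computed with plain pandas. This is a minimal sketch, not Mindhunter's implementation: the helper name `zscores` is hypothetical, and it assumes the sample standard deviation (`ddof=1`, the pandas default) with missing values skipped.

```python
import pandas as pd


def zscores(series: pd.Series) -> pd.Series:
    """Hypothetical helper: standardise a numeric column as (x - mean) / std.

    Uses pandas' default sample standard deviation (ddof=1); NaN values are
    skipped when computing the mean and std, and stay NaN in the result.
    """
    return (series - series.mean()) / series.std()


if __name__ == "__main__":
    df = pd.DataFrame({"width": [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]})
    z = zscores(df["width"])
    # Values with |z| > 2 are commonly flagged as potential outliers.
    print(z.round(3).tolist())
```

Standardised this way, a column has mean 0 and standard deviation 1, which is what makes z-scores comparable across columns measured in different units.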
### 💾 Quick stats and cached values
- `StatFrame` also caches the most commonly used values, giving easy access to them not just for a single column but for a whole set of columns. It caches:
  - **Central Tendency:**
    - mean
    - median
    - mode
  - **Spread/Variability:**
    - std (standard deviation)
    - variance
    - range
    - iqr (interquartile range)
    - mad (median absolute deviation)
  - **Distribution Shape:**
    - skewness
    - kurtosis
  - **Data Quality:**
    - count
    - missing_count
    - missing_pct
  - **Extreme Values:**
    - min
    - max
    - q1
    - q3
  - **Key Ratios:**
    - cv (coefficient of variation)
    - sem (standard error of the mean)
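The cached values listed above map directly onto pandas reductions. The following is a minimal sketch for a single numeric column, assuming pandas' default conventions (sample statistics with `ddof=1`, bias-corrected `skew()` and excess `kurt()`, linear-interpolated quantiles). The function name `describe_column` is hypothetical, and Mindhunter's own cache may compute some of these values differently.

```python
import pandas as pd


def describe_column(series: pd.Series) -> dict:
    """Hypothetical helper: compute the summary values listed above for one column."""
    clean = series.dropna()  # the statistics below ignore missing values
    q1, q3 = clean.quantile(0.25), clean.quantile(0.75)
    mean, std = clean.mean(), clean.std()  # sample std (ddof=1)
    return {
        # Central tendency
        "mean": mean,
        "median": clean.median(),
        "mode": clean.mode().tolist(),  # may contain several values
        # Spread / variability
        "std": std,
        "variance": clean.var(),
        "range": clean.max() - clean.min(),
        "iqr": q3 - q1,
        "mad": (clean - clean.median()).abs().median(),  # median absolute deviation
        # Distribution shape
        "skewness": clean.skew(),
        "kurtosis": clean.kurt(),  # excess kurtosis (about 0 for normal data)
        # Data quality (missing values are counted on the raw series)
        "count": int(clean.count()),
        "missing_count": int(series.isna().sum()),
        "missing_pct": 100 * series.isna().mean(),
        # Extreme values
        "min": clean.min(),
        "max": clean.max(),
        "q1": q1,
        "q3": q3,
        # Key ratios
        "cv": std / mean if mean != 0 else float("nan"),  # coefficient of variation
        "sem": clean.sem(),  # standard error of the mean
    }


if __name__ == "__main__":
    print(describe_column(pd.Series([1.0, 2.0, 2.0, 3.0, 4.0, None])))
```

Because pandas' `kurt()` returns excess kurtosis, normally distributed data scores close to 0 rather than 3.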
### 🧹 Auto-cleanup:
- Mindhunter can also **automatically clean column names and drop NaN values and duplicate rows** from datasets. It also provides methods to **locate, analyze and remove zero values** from your dataset.
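A minimal sketch of this kind of cleanup in plain pandas is shown below. The function names `clean_columns`, `basic_clean`, `zero_report` and `drop_zero_rows` are hypothetical, and the name-normalisation rule (strip, lowercase, whitespace to underscores) is an assumption; Mindhunter's `clean_df()` may use different rules.

```python
import pandas as pd


def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical: strip names, lowercase them, and replace runs of whitespace with '_'."""
    out = df.copy()
    out.columns = (
        out.columns.astype(str)
        .str.strip()
        .str.lower()
        .str.replace(r"\s+", "_", regex=True)
    )
    return out


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical: clean names, drop rows containing NaN, then drop exact duplicate rows."""
    return clean_columns(df).dropna().drop_duplicates().reset_index(drop=True)


def zero_report(df: pd.DataFrame) -> pd.Series:
    """Hypothetical: count zero values in each numeric column."""
    numeric = df.select_dtypes(include="number")
    return (numeric == 0).sum()


def drop_zero_rows(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical: remove the rows where `column` equals zero."""
    return df[df[column] != 0].reset_index(drop=True)


if __name__ == "__main__":
    raw = pd.DataFrame({" Width ": [1.0, 0.0, 0.0, None], "Fish Weight": [10, 0, 0, 5]})
    cleaned = basic_clean(raw)
    print(cleaned.columns.tolist())
    print(zero_report(cleaned))
    print(drop_zero_rows(cleaned, "width"))
```

Each step returns a new `DataFrame` here to keep the sketch easy to test; a wrapper that acts directly on its source data would assign the result back instead.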
---
## ℹ️ But, why?
I've been studying data analysis and, over the months, I've been collecting a bunch of little methods and scripts to do my homework. Eventually they grew into an 800+ line cell in every Jupyter Notebook. It became a *bit* too much.
### How does it work on the inside:
In short, it uses basic OOP **composition** (against all advice) by passing the `StatFrame` as an argument. The `StatFrame` holds the `DataFrame` itself, and all operations go through the `StatFrame` directly to the DataFrame. All operations act directly on the source, and calling `update()` re-triggers the caching process.
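The composition pattern described here can be sketched in a few lines. This is a hypothetical, simplified model and not Mindhunter's actual source: the class names `MiniStatFrame` and `MiniAnalyzer`, the `cache` attribute and the choice of cached values are assumptions. Only the idea of wrapping the `DataFrame`, operating on it in place and re-caching via `update()` comes from the description above.

```python
import pandas as pd


class MiniStatFrame:
    """Hypothetical wrapper: holds a DataFrame plus a cache of per-column stats."""

    def __init__(self, df: pd.DataFrame):
        self.df = df  # the wrapped DataFrame; operations act on it directly
        self.cache: dict = {}
        self.update()

    def update(self) -> None:
        """Recompute the cached values from the current contents of self.df."""
        numeric = self.df.select_dtypes(include="number")
        self.cache = {
            "mean": numeric.mean(),
            "std": numeric.std(),
            "count": numeric.count(),
        }


class MiniAnalyzer:
    """Hypothetical analyzer composed over a MiniStatFrame instead of subclassing it."""

    def __init__(self, sf: MiniStatFrame):
        self.sf = sf  # same object as the caller's, so in-place changes are visible

    def zscores(self, column: str) -> pd.Series:
        # Reads mean/std from the cache, so results reflect the last update().
        mean = self.sf.cache["mean"][column]
        std = self.sf.cache["std"][column]
        return (self.sf.df[column] - mean) / std


sf = MiniStatFrame(pd.DataFrame({"width": [1.0, 2.0, 3.0]}))
analyzer = MiniAnalyzer(sf)
sf.df.loc[len(sf.df)] = [10.0]  # mutate the wrapped DataFrame in place
sf.update()  # without this call the cache would still describe the old data
z = analyzer.zscores("width")  # uses the refreshed mean and std
```

Because the analyzer keeps a reference to the same wrapper object, it always sees the current `DataFrame`, but in this sketch the cached statistics only change after `update()` runs.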
### 🔮 So, what's the future?
This library will be updated fairly regularly as I collect and tidy up more little tools and take more advantage of its internal mechanisms. I am *much* more of a developer than a data analyst, so to keep improving the library I need help figuring out what the community *needs*. If you have any problem, suggestion or comment, feel free to open an issue!