{"id":31665173,"url":"https://github.com/framebuffers/mindhunter","last_synced_at":"2026-05-08T19:32:07.405Z","repository":{"id":315800600,"uuid":"1060849114","full_name":"Framebuffers/mindhunter","owner":"Framebuffers","description":"Wrappers for Pandas DataFrames to add quicker access for common statistical values, utilities and functionality.","archived":false,"fork":false,"pushed_at":"2025-09-29T23:08:48.000Z","size":35,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-09-30T01:10:00.859Z","etag":null,"topics":["data-analysis","data-science","numpy","pandas","python","utilities-python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Framebuffers.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-20T18:05:04.000Z","updated_at":"2025-09-29T23:08:51.000Z","dependencies_parsed_at":"2025-09-20T21:08:27.465Z","dependency_job_id":"764684a7-c0f7-4361-8d33-010c32619faa","html_url":"https://github.com/Framebuffers/mindhunter","commit_stats":null,"previous_names":["framebuffers/bloodhound-utils","framebuffers/mindhunter"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/Framebuffers/mindhunter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Framebuffers%2Fmindhunter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/Framebuffers%2Fmindhunter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Framebuffers%2Fmindhunter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Framebuffers%2Fmindhunter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Framebuffers","download_url":"https://codeload.github.com/Framebuffers/mindhunter/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Framebuffers%2Fmindhunter/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278848406,"owners_count":26056508,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-07T02:00:06.786Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","data-science","numpy","pandas","python","utilities-python"],"created_at":"2025-10-07T21:15:59.479Z","updated_at":"2025-10-07T21:16:02.800Z","avatar_url":"https://github.com/Framebuffers.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cimg width=\"550\" height=\"188\" alt=\"mindhunter-header\" src=\"https://github.com/user-attachments/assets/47fbbe27-251b-4961-80dc-809c73020d10\" /\u003e\r\n\r\n# 🐯 mindhunter\r\n\r\nExtensions for DataFrames to make statistical and analysis operations much, *much* more comfortable and convenient. 
Turns your `DataFrame` into a `StatFrame`, composing Mindhunter's new features *over* it, supercharging its capabilities without sacrificing compatibility. \r\n\r\nExample:\r\n\r\n```python\r\nimport pandas as pd\r\n\r\nfrom mindhunter import StatFrame\r\nfrom mindhunter.visualization import StatPlotter\r\n\r\ndataset = pd.read_csv('Fish.csv')                            # load your data\r\ndata = StatFrame(dataset)                                    # create a StatFrame\r\ndata.clean_df()                                              # clean your data\r\nplottable = StatPlotter(data)                                # turn your StatFrame into a StatPlotter\r\nplottable.plot_normal_distr(data_to_test=data.df['width'])   # create a set of normal distribution validation graphs\r\n```\r\n\r\n\u003cimg width=\"1242\" height=\"1107\" alt=\"fish_nd\" src=\"https://github.com/user-attachments/assets/bba2091e-186a-4a23-9e6e-3554a4460b19\" /\u003e\r\n\r\n---\r\n\r\n## 📦 Installation\r\n\r\n### 🗃️ From the repo:\r\nYou need `uv` to build the module.\r\n\r\n- Clone the repository\r\n- `chmod +x ./build.sh`\r\n- `./build.sh`\r\n  - It will clear cache, build, install and test the module.\r\n\r\n\r\n## 🧪 Testing\r\nMindhunter implements a fairly rudimentary setup for testing. It will look inside `tests` for any fixtures or tests inside files starting with `test_`. It uses `pytest` and `faker` to create a randomised dataset to test upon.\r\n\r\nSo far, coverage goes to the extent of making sure a `StatFrame` can be created and data can be obtained. 
More testing is being developed and it's coming soon.\r\n\r\n\r\n## 📝 Features\r\n\r\n### 📋 Meet `StatFrame` and the crew\r\n\r\n- Your new `StatFrame` can be used now with Mindhunter's new **Analyzers, Plotters and Toolkits:**\r\n  - `DistributionAnalyzer`: adds normal distribution utilities directly on top of the `DataFrame`.\r\n  - `HypothesisAnalyzer`: adds hypothesis testing, binomial and related functionality.\r\n  - `AnalyticalTools`: provides access to `scipy.stats` methods to generate and convert several values over a given `StatFrame`.\r\n  - `StatPlotter`: adds ready-to-go plotting capabilities for many common values, like z-scores, Coefficient of Variation, Normal Distribution, and others; using `seaborn` and `matplotlib.pyplot`.\r\n  - `StatVisualizer`: provides easy access to build common graphs and visualizations, returning ready-to-go graphs just by passing lists or a `StatFrame`.\r\n\r\n### 💾 Quick stats and cached values\r\n- `StatFrame` also holds a cache of the most commonly-used values and variables, providing easy access to the values of not just a column, but of a whole set. It caches:\r\n- **Central Tendency:**\r\n  - mean\r\n  - median\r\n  - mode\r\n- **Spread/Variability:**\r\n  - std (standard deviation)\r\n  - variance\r\n  - range\r\n  - iqr (interquartile range)\r\n  - mad (median absolute deviation)\r\n- **Distribution Shape:**\r\n  - skewness\r\n  - kurtosis\r\n- **Data Quality:**\r\n  - count\r\n  - missing_count\r\n  - missing_pct\r\n- **Extreme Values:**\r\n  - min\r\n  - max\r\n  - q1\r\n  - q3\r\n- **Key Ratios:**\r\n  - cv (coefficient of variation)\r\n  - sem (standard error of mean)\r\n\r\n### 🧹 Auto-cleanup:\r\n- Mindhunter can also **automatically clean column names and drop NaN values and duplicates** from datasets. 
It also provides methods to **locate, analyze and remove zero-values** from your dataset.\r\n\r\n---\r\n\r\n## ℹ️ But, why?\r\n\r\nI've been studying data analysis and, over the months, I've been collecting a bunch of little methods and scripts to do my homework. It got to the point where it was an 800+ line cell in each Jupyter Notebook. It became a *bit* too much. \r\n\r\n### 🏗️ How does it work on the inside:\r\n\r\nIn short: it uses basic OOP **composition**, against all advice, to pass the `StatFrame` as an argument. That class holds the `DataFrame` itself, and all operations are done through the `StatFrame` directly to the DF. All operations act directly on the source, and calling `update()` will re-trigger the caching process.\r\n\r\n### 🔮 So, what's the future?\r\n\r\n\r\nThis library will be updated fairly regularly, as I start collecting and tidying up more and more little tools, and taking more advantage of the internal mechanisms. I am *much* more of a developer than a data analyst, so I need much more help knowing what the community *needs* for me to keep on improving the library. If you have any issue, suggestion or comment, feel free to create a new issue!\r\n\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fframebuffers%2Fmindhunter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fframebuffers%2Fmindhunter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fframebuffers%2Fmindhunter/lists"}