[![Upload Python Package](https://github.com/MrTarantoga/MDL-Density-Histogram/actions/workflows/python-publish.yml/badge.svg?event=release)](https://github.com/MrTarantoga/MDL-Density-Histogram/actions/workflows/python-publish.yml)
[![Python application test](https://github.com/MrTarantoga/MDL-Density-Histogram/actions/workflows/python-app.yml/badge.svg)](https://github.com/MrTarantoga/MDL-Density-Histogram/actions/workflows/python-app.yml)

# MDL Optimal Histogram Density Estimation

This package provides a Cython-accelerated implementation of the **Minimum Description Length (MDL) optimal histogram density estimation** algorithm of Kontkanen & Myllymäki (2007). It uses information-theoretic principles to automatically determine optimal variable-width bins for density estimation.

![Freedman-Diaconis vs. MDL-Optimization](https://raw.githubusercontent.com/MrTarantoga/MDL-Density-Histogram/main/gmm5_idx_3.png)

## Features
- **MDL Principle**: Uses stochastic complexity for model selection
- **Dynamic Programming**: Efficient O(E²·K_max) optimization over E candidate cut points, with cached parametric complexity values for extra speed
- **Per-K Scores**: The score obtained for each bin count *K* is returned, so alternative bin counts can be compared on the same dataset
- **Variable-Width Bins**: Adapts to variations in data density
- **Automatic Bin Count**: No manual parameter tuning required, apart from the maximum number of bins to consider ($K_{max}$) and the data resolution ($\epsilon$)
- **Cython Acceleration**: Critical operations compiled to C

## Installation
You can install the package from [PyPI](https://pypi.org/project/MDL-Density-Histogram/) using pip:
```bash
pip install MDL-Density-Histogram
```
Alternatively, install it from source by cloning the repository and running:
```bash
# From the project root directory
pip install .
```

Requires:
- Python 3.11+
- NumPy
- Cython
- C compiler (GCC/Clang/MSVC)

## Usage Example
```python
import numpy as np
import matplotlib.pyplot as plt  # only needed for the plot below
from mdl_density_hist import mdl_optimal_histogram

# Generate sample data
data = np.random.normal(0, 1, 1000)

# Compute the MDL-optimal histogram
cut_points, K_scores = mdl_optimal_histogram(data, epsilon=0.1)

# Print the score obtained for each bin count K
print(f"K_scores: {K_scores}")

# Visualize the result, using the returned cut points as bin edges
plt.hist(data, bins=cut_points, density=True)
plt.title('MDL Optimal Histogram')
plt.show()
```

## Parameters
- `data`: Input data (1D NumPy array)
- `epsilon`: Quantization precision, i.e. the data resolution (default: 0.1)
- `K_max`: Maximum number of bins to consider (default: 10)

## Algorithm Highlights
- Uses **Ramanujan's factorial approximation** to compute the parametric complexity efficiently
- Caches parametric complexity values to speed up computation

## Paper Citation
Kontkanen, P., & Myllymäki, P. (2007).  
*MDL Histogram Density Estimation*.  
Proceedings of the 11th International Conference on Artificial Intelligence and Statistics (AISTATS 2007), PMLR 2.  
[PDF](https://proceedings.mlr.press/v2/kontkanen07a/kontkanen07a.pdf)

## License
Apache 2.0 License. See the LICENSE file.

## Project Structure
```
src/
├── mdl_density_hist/
│   ├── __init__.py
│   └── mdl_hist.pyx  # Core Cython implementation
└── pyproject.toml
```

## Performance Notes
- Parametric complexity precomputed with dynamic programming
- Memory-optimized array operations via NumPy
- Candidate cut-point pruning to reduce the search space

For implementation details, see the [paper](https://proceedings.mlr.press/v2/kontkanen07a/kontkanen07a.pdf) and the inline code comments.
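
## Parametric Complexity Sketch

The parametric complexity is the normalizing sum of the multinomial NML distribution for $n$ points in $K$ bins: $C(K, n) = \sum_{h_1+\dots+h_K=n} \frac{n!}{h_1!\cdots h_K!}\prod_{k=1}^{K}\left(\frac{h_k}{n}\right)^{h_k}$. Below is a minimal pure-Python sketch, not the package's Cython code. It assumes the linear-time recurrence $C(K, n) = C(K-1, n) + \frac{n}{K-2}\,C(K-2, n)$ for $K \ge 3$. It evaluates factorials with `math.lgamma` instead of Ramanujan's approximation. The function names `multinomial_complexity` and `brute_force_complexity` are hypothetical. The brute-force sum is there only to check the recurrence on small inputs.

```python
import math
from itertools import product


def multinomial_complexity(K_max: int, n: int) -> list[float]:
    """Return [C(1, n), ..., C(K_max, n)] for n data points.

    C(1, n) = 1, C(2, n) is summed directly, and larger K use the
    recurrence C(K, n) = C(K-1, n) + n / (K-2) * C(K-2, n).
    """
    if n < 1 or K_max < 1:
        raise ValueError("n and K_max must be positive")
    # C(2, n) = sum_r binom(n, r) (r/n)^r ((n-r)/n)^(n-r), evaluated in
    # log space (lgamma) so the binomial coefficient cannot overflow.
    c2 = 0.0
    for r in range(n + 1):
        log_term = math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
        if r > 0:
            log_term += r * math.log(r / n)
        if r < n:
            log_term += (n - r) * math.log((n - r) / n)
        c2 += math.exp(log_term)
    table = [1.0, c2]  # table[i] holds C(i + 1, n)
    for K in range(3, K_max + 1):
        table.append(table[K - 2] + n / (K - 2) * table[K - 3])
    return table[:K_max]


def brute_force_complexity(K: int, n: int) -> float:
    """Direct sum over all count vectors (h_1, ..., h_K) with sum n."""
    total = 0.0
    for counts in product(range(n + 1), repeat=K):
        if sum(counts) != n:
            continue
        coef = math.factorial(n)
        for h in counts:
            coef //= math.factorial(h)  # exact multinomial coefficient
        total += coef * math.prod((h / n) ** h for h in counts)  # 0**0 == 1
    return total


if __name__ == "__main__":
    fast = multinomial_complexity(5, 6)
    for K in range(1, 6):
        print(K, fast[K - 1], brute_force_complexity(K, 6))
    # log C(K, n) is the K-dependent penalty added to the data code length
    print([math.log(c) for c in multinomial_complexity(10, 1000)])
```

For $n = 2$ the recurrence gives approximately `[1.0, 2.5, 4.5, 7.0, 10.0]` for $K = 1, \dots, 5$, which matches the brute-force sum. Because the table depends only on $n$ and $K_{max}$, it can be computed once per dataset and cached.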
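
## Dynamic Programming Sketch

Fix a set of cut points. The data part of the NML code length (in nats) is then $-\sum_{k=1}^{K} h_k\left(\log(\epsilon h_k) - \log(L_k n)\right)$, where $h_k$ is the number of points in bin $k$ and $L_k$ is its width. This sum is additive over bins. The best cut points for every $K \le K_{max}$ can therefore be found with an $O(E^2 \cdot K_{max})$ dynamic program over the $E$ candidate cut points.

The sketch below illustrates this; it is not the package's implementation. The function name `mdl_data_costs` is hypothetical. It assumes the candidate set is the two edges $x \pm \epsilon/2$ of every distinct data value, because the data term is concave in a cut's position inside an empty gap, so the optimum sits at a gap edge. It does not reproduce the package's candidate pruning. To choose $K$, add $\log C(K, n)$, plus any cut-point encoding cost the implementation uses, to each returned cost and take the minimum.

```python
import bisect
import math


def mdl_data_costs(data, epsilon, K_max):
    """Return {K: (cost_in_nats, cut_points)} for K = 1..K_max, where
    cost = -sum_k h_k * (log(epsilon * h_k) - log(L_k * n)) is minimised
    over cut points drawn from the candidate set (O(E^2 * K_max))."""
    xs = sorted(data)
    n = len(xs)
    # Assumed candidate set: both edges x - eps/2 and x + eps/2 of each value.
    cands = sorted({x + s * epsilon / 2 for x in set(xs) for s in (-1, 1)})
    E = len(cands)
    below = [bisect.bisect_left(xs, c) for c in cands]  # points left of each candidate

    def bin_cost(i, j):
        # Code length of the points lying between cands[i] and cands[j].
        h = below[j] - below[i]
        if h == 0:
            return 0.0  # an empty bin adds nothing to the data term
        width = cands[j] - cands[i]
        return -h * (math.log(epsilon * h) - math.log(width * n))

    # best[K][j]: minimal cost of covering [cands[0], cands[j]] with K bins
    best = [[math.inf] * E for _ in range(K_max + 1)]
    back = [[-1] * E for _ in range(K_max + 1)]
    for j in range(1, E):
        best[1][j], back[1][j] = bin_cost(0, j), 0
    for K in range(2, K_max + 1):
        for j in range(K, E):
            for i in range(K - 1, j):
                cost = best[K - 1][i] + bin_cost(i, j)
                if cost < best[K][j]:
                    best[K][j], back[K][j] = cost, i

    results = {}
    for K in range(1, min(K_max, E - 1) + 1):
        # Walk the back-pointers from the last candidate to the first.
        j, cuts = E - 1, [cands[E - 1]]
        for k in range(K, 0, -1):
            j = back[k][j]
            cuts.append(cands[j])
        results[K] = (best[K][E - 1], cuts[::-1])
    return results


if __name__ == "__main__":
    data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    for K, (cost, cuts) in mdl_data_costs(data, epsilon=0.1, K_max=3).items():
        print(K, round(cost, 3), [round(c, 2) for c in cuts])
```

On this toy dataset the three-bin solution isolates the two clusters and leaves an empty bin in the gap, with cut points at roughly -0.05, 0.25, 4.95 and 5.25. The data cost never increases as $K$ grows, which is why the $\log C(K, n)$ penalty is needed before comparing bin counts.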