{"id":37071664,"url":"https://github.com/delusionary/histoptimizer","last_synced_at":"2026-01-14T08:24:33.400Z","repository":{"id":63780336,"uuid":"326933538","full_name":"delusionary/histoptimizer","owner":"delusionary","description":"Solves a minimum variance cost of the partition problem.","archived":false,"fork":false,"pushed_at":"2023-02-28T07:30:25.000Z","size":35070,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-03T18:56:29.056Z","etag":null,"topics":["cuda","numba","python"],"latest_commit_sha":null,"homepage":"http://histoptimizer.org","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"0bsd","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/delusionary.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-01-05T08:23:09.000Z","updated_at":"2024-03-27T00:51:58.000Z","dependencies_parsed_at":"2023-02-07T20:00:39.537Z","dependency_job_id":null,"html_url":"https://github.com/delusionary/histoptimizer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/delusionary/histoptimizer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/delusionary%2Fhistoptimizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/delusionary%2Fhistoptimizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/delusionary%2Fhistoptimizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/delusionary%2Fhistoptimizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/delusionary","download_url":"https:/
/codeload.github.com/delusionary/histoptimizer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/delusionary%2Fhistoptimizer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28413888,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T08:16:59.381Z","status":"ssl_error","status_checked_at":"2026-01-14T08:13:45.490Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","numba","python"],"created_at":"2026-01-14T08:24:31.647Z","updated_at":"2026-01-14T08:24:33.388Z","avatar_url":"https://github.com/delusionary.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![codecov](https://codecov.io/github/delusionary/histoptimizer/branch/main/graph/badge.svg?token=FCLW50JSR9)](https://codecov.io/github/delusionary/histoptimizer)\n\n# Histoptimizer\n\n\u003cimg src=\"docs/_static/histoptimizer-spirit-animal.png\" width=\"50%\"/\u003e\n\n## Overview\n\nHistoptimizer is a Python library and CLI that accepts\na DataFrame or ordered list of item sizes, and produces a list of \"divider\nlocations\" that partition the items as evenly as possible into a given number of\nbuckets, minimizing the variance and standard deviation between the bucket\nsizes. 
You can read detailed documentation at the project web site,\n[histoptimizer.org](https://histoptimizer.org).\n\nJIT compilation and GPU support through Numba provide great speed improvements\non supported hardware, enabling problem sets of a million items or more.\n\nHistoptimizer was built in order to divide the counties of the US precisely\ninto intervals ordered by population density. That job was accomplished very\nearly on, and no other uses have been discovered. It is unclear why development\nhas continued to this point.\n\n## Usage\n\nHistoptimizer provides two APIs and two command-line tools:\n\n### NumPy array partitioner\n\nSeveral implementations of the partitioning algorithm can be called directly\nwith a list or array of item sizes and a number of buckets. They return an\narray of divider locations (dividers come _after_ the given item in 1-based\nindexing, or _before_ the given item in 0-based indexing) and the variance of\nthe given partition.\n\n```python\nfrom histoptimizer import Histoptimizer\n\nitem_sizes = [1.0, 4.5, 6.3, 2.1, 8.4, 3.7, 8.6, 0.3, 5.2, 6.9, 1.2, 2.4, 9.8, 3.7]\n\n# Get the optimal position of two dividers that partition the list above into 3 buckets.\n(dividers, variance) = Histoptimizer.partition(item_sizes, 3)\n\nprint(f\"Optimal Divider Locations: {dividers} Optimal solution variance: {variance:.4}\")\n```\n\n### Pandas Dataframe Partitioner\n\nYou can supply a Pandas DataFrame, the name of a size column, a list of bucket\ncounts, and a column prefix to get a version of the DataFrame with added columns\nwhere the value is the 1-based bucket number of the corresponding item \npartitioned into the number of buckets reflected in the column name.\n\n```python\nfrom histoptimizer import Histoptimizer, histoptimize\nimport pandas as pd\n\nbooks = pd.read_csv('books.csv', header=0)\ndivisions, column_names = histoptimize(books, \"Pages\", [3], \"assistant_\", Histoptimizer)\ndivisions\n```\n\n|     | Title                          |   Pages | 
assistant_3 |\n|----:|:-------------------------------|--------:|------------:|\n|   0 | The Algorithm Design Manual    |     748 |           1 |\n|   1 | Software Engineering at Google |     599 |           1 |\n|   2 | Site Reliability Engineering   |     550 |           2 |\n|  .. | ...                            |   ...   |         ... |\n|  14 | Noise                          |     464 |           3 |\n|  15 | Snow Crash                     |     440 |           3 |\n\n\n### CLI\n\nThe CLI is a wrapper around the DataFrame functionality that can accept and\nproduce either CSV or Pandas JSON files.\n\n```\nUsage: histoptimizer [OPTIONS] FILE SIZE_COLUMN PARTITIONS\n\n  Partition ordered items in a CSV into a given number of buckets,       \n  evenly.\n\n  Given a CSV or JSON Dataframe, a size column name, and a number of     \n  buckets, Histoptimizer will add a column which gives the partition     \n  number for each row that optimally divides the given items into the    \n  buckets so as to minimize the variance from mean of the summed items   \n  in each bucket.\n\n  Additional features allow doing a list of bucket sizes in one go,      \n  sorting items beforehand, and producing output with only relevant      \n  columns.\n\n  Example:\n\n      \u003e histoptimizer books.csv state_name population 10\n\n      Output:\n\n      state_name, population, partition_10     Wyoming, xxxxxx, 1        \n      California, xxxxxxxx, 10\n\nOptions:\n  -l, --limit INTEGER             Take the first {limit} records from    \n                                  the input, rather than the whole       \n                                  file.\n  -a, --ascending, --asc / -d, --descending, --desc\n                                  If a sort column is provided,\n  --print-all, --all / --no-print-all, --brief\n                                  Output all columns in input, or with   \n                                  --brief, only output the ID, size,     \n                         
         and buckets columns.\n  -c, --column-prefix TEXT        Partition column name prefix. The      \n                                  number of buckets will be appended.    \n                                  Defaults to partion_{number of\n                                  buckets}.\n  -s, --sort-key TEXT             Optionally sort records by this        \n                                  column name before partitioning.       \n  -i, --id-column TEXT            Optional ID column to print with       \n                                  brief output.\n  -p, --partitioner TEXT          Use the named partitioner\n                                  implementation. Defaults to \"numba\".   \n                                  If you have an NVidia GPU use \"cuda\"   \n                                  for better performance\n  -o, --output FILENAME           Send output to the given file.\n                                  Defaults to stdout.\n  -f, --output-format [csv|json]  Specify output format. Pandas JSON or  \n                                  CSV. Defaults to CSV\n  --help                          Show this message and exit.\n```\n\n### Benchmarking CLI\n\nThe Benchmarking CLI can be used to produce comparative performance metrics for \nvarious implementations of the algorithm.\n\n```\nUsage: histobench [OPTIONS] PARTITIONER_TYPES [ITEM_SPEC] [BUCKET_SPEC]\n                  [ITERATIONS] [SIZE_SPEC]\n\n  Histobench is a benchmarking harness for testing Histoptimizer partitioner\n  performance.\n\n  By Default it uses random data, and so may not be an accurate benchmark for\n  algorithms whose performance depends upon the data set.\n\n  The PARTITIONER_TYPES parameter is a comma-separated list of partitioners to\n  benchmark, which can be specified as either:\n\n  1. A standard optimizer name, or 2. 
filepath:classname\n\n  To specify the standard cuda module and also a custom variant, for example,\n\nOptions:\n  --debug-info / --no-debug-info\n  --force-jit / --no-force-jit\n  --report PATH\n  --sizes-from PATH\n  --tables / --no-tables\n  --verbose / --no-verbose\n  --help                          Show this message and exit.\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdelusionary%2Fhistoptimizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdelusionary%2Fhistoptimizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdelusionary%2Fhistoptimizer/lists"}