{"id":13418248,"url":"https://github.com/astrofrog/fast-histogram","last_synced_at":"2025-05-15T04:04:23.290Z","repository":{"id":22874549,"uuid":"97506132","full_name":"astrofrog/fast-histogram","owner":"astrofrog","description":" :zap: Fast 1D and 2D histogram functions in Python :zap:","archived":false,"fork":false,"pushed_at":"2025-03-03T17:39:35.000Z","size":1345,"stargazers_count":271,"open_issues_count":18,"forks_count":28,"subscribers_count":18,"default_branch":"main","last_synced_at":"2025-05-15T04:03:22.906Z","etag":null,"topics":["histogram","performance","python"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/astrofrog.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGES.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-07-17T17:54:57.000Z","updated_at":"2025-05-14T20:48:31.000Z","dependencies_parsed_at":"2023-12-18T19:29:00.160Z","dependency_job_id":"8f246fb9-0a32-4c87-b18d-966bf2cbde1e","html_url":"https://github.com/astrofrog/fast-histogram","commit_stats":{"total_commits":179,"total_committers":9,"mean_commits":19.88888888888889,"dds":0.1955307262569832,"last_synced_commit":"3a9814819cd7b3d6dfb8c962e4f59d6cb59560a1"},"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/astrofrog%2Ffast-histogram","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/astrofrog%2Ffast-histogram/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/astrofrog%2Ffast-histogram/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/astrofrog%2Ffast-histogram/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/astrofrog","download_url":"https://codeload.github.com/astrofrog/fast-histogram/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254270641,"owners_count":22042858,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["histogram","performance","python"],"created_at":"2024-07-30T22:01:00.106Z","updated_at":"2025-05-15T04:04:23.264Z","avatar_url":"https://github.com/astrofrog.png","language":"C","funding_links":[],"categories":["C","Uncategorized"],"sub_categories":["Uncategorized"],"readme":"|CI Status| |asv| |PyPI|\n\nAbout\n-----\n\nSometimes you just want to compute simple 1D or 2D histograms with regular bins. Fast. No\nnonsense. `Numpy's \u003chttp://www.numpy.org\u003e`__ histogram functions are\nversatile, and can handle for example non-regular binning, but this\nversatility comes at the expense of performance.\n\nThe **fast-histogram** mini-package aims to provide simple and fast\nhistogram functions for regular bins that don't compromise on performance. It doesn't do\nanything complicated - it just implements a simple histogram algorithm\nin C and keeps it simple. The aim is to have functions that are fast but\nalso robust and reliable. The result is a 1D histogram function here that\nis **7-15x faster** than ``numpy.histogram``, and a 2D histogram function\nthat is **20-25x faster** than ``numpy.histogram2d``.\n\nTo install::\n\n    pip install fast-histogram\n\nor if you use conda you can instead do::\n\n    conda install -c conda-forge fast-histogram\n\nThe ``fast_histogram`` module then provides two functions:\n``histogram1d`` and ``histogram2d``:\n\n.. code:: python\n\n    from fast_histogram import histogram1d, histogram2d\n\nExample\n-------\n\nHere's an example of binning 10 million points into a regular 2D\nhistogram:\n\n.. code:: python\n\n    In [1]: import numpy as np\n\n    In [2]: x = np.random.random(10_000_000)\n\n    In [3]: y = np.random.random(10_000_000)\n\n    In [4]: %timeit _ = np.histogram2d(x, y, range=[[-1, 2], [-2, 4]], bins=30)\n    935 ms ± 58.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n\n    In [5]: from fast_histogram import histogram2d\n\n    In [6]: %timeit _ = histogram2d(x, y, range=[[-1, 2], [-2, 4]], bins=30)\n    40.2 ms ± 624 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n\n(note that ``10_000_000`` is possible in Python 3.6 syntax, use ``10000000`` instead in previous versions)\n\nThe version here is over 20 times faster! The following plot shows the\nspeedup as a function of array size for the bin parameters shown above:\n\n.. figure:: https://github.com/astrofrog/fast-histogram/raw/main/speedup_compared.png\n   :alt: Comparison of performance between Numpy and fast-histogram\n\nas well as results for the 1D case, also with 30 bins. The speedup for\nthe 2D case is consistently between 20-25x, and for the 1D case goes\nfrom 15x for small arrays to around 7x for large arrays.\n\nQ\u0026A\n---\n\nWhy don't the histogram functions return the edges?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nComputing and returning the edges may seem trivial but it can slow things down by a factor of a few when computing histograms of 10^5 or fewer elements, so not returning the edges is a deliberate decision related to performance. You can easily compute the edges yourself if needed though, using ``numpy.linspace``.\n\nDoesn't package X already do this, but better?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThis may very well be the case! If this duplicates another package, or\nif it is possible to use Numpy in a smarter way to get the same\nperformance gains, please open an issue and I'll consider deprecating\nthis package :)\n\nOne package that does include fast histogram functions (including in\nn-dimensions) and can compute other statistics is\n`vaex \u003chttps://github.com/maartenbreddels/vaex\u003e`_, so take a look there\nif you need more advanced functionality!\n\nAre the 2D histograms not transposed compared to what they should be?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThere is technically no 'right' and 'wrong' orientation - here we adopt\nthe convention which gives results consistent with Numpy, so:\n\n.. code:: python\n\n    numpy.histogram2d(x, y, range=[[xmin, xmax], [ymin, ymax]], bins=[nx, ny])\n\nshould give the same result as:\n\n.. code:: python\n\n    fast_histogram.histogram2d(x, y, range=[[xmin, xmax], [ymin, ymax]], bins=[nx, ny])\n\nWhy not contribute this to Numpy directly?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAs mentioned above, the Numpy functions are much more versatile, so they could not\nbe replaced by the ones here. One option would be to check in Numpy's functions for\ncases that are simple and dispatch to functions such as the ones here, or add\ndedicated functions for regular binning. I hope we can get this in Numpy in some form\nor another eventually, but for now, the aim is to have this available to packages\nthat need to support a range of Numpy versions.\n\nWhy not use Cython?\n~~~~~~~~~~~~~~~~~~~\n\nI originally implemented this in Cython, but found that I could get a\n50% performance improvement by going straight to a C extension.\n\nWhat about using Numba?\n~~~~~~~~~~~~~~~~~~~~~~~\n\nI specifically want to keep this package as easy as possible to install,\nand while `Numba \u003chttps://numba.pydata.org\u003e`__ is a great package, it is\nnot trivial to install outside of Anaconda.\n\nCould this be parallelized?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThis may benefit from parallelization under certain circumstances. The\neasiest solution might be to use OpenMP, but this won't work on all\nplatforms, so it would need to be made optional.\n\nCouldn't you make it faster by using the GPU?\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nAlmost certainly, though the aim here is to have an easily installable\nand portable package, and introducing GPUs is going to affect both of\nthese.\n\nWhy make a package specifically for this? This is a tiny amount of functionality\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nPackages that need this could simply bundle their own C extension or\nCython code to do this, but the main motivation for releasing this as a\nmini-package is to avoid making pure-Python packages into packages that\nrequire compilation just because of the need to compute fast histograms.\n\nCan I contribute?\n~~~~~~~~~~~~~~~~~\n\nYes please! This is not meant to be a finished package, and I welcome\npull request to improve things.\n\n.. |CI Status| image:: https://github.com/astrofrog/fast-histogram/actions/workflows/main.yml/badge.svg\n   :target: https://github.com/astrofrog/fast-histogram/actions/workflows/main.yml\n\n.. |asv| image:: https://img.shields.io/badge/benchmarked%20by-asv-brightgreen.svg\n   :target: https://astrofrog.github.io/fast-histogram\n\n.. |PyPI| image:: https://img.shields.io/pypi/v/fast-histogram.svg\n    :target: https://pypi.org/project/fast-histogram/\n    :alt: PyPI release\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fastrofrog%2Ffast-histogram","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fastrofrog%2Ffast-histogram","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fastrofrog%2Ffast-histogram/lists"}