{"id":13688700,"url":"https://github.com/xarray-contrib/flox","last_synced_at":"2025-12-12T00:42:35.796Z","repository":{"id":38379222,"uuid":"351138593","full_name":"xarray-contrib/flox","owner":"xarray-contrib","description":"Fast \u0026 furious GroupBy operations for dask.array","archived":false,"fork":false,"pushed_at":"2025-04-08T16:12:19.000Z","size":1855,"stargazers_count":129,"open_issues_count":39,"forks_count":18,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-08T17:26:00.453Z","etag":null,"topics":["dask","map-reduce","xarray"],"latest_commit_sha":null,"homepage":"https://flox.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xarray-contrib.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-24T15:52:40.000Z","updated_at":"2025-04-08T16:12:23.000Z","dependencies_parsed_at":"2023-02-18T12:19:48.887Z","dependency_job_id":"43713fb7-6dc8-49f0-8004-5b6fe25d4f11","html_url":"https://github.com/xarray-contrib/flox","commit_stats":{"total_commits":556,"total_committers":13,"mean_commits":42.76923076923077,"dds":0.3633093525179856,"last_synced_commit":"561378d2172d73bdec8611f558ddb20d12eeaeac"},"previous_names":[],"tags_count":65,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xarray-contrib%2Fflox","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xarray-contrib%2Fflox/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xarray-contrib%2Fflox/
releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xarray-contrib%2Fflox/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xarray-contrib","download_url":"https://codeload.github.com/xarray-contrib/flox/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247894424,"owners_count":21014098,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dask","map-reduce","xarray"],"created_at":"2024-08-02T15:01:20.546Z","updated_at":"2025-12-12T00:42:35.769Z","avatar_url":"https://github.com/xarray-contrib.png","language":"Python","readme":"[![GitHub Workflow CI Status](https://img.shields.io/github/actions/workflow/status/xarray-contrib/flox/ci.yaml?branch=main\u0026logo=github\u0026style=flat)](https://github.com/xarray-contrib/flox/actions)\n[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/xarray-contrib/flox/main.svg)](https://results.pre-commit.ci/latest/github/xarray-contrib/flox/main)\n[![image](https://img.shields.io/codecov/c/github/xarray-contrib/flox.svg?style=flat)](https://codecov.io/gh/xarray-contrib/flox)\n[![Documentation 
Status](https://readthedocs.org/projects/flox/badge/?version=latest)](https://flox.readthedocs.io/en/latest/?badge=latest)\n\n[![PyPI](https://img.shields.io/pypi/v/flox.svg?style=flat)](https://pypi.org/project/flox/)\n[![Conda-forge](https://img.shields.io/conda/vn/conda-forge/flox.svg?style=flat)](https://anaconda.org/conda-forge/flox)\n\n[![NASA-80NSSC18M0156](https://img.shields.io/badge/NASA-80NSSC18M0156-blue)](https://earthdata.nasa.gov/esds/competitive-programs/access/pangeo-ml)\n[![NASA-80NSSC22K0345](https://img.shields.io/badge/NASA-80NSSC22K0345-blue)](https://science.nasa.gov/open-science-overview)\n\n# flox\n\nThis project explores strategies for fast GroupBy reductions with dask.array. It used to be called `dask_groupby`.\nIt was motivated by\n\n1. Dask Dataframe GroupBy\n   [blogpost](https://blog.dask.org/2019/10/08/df-groupby)\n1. [numpy_groupies](https://github.com/ml31415/numpy-groupies) in Xarray\n   [issue](https://github.com/pydata/xarray/issues/4473)\n\n(See a\n[presentation](https://docs.google.com/presentation/d/1YubKrwu9zPHC_CzVBhvORuQBW-z148BvX3Ne8XcvWsQ/edit?usp=sharing)\nabout this package, from the Pangeo Showcase).\n\n## Acknowledgements\n\nThis work was funded in part by\n\n1. NASA-ACCESS 80NSSC18M0156 \"Community tools for analysis of NASA Earth Observing System\n   Data in the Cloud\" (PI J. Hamman, NCAR),\n1. NASA-OSTFL 80NSSC22K0345 \"Enhancing analysis of NASA data with the open-source Python Xarray Library\" (PIs Scott Henderson, University of Washington; Deepak Cherian, NCAR; Jessica Scheick, University of New Hampshire), and\n1. 
[NCAR's Earth System Data Science Initiative](https://ncar.github.io/esds/).\n\nIt was motivated by [very](https://github.com/pangeo-data/pangeo/issues/266) [very](https://github.com/pangeo-data/pangeo/issues/271) [many](https://github.com/dask/distributed/issues/2602) [discussions](https://github.com/pydata/xarray/issues/2237) in the [Pangeo](https://pangeo.io) community.\n\n## API\n\nThere are two main functions\n\n1. `flox.groupby_reduce(dask_array, by_dask_array, \"mean\")`\n   \"pure\" dask array interface\n1. `flox.xarray.xarray_reduce(xarray_object, by_dataarray, \"mean\")`\n   \"pure\" xarray interface; though [work is ongoing](https://github.com/pydata/xarray/pull/5734) to integrate this\n   package in xarray.\n\n## Implementation\n\nSee [the documentation](https://flox.readthedocs.io/en/latest/implementation.html) for details on the implementation.\n\n## Custom reductions\n\n`flox` implements all common reductions provided by `numpy_groupies` in `aggregations.py`.\nIt also allows you to specify a custom Aggregation (again inspired by dask.dataframe),\nthough this might not be fully functional at the moment. 
See `aggregations.py` for examples.\n\n```python\nimport numpy as np\n\nfrom flox.aggregations import Aggregation\n\nmean = Aggregation(\n    # name used for dask tasks\n    name=\"mean\",\n    # operation to use for pure-numpy inputs\n    numpy=\"mean\",\n    # blockwise reduction\n    chunk=(\"sum\", \"count\"),\n    # combine intermediate results: sum the sums, sum the counts\n    combine=(\"sum\", \"sum\"),\n    # generate final result as sum / count\n    finalize=lambda sum_, count: sum_ / count,\n    # Used when \"reindexing\" at combine-time\n    fill_value=0,\n    # Used when any member of `expected_groups` is not found\n    final_fill_value=np.nan,\n)\n```\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxarray-contrib%2Fflox","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxarray-contrib%2Fflox","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxarray-contrib%2Fflox/lists"}