Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/scikit-hep/awkward
Manipulate JSON-like data with NumPy-like idioms.
apache-arrow cern-root columnar-format data-analysis jagged-array json numba numpy pandas python ragged-array rdataframe scikit-hep
Last synced: 5 days ago
- Host: GitHub
- URL: https://github.com/scikit-hep/awkward
- Owner: scikit-hep
- License: bsd-3-clause
- Created: 2019-08-14T19:32:12.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2024-10-28T21:48:10.000Z (3 months ago)
- Last Synced: 2024-10-29T15:19:19.372Z (3 months ago)
- Topics: apache-arrow, cern-root, columnar-format, data-analysis, jagged-array, json, numba, numpy, pandas, python, ragged-array, rdataframe, scikit-hep
- Language: Python
- Homepage: https://awkward-array.org
- Size: 25.3 MB
- Stars: 835
- Watchers: 22
- Forks: 87
- Open Issues: 130
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Citation: CITATION.cff
README
![](docs-img/logo/logo-300px.png)
[![PyPI version](https://badge.fury.io/py/awkward.svg)](https://pypi.org/project/awkward)
[![Conda-Forge](https://img.shields.io/conda/vn/conda-forge/awkward)](https://github.com/conda-forge/awkward-feedstock)
[![Python 3.9‒3.13](https://img.shields.io/badge/python-3.9%E2%80%923.13-blue)](https://www.python.org)
[![BSD-3 Clause License](https://img.shields.io/badge/license-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
[![Build Test](https://github.com/scikit-hep/awkward/actions/workflows/test.yml/badge.svg?branch=main)](https://github.com/scikit-hep/awkward/actions/workflows/test.yml)
[![Scikit-HEP](https://scikit-hep.org/assets/images/Scikit--HEP-Project-blue.svg)](https://scikit-hep.org/)
[![NSF-1836650](https://img.shields.io/badge/NSF-1836650-blue.svg)](https://nsf.gov/awardsearch/showAward?AWD_ID=1836650)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4341376.svg)](https://doi.org/10.5281/zenodo.4341376)
[![Documentation](https://img.shields.io/badge/docs-online-success)](https://awkward-array.org/)
[![Gitter](https://img.shields.io/badge/chat-online-success)](https://gitter.im/Scikit-HEP/awkward-array)

Awkward Array is a library for **nested, variable-sized data**, including arbitrary-length lists, records, mixed types, and missing data, using **NumPy-like idioms**.
Arrays are **dynamically typed**, but operations on them are **compiled and fast**. Their behavior coincides with NumPy when array dimensions are regular and generalizes when they're not.
# Motivating example
Given an array of lists of objects with `x`, `y` fields (with nested lists in the `y` field),
```python
import awkward as ak
import numpy as np  # needed for the np.square calls used with this array

array = ak.Array([
    [{"x": 1.1, "y": [1]}, {"x": 2.2, "y": [1, 2]}, {"x": 3.3, "y": [1, 2, 3]}],
    [],
    [{"x": 4.4, "y": [1, 2, 3, 4]}, {"x": 5.5, "y": [1, 2, 3, 4, 5]}]
])
```

the following slices out the `y` values, drops the first element from each inner list, and runs NumPy's `np.square` function on everything that is left:
```python
output = np.square(array["y", ..., 1:])
```

The result is
```python
[
    [[], [4], [4, 9]],
    [],
    [[4, 9, 16], [4, 9, 16, 25]]
]
```

The equivalent using only Python is
```python
output = []
for sublist in array:
    tmp1 = []
    for record in sublist:
        tmp2 = []
        for number in record["y"][1:]:
            tmp2.append(np.square(number))
        tmp1.append(tmp2)
    output.append(tmp1)
```

The expression using Awkward Arrays is more concise, using idioms familiar from NumPy, and it also has NumPy-like performance. For a similar problem 10 million times larger than the one above (single-threaded on a 2.2 GHz processor),
* the Awkward Array one-liner takes **1.5 seconds** to run and uses **2.1 GB** of memory,
* the equivalent using Python lists and dicts takes **140 seconds** to run and uses **22 GB** of memory.

Awkward Array is even faster when used in [Numba](https://numba.pydata.org/)'s JIT-compiled functions.
See the [Getting started](https://awkward-array.org/doc/main/getting-started/index.html) documentation on [awkward-array.org](https://awkward-array.org) for an introduction, including a [no-install demo](https://awkward-array.org/doc/main/getting-started/try-awkward-array.html) you can try in your web browser.
# Getting help
* View the documentation on [awkward-array.org](https://awkward-array.org/).
* Report bugs, request features, and ask for additional documentation on [GitHub Issues](https://github.com/scikit-hep/awkward/issues).
* If you have a "How do I...?" question, start a [GitHub Discussion](https://github.com/scikit-hep/awkward/discussions) with category "Q&A".
* Alternatively, ask about it on [StackOverflow with the [awkward-array] tag](https://stackoverflow.com/questions/tagged/awkward-array). Be sure to include tags for any other libraries that you use, such as Pandas or PyTorch.
* To ask questions in real time, try the Gitter [Scikit-HEP/awkward-array](https://gitter.im/Scikit-HEP/awkward-array) chat room.

# Installation
Awkward Array can be installed from [PyPI](https://pypi.org/project/awkward) using pip:
```bash
pip install awkward
```

The `awkward` package is pure Python, and it will download the `awkward-cpp` compiled components as a dependency. If there is no `awkward-cpp` binary package (wheel) for your platform and Python version, pip will attempt to compile it from source (which has additional dependencies, such as a C++ compiler).
Awkward Array is also available on [conda-forge](https://conda-forge.org/docs/user/introduction.html#how-can-i-install-packages-from-conda-forge):
```bash
conda install -c conda-forge awkward
```

Because the project is split into two packages, and `awkward-cpp` may be updated on GitHub before it is released on PyPI, installing with pip directly from git (`pip install git+https://...`) will not work. Instead, follow the [Installation for developers](#installation-for-developers) section below.
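
After installing, one way to check that both packages are present is to query their installed versions. The helper below is a hypothetical sketch, not part of Awkward Array. It uses only the standard library's `importlib.metadata` and reports `None` for any package that is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("awkward", "awkward-cpp")):
    """Map each distribution name to its installed version, or None if missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


print(installed_versions())
```

If `awkward-cpp` shows up as `None`, the compiled components were not installed alongside the pure-Python package.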
# Installation for developers
Clone this repository _recursively_ to get the header-only C++ dependencies, then generate sources with [nox](https://nox.thea.codes/), compile and install `awkward-cpp`, and finally install `awkward` as an editable installation:
```bash
git clone --recursive https://github.com/scikit-hep/awkward.git
cd awkward

nox -s prepare
python -m pip install -v ./awkward-cpp
python -m pip install -e .
```

Tests can be run in parallel with [pytest](https://docs.pytest.org/) (the `-n auto` option is provided by the pytest-xdist plugin):
```bash
python -m pytest -n auto tests
```

For more details, see [CONTRIBUTING.md](https://github.com/scikit-hep/awkward/blob/main/CONTRIBUTING.md), or one of the links below.
* [Continuous integration](https://github.com/scikit-hep/awkward/actions/workflows/test.yml) and [continuous deployment](https://github.com/scikit-hep/awkward/actions/workflows/wheels.yml) are hosted by [GitHub Actions](https://github.com/features/actions/).
* [Code of conduct](https://scikit-hep.org/code-of-conduct) for how we work together.
* The [LICENSE](LICENSE) is BSD-3.

# Documentation, Release notes, Roadmap, Citations
The documentation is on [awkward-array.org](https://awkward-array.org), including
* [Getting started](https://awkward-array.org/doc/main/getting-started/index.html)
* [User guide](https://awkward-array.org/doc/main/user-guide/index.html)
* [API reference](https://awkward-array.org/doc/main/reference/index.html)
* [Tutorials (with videos)](https://awkward-array.org/doc/main/getting-started/community-tutorials.html)
* [Papers and talks](https://awkward-array.org/doc/main/getting-started/papers-and-talks.html) about Awkward Array

The Release notes for each version are in the [GitHub Releases tab](https://github.com/scikit-hep/awkward/releases).
The Roadmap, Plans, and Deprecation Schedule are in the [GitHub Wiki](https://github.com/scikit-hep/awkward/wiki).
To cite Awkward Array in a paper, see the "Cite this repository" drop-down menu on the top-right of the [GitHub front page](https://github.com/scikit-hep/awkward). The BibTeX is
```bibtex
@software{Pivarski_Awkward_Array_2018,
  author = {Pivarski, Jim and Osborne, Ianna and Ifrim, Ioana and Schreiner, Henry and Hollands, Angus and Biswas, Anish and Das, Pratyush and Roy Choudhury, Santam and Smith, Nicholas and Goyal, Manasvi},
  doi = {10.5281/zenodo.4341376},
  month = {10},
  title = {{Awkward Array}},
  year = {2018}
}
```

# Acknowledgements
Support for this work was provided by NSF cooperative agreement [OAC-1836650](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1836650) (IRIS-HEP 1), [PHY-2323298](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2323298) (IRIS-HEP 2), grant [OAC-1450377](https://nsf.gov/awardsearch/showAward?AWD_ID=1450377) (DIANA/HEP), [PHY-1520942](https://www.nsf.gov/awardsearch/showAward?AWD_ID=1520942) and [PHY-2121686](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2121686) (US-CMS LHC Ops), and [OAC-2103945](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2103945) (Awkward Array).
We also thank [Erez Shinan](https://github.com/erezsh) and the developers of the [Lark standalone parser](https://github.com/lark-parser/lark), which is used to parse type strings as type objects.
Thanks especially to the gracious help of Awkward Array contributors (including the [original repository](https://github.com/scikit-hep/awkward-0.x)).
* Jim Pivarski: 💻 📖 🚇 🚧
* Ianna Osborne: 💻
* Pratyush Das: 💻
* Anish Biswas: 💻
* glass-ships: 💻 ⚠️
* Henry Schreiner: 💻, 📖 or 🚇
* Nicholas Smith: 💻 ⚠️
* Lindsey Gray: 💻 ⚠️
* Ellipse0934: ⚠️
* Dmitry Kalinkin: 📖 or 🚇
* Charles Escott: 💻
* Mason Proffitt: 💻
* Michael Hedges: 💻
* Jonas Rembser: 💻
* Jaydeep Nandi: 💻
* benkrikler: 💻
* bfis: 💻
* Doug Davis: 💻
* Joosep Pata: 🤔
* Martin Durant: 🤔
* Gordon Watts: 🤔
* Nikolai Hartmann: 💻
* Simon Perkins: 💻
* .hard: 💻 ⚠️
* HenryDayHall: 💻
* Angus Hollands: ⚠️ 💻
* ioanaif: 💻 ⚠️
* Bernhard M. Wiedemann: 🚧
* Matthew Feickert: 🚧
* Santam Roy Choudhury: ⚠️
* Jeroen Van Goey: 📖 or 🚇
* Ahmad-AlSubaie: 💻
* Manasvi Goyal: 💻
* Aryan Roy: 💻
* Saransh: 💻
* Laurits Tani: 📖 or 🚇
* Daniel Savoiu: 💻
* Ray Bell: 📖 or 🚇
* Andrea Zonca: 💻
* Chris Burr: 📖 or 🚇
* Zoë Bilodeau: 💻
* Raymond Ehlers: 🚧
* Markus Löning: 📖 or 🚇
* Kush Kothari: 💻 ⚠️
* Jonas Rübenach: 💻
* Jerry Ling: 📖 or 🚇
* Luis Antonio Obis Aparicio: 💻
* Topher Cawlfield: 💻
* Massimiliano Galli: 💻
* Peter Fackeldey: 💻
* Andres Rios Tascon: 💻
* maxymnaumchyk: 💻
* Thomas A Caswell: 🚧
* Bas Nijholt: 🚧
* Igor Vaiman: 💻

💻: code, 📖: documentation, 🚇: infrastructure, 🚧: maintenance, ⚠️: tests and feedback, 🤔: foundational ideas.