https://github.com/lincc-frameworks/nested-pandas
Efficient Pandas representation for nested associated datasets.
- Host: GitHub
- URL: https://github.com/lincc-frameworks/nested-pandas
- Owner: lincc-frameworks
- License: mit
- Created: 2024-04-01T20:08:32.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2026-01-12T20:46:11.000Z (6 months ago)
- Last Synced: 2026-01-12T21:29:13.413Z (6 months ago)
- Language: Python
- Homepage: https://nested-pandas.readthedocs.io
- Size: 17.9 MB
- Stars: 18
- Watchers: 4
- Forks: 1
- Open Issues: 49
Metadata Files:
- Readme: README.md
- License: LICENSE
# nested-pandas
An extension of pandas for efficient representation of nested
associated datasets.
Nested-Pandas extends the [pandas](https://pandas.pydata.org/) package with
tooling and support for nested dataframes packed into values of top-level
dataframe columns. [Pyarrow](https://arrow.apache.org/docs/python/index.html)
is used internally to aid in scalability and performance.
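Nested-Pandas allows data held in separate flat tables (for example, a table of
objects and a `source_df` table of per-object measurements) to instead be
represented as a single nested dataframe, `object_nf`, where the nested data is
represented as nested dataframes: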
```python
# Each row of "object_nf" now has its own sub-dataframe of matched rows from "source_df"
object_nf.loc[0]["nested_sources"]
```
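For comparison, here is a minimal sketch of the same lookup done by hand on flat tables in plain pandas. The column names (`object_id`, `time`, `flux`, `ra`, `dec`) and all values are made up for illustration and are not taken from the project.

```python
import pandas as pd

# Hypothetical flat inputs: one row per object, and one row per measurement.
object_df = pd.DataFrame({"ra": [10.0, 20.0], "dec": [-5.0, 5.0]}, index=[0, 1])
source_df = pd.DataFrame(
    {
        "object_id": [0, 0, 1, 1, 1],  # assumed key pointing at object_df's index
        "time": [1.0, 2.0, 1.0, 2.0, 3.0],
        "flux": [3.0, 5.0, 10.0, 20.0, 30.0],
    }
)

# Manual equivalent of a nested lookup: filter the flat table on the key.
sources_for_object_0 = source_df.loc[source_df["object_id"] == 0, ["time", "flux"]]
print(sources_for_object_0)
```

With a nested column, this filtering step is replaced by a single row access, as in the snippet above.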
This allows powerful and straightforward operations, such as:
```python
# Compute the mean flux for each row of "object_nf"
import numpy as np

def mean_flux(row):
    """Calculates the mean flux for each object"""
    return np.mean(row["nested_sources.flux"])

object_nf.map_rows(mean_flux, output_names="mean_flux")
```
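A function like `mean_flux` can be checked on its own before it is applied to a whole frame. The sketch below assumes the row reaches the function as a dict-like object keyed by `"nested_sources.flux"`, which is what the indexing in `mean_flux` implies. The `fake_row` dict is a hypothetical stand-in, not an object produced by Nested-Pandas.

```python
import numpy as np


def mean_flux(row):
    """Calculates the mean flux for each object"""
    return np.mean(row["nested_sources.flux"])


# Hypothetical stand-in for a single row: a plain dict that maps the
# hierarchical column name to that object's flux measurements.
fake_row = {"nested_sources.flux": np.array([1.0, 2.0, 3.0])}

result = mean_flux(fake_row)
print(result)
```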
Nested-Pandas is motivated by time-domain astronomy use cases, which typically
involve two levels of information: properties of the astronomical objects
themselves, and an associated set of `N` measurements of each object.
Nested-Pandas offers a performant and memory-efficient package for working with
these types of datasets. Its core advantages are:
* hierarchical column access
* efficient packing of nested information into inputs to custom user functions
* avoiding costly groupby operations
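
To make the last point concrete, the following sketch shows the flat-table pattern in plain pandas that a nested representation is meant to replace: grouping the measurement table by a key, then joining the result back onto the object table. The names `object_id`, `flux`, and `ra` and all values are hypothetical. This is the baseline approach, not Nested-Pandas code.

```python
import pandas as pd

# Hypothetical flat tables: objects indexed by "object_id", plus their measurements.
object_df = pd.DataFrame(
    {"ra": [10.0, 20.0]}, index=pd.Index([0, 1], name="object_id")
)
source_df = pd.DataFrame(
    {
        "object_id": [0, 0, 1, 1, 1],
        "flux": [3.0, 5.0, 10.0, 20.0, 30.0],
    }
)

# Group every measurement by its object, reduce, then join back on the index.
mean_flux = source_df.groupby("object_id")["flux"].mean().rename("mean_flux")
object_with_mean = object_df.join(mean_flux)
print(object_with_mean)
```

Each new per-object statistic repeats the grouping step over the full measurement table.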
This is a LINCC Frameworks project - find more information about LINCC Frameworks [here](https://lsstdiscoveryalliance.org/programs/lincc-frameworks/).
## Acknowledgements
This project is supported by Schmidt Sciences.