https://github.com/8451/labrea
A framework for declarative, functional dataset definitions.
- Host: GitHub
- URL: https://github.com/8451/labrea
- Owner: 8451
- License: mit
- Created: 2024-05-28T17:24:55.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-28T18:08:51.000Z (about 2 years ago)
- Last Synced: 2024-05-29T07:08:55.464Z (about 2 years ago)
- Language: Python
- Size: 2.85 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

-----------------
# Labrea
A framework for declarative, functional dataset definitions.

## Installation
Labrea is available for install via pip.
```bash
pip install labrea
```
Alternatively, you can install the latest development version from GitHub.
```bash
pip install git+https://github.com/8451/labrea@develop
```
## Usage
See the [usage guide](docs/source/usage.md) for full details.
Labrea exposes a `dataset` decorator that allows you to define datasets and their dependencies in a declarative manner.
Dependencies can either be other datasets or `Option`s, which are values that can be passed in at runtime via a
dictionary.
```python
from labrea import dataset, Option
import pandas as pd


@dataset
def stores(path: str = Option('PATHS.STORES')) -> pd.DataFrame:
    return pd.read_csv(path)


@dataset
def transactions(path: str = Option('PATHS.SALES')) -> pd.DataFrame:
    return pd.read_csv(path)


@dataset
def sales_by_region(
    stores_: pd.DataFrame = stores,
    transactions_: pd.DataFrame = transactions
) -> pd.DataFrame:
    """Merge stores to transactions, sum sales by region"""
    return (
        pd.merge(transactions_, stores_, on='store_id')
        .groupby('region')['sales']
        .sum()
        .reset_index()
    )


options = {
    'PATHS': {
        'STORES': 'path/to/stores.csv',
        'SALES': 'path/to/sales.csv'
    }
}

stores(options)
## +-----------------+-----------+
## |        store_id | region    |
## |-----------------+-----------|
## |               1 | North     |
## |               2 | North     |
## |               3 | South     |
## |               4 | South     |
## +-----------------+-----------+
transactions(options)
## +-----------------+-----------------+-----------------+
## |        store_id |           sales |  transaction_id |
## |-----------------+-----------------+-----------------|
## |               1 |             100 |               1 |
## |               2 |             200 |               2 |
## |               3 |             300 |               3 |
## |               4 |             400 |               4 |
## +-----------------+-----------------+-----------------+
sales_by_region(options)
## +-----------------+-----------------+
## | region          |           sales |
## |-----------------+-----------------|
## | North           |             300 |
## | South           |             700 |
## +-----------------+-----------------+
```
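In the example above, the key `'PATHS.STORES'` lines up with the nested entry `options['PATHS']['STORES']`. The sketch below shows one way such a dotted key could be resolved against a nested dictionary, using only the standard library. The `resolve_option` helper is hypothetical and exists only for illustration; it is not part of the labrea API and makes no claim about how labrea resolves options internally.

```python
from typing import Any, Mapping


def resolve_option(options: Mapping[str, Any], key: str) -> Any:
    """Look up a dotted key such as 'PATHS.STORES' in a nested dictionary.

    Hypothetical helper for illustration only; not labrea's implementation.
    """
    current: Any = options
    # Walk one level deeper for each dot-separated part of the key.
    for part in key.split('.'):
        if not isinstance(current, Mapping) or part not in current:
            raise KeyError(f"Option {key!r} not found (missing {part!r})")
        current = current[part]
    return current


options = {
    'PATHS': {
        'STORES': 'path/to/stores.csv',
        'SALES': 'path/to/sales.csv',
    }
}

stores_path = resolve_option(options, 'PATHS.STORES')
sales_path = resolve_option(options, 'PATHS.SALES')
```

Raising `KeyError` with the full dotted key in the message is a design choice in this sketch, made so that a missing or misspelled entry is easy to trace back to the option that requested it.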
## Contributing
If you would like to contribute to **labrea**, please read the
[Contributing Guide](docs/source/contributing.md).
## Changelog
A summary of recent updates to **labrea** can be found in the
[Changelog](docs/source/changelog.md).
## Maintainers
| Maintainer | Email |
|-----------------------------------------------------------|--------------------------|
| [Austin Warner](https://github.com/austinwarner-8451) | austin.warner@8451.com |
| [Michael Stoepel](https://github.com/michaelstoepel-8451) | michael.stoepel@8451.com |
## Links
- Report a bug or request a feature: https://github.com/8451/labrea/issues/new/choose
- Documentation: https://8451.github.io/labrea