https://github.com/artigraph/artigraph
Batteries included toolkit for data engineering.
- Host: GitHub
- URL: https://github.com/artigraph/artigraph
- Owner: artigraph
- License: apache-2.0
- Created: 2020-09-04T05:57:36.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2024-12-30T13:54:07.000Z (about 1 year ago)
- Last Synced: 2025-12-22T10:04:42.613Z (about 1 month ago)
- Topics: data, python
- Language: Python
- Homepage:
- Size: 4.78 MB
- Stars: 36
- Watchers: 6
- Forks: 8
- Open Issues: 23
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: CODEOWNERS
- Security: SECURITY.md
- Support: SUPPORT.md
- Governance: GOVERNANCE.md
# artigraph
Declarative Data Production
Artigraph is a tool to improve the authorship, management, and quality of data. It emphasizes that the core deliverable of a data pipeline or workflow is the data, not the tasks.
Artigraph is hosted by the [LF AI and Data Foundation](https://lfaidata.foundation) as a Sandbox project. See our [deck](https://docs.google.com/presentation/d/1KLM9r0L5sTbpb_UPR5nx4fil-7fO-UnmhTeatSiaN3Y) or [presentation](https://wiki.lfaidata.foundation/download/attachments/7733341/GMT20220127-140219_Recording_3840x2160.mp4?version=1&modificationDate=1643716019000&api=v2) (@6m35s) requesting Sandbox incubation.
## Community
We're excited to hear from anyone interested in the project - feel free to introduce yourself over in the [Intro Discussions](https://github.com/artigraph/artigraph/discussions/categories/intros)! See our [support page](SUPPORT.md) for help or our [contributing page](CONTRIBUTING.md) for guidelines.
## Installation
Artigraph can be installed from PyPI with `pip install arti`.
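To confirm the install took effect, a small standard-library check can help. This is a minimal sketch, not part of Artigraph: the helper name `installed_version` is hypothetical, and it only assumes the PyPI distribution is named `arti`, as in the `pip` command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "arti") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(f"arti {version}" if version else "arti is not installed")
```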
## Example
This sample from the [spend example](docs/examples/spend/demo.py) highlights computing the total amount spent from a series of purchase transactions:
```python
from pathlib import Path
from typing import Annotated

from arti import Annotation, Artifact, Graph, producer
from arti.formats.json import JSON
from arti.storage.local import LocalFile
from arti.types import Collection, Date, Float64, Int64, Struct
from arti.versions import SemVer

DIR = Path(__file__).parent


class Vendor(Annotation):
    name: str


class Transactions(Artifact):
    """Transactions partitioned by day."""

    type = Collection(
        element=Struct(fields={"id": Int64(), "date": Date(), "amount": Float64()}),
        partition_by=("date",),
    )


class TotalSpend(Artifact):
    """Aggregate spend over all time."""

    type = Float64()
    format = JSON()
    storage = LocalFile()


@producer(version=SemVer(major=1, minor=0, patch=0))
def aggregate_transactions(
    transactions: Annotated[list[dict], Transactions]
) -> Annotated[float, TotalSpend]:
    return sum(txn["amount"] for txn in transactions)


with Graph(name="test-graph") as g:
    g.artifacts.vendor.transactions = Transactions(
        annotations=[Vendor(name="Acme")],
        format=JSON(),
        storage=LocalFile(path=str(DIR / "transactions" / "{date.iso}.json")),
    )
    g.artifacts.spend = aggregate_transactions(
        transactions=g.artifacts.vendor.transactions
    )
```
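The producer above ties Python types to artifact classes through `typing.Annotated`. How Artigraph reads those hints internally is not described here; the sketch below only shows how the standard library exposes such metadata. The stand-in classes and the `annotated_metadata` helper are hypothetical and do not use the `arti` package.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


# Stand-in classes for illustration only; these are NOT arti Artifacts.
class Transactions: ...


class TotalSpend: ...


def aggregate(
    transactions: Annotated[list, Transactions]
) -> Annotated[float, TotalSpend]:
    return sum(txn["amount"] for txn in transactions)


def annotated_metadata(func) -> dict:
    """Map each parameter name (and "return") to its Annotated metadata."""
    hints = get_type_hints(func, include_extras=True)
    return {
        name: get_args(hint)[1:]  # drop the underlying Python type
        for name, hint in hints.items()
        if get_origin(hint) is Annotated
    }
```

Calling `annotated_metadata(aggregate)` returns a dict keyed by `"transactions"` and `"return"`, each holding a tuple with the attached class. A decorator could use that mapping to work out which artifacts a function consumes and produces.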
The full example can be run easily with `docker run --rm artigraph/example-spend`:
```
INFO:root:Writing mock Transactions data:
INFO:root: /usr/src/app/transactions/2021-10-01.json: [{'id': 1, 'amount': 9.95}, {'id': 2, 'amount': 7.5}]
INFO:root: /usr/src/app/transactions/2021-10-02.json: [{'id': 3, 'amount': 5.0}, {'id': 4, 'amount': 12.0}, {'id': 4, 'amount': 7.55}]
INFO:root:Building aggregate_transactions(transactions=Transactions(format=JSON(), storage=LocalFile(path='/usr/src/app/transactions/{date.iso}.json'), annotations=(Vendor(name='Acme'),)))...
INFO:root:Build finished.
INFO:root:Final Spend data:
INFO:root: /tmp/test-graph/spend/7564053533177891797/spend.json: 42.0
```
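For readers without Docker, the arithmetic behind the log can be reproduced in plain Python. This is a rough sketch that does not use Artigraph: the helper names are hypothetical, the mock rows are copied from the log output above, and the one-file-per-day layout only mimics the `{date.iso}.json` path template.

```python
import json
import tempfile
from pathlib import Path

# Mock rows copied from the log output above, keyed by partition date.
MOCK_TRANSACTIONS = {
    "2021-10-01": [{"id": 1, "amount": 9.95}, {"id": 2, "amount": 7.5}],
    "2021-10-02": [
        {"id": 3, "amount": 5.0},
        {"id": 4, "amount": 12.0},
        {"id": 4, "amount": 7.55},
    ],
}


def write_partitions(root: Path) -> None:
    """Write one JSON file per day, named after the ISO date."""
    for day, rows in MOCK_TRANSACTIONS.items():
        (root / f"{day}.json").write_text(json.dumps(rows))


def total_spend(root: Path) -> float:
    """Sum the amounts across every daily partition file under root."""
    total = 0.0
    for path in sorted(root.glob("*.json")):
        rows = json.loads(path.read_text())
        total += sum(row["amount"] for row in rows)
    return total


def run_demo() -> float:
    """Write the mock partitions to a temporary directory and total them."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_partitions(root)
        return total_spend(root)
```

`run_demo()` should return a value within floating-point rounding of the `42.0` reported in the log. Unlike the Artigraph graph, this sketch has no versioning, typing, or storage abstraction; it only illustrates the aggregation itself.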