Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/program--/pyper
ETL toolkit based on DuckDB and PRQL
duckdb etl prql python
Last synced: 24 days ago
- Host: GitHub
- URL: https://github.com/program--/pyper
- Owner: program--
- License: mit
- Created: 2023-03-19T04:48:44.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-03-19T04:49:08.000Z (almost 2 years ago)
- Last Synced: 2024-06-11T17:05:01.562Z (7 months ago)
- Topics: duckdb, etl, prql, python
- Language: Python
- Homepage:
- Size: 2.93 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Pyper
An (experimental) ETL toolkit based on [DuckDB](https://duckdb.org/) and [PRQL](https://prql-lang.org/).
## Usage
Pyper describes workflows in YAML files and uses pydantic to model the structure a workflow file must follow. Here's a simple workflow with extract, transform, and load steps:
```yaml
# myworkflow.yaml
extract:
  provider: local
  uri: file:///mnt/ssd/projects/pyper/invoices.csv
  register: my_data_source

transform:
  lang: prql
  backend: duckdb
  query: |
    from my_data_source
    filter billing_country == "USA"
    group [customer_id] (
      aggregate [
        sum total,
        count,
      ]
    )

load:
  provider: local
  uri: file:///mnt/ssd/projects/pyper/invoices_usa.csv
```

Then, using Python:
```python
import pyper
pyper.workflow('myworkflow.yaml').exec()
```
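To make the example's semantics concrete, here is a minimal sketch of what the workflow above computes. It does not use pyper or DuckDB. To stay self-contained it swaps in Python's built-in `sqlite3` and `csv` modules and runs a hand-translated SQL approximation of the PRQL query. The sample rows, the `run_workflow` helper, and the `sum_total`/`ct` column aliases are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows standing in for invoices.csv (not real data).
INVOICES_CSV = """invoice_id,customer_id,billing_country,total
1,10,USA,5.00
2,10,USA,2.50
3,20,Canada,9.99
4,30,USA,1.25
"""

# Hand-translated SQL approximating the PRQL transform in myworkflow.yaml.
# The aliases and the ORDER BY (added for deterministic output) are assumptions.
QUERY = """
SELECT customer_id, SUM(total) AS sum_total, COUNT(*) AS ct
FROM my_data_source
WHERE billing_country = 'USA'
GROUP BY customer_id
ORDER BY customer_id
"""


def run_workflow(src_csv: str) -> str:
    """Hypothetical helper: extract CSV text, transform it with SQL, return CSV text."""
    con = sqlite3.connect(":memory:")
    # Extract: read the CSV and register it under the name the query uses.
    rows = list(csv.DictReader(io.StringIO(src_csv)))
    con.execute(
        "CREATE TABLE my_data_source (invoice_id INTEGER, customer_id INTEGER,"
        " billing_country TEXT, total REAL)"
    )
    # Column affinity converts the numeric-looking CSV strings to INTEGER/REAL.
    con.executemany(
        "INSERT INTO my_data_source VALUES"
        " (:invoice_id, :customer_id, :billing_country, :total)",
        rows,
    )
    # Transform: run the query.
    cur = con.execute(QUERY)
    header = [col[0] for col in cur.description]
    result = cur.fetchall()
    con.close()
    # Load: serialize to CSV text (the real workflow writes invoices_usa.csv).
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(result)
    return out.getvalue()


print(run_workflow(INVOICES_CSV))
```

With the sample rows, only customers 10 and 30 survive the `USA` filter, so the output has one row per such customer with the summed totals and invoice counts.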