https://github.com/targetta/ankaflow
A YAML-based data pipeline framework that runs both locally and fully in-browser, designed for data engineers, ML teams, and SaaS developers who need flexible, SQL-powered pipelines.
- Host: GitHub
- URL: https://github.com/targetta/ankaflow
- Owner: targetta
- Created: 2025-05-04T22:00:10.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-09-21T22:16:05.000Z (4 months ago)
- Last Synced: 2025-09-21T23:35:46.012Z (4 months ago)
- Topics: bigquery, clickhouse, data-analysis, dataops, deltalake, duckdb, elt-pipeline, etl, etl-automation, motherduck, parquet, python, sql
- Language: Python
- Homepage: https://targetta.github.io/ankaflow/
- Size: 2.2 MB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 1
- Metadata Files:
  - Readme: README.md
# AnkaFlow
**Run your data pipelines in Python or the browser.**
AnkaFlow is a YAML + SQL-powered data pipeline engine that works in local Python, JupyterLite, or fully in-browser via Pyodide.
## 🚀 Features
- Run pipelines using DuckDB with SQL and optional Python
- Supports Parquet, REST APIs, BigQuery, and ClickHouse (server only)
- Browser-compatible: works in JupyterLite, GitHub Pages, VS Code Web, and more
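Because the browser build targets Pyodide, AnkaFlow can typically be loaded straight into a browser-side Python session with `micropip`. A minimal sketch, assuming a pure-Python `ankaflow` wheel is installable from PyPI:

```python
# Inside a JupyterLite or Pyodide console session (browser-side Python).
# Assumes a pure-Python ankaflow wheel is available on PyPI.
import micropip

await micropip.install("ankaflow")  # top-level await is supported in Pyodide

from ankaflow import Stages  # now importable like any locally installed package
```

In JupyterLite specifically, `%pip install ankaflow` is usually an equivalent shortcut for the `micropip` call above.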
## 📦 Install
```bash
# Server (quote the extra so shells like zsh don't expand the brackets)
pip install "ankaflow[server]"

# Development install from a checkout
pip install -e ".[dev,server]"
```
## 🛠 Usage
```bash
ankaflow /path/to/stages.yaml
```
```python
from ankaflow import (
    ConnectionConfiguration,
    Stages,
    Flow,
)

connections = ConnectionConfiguration()      # default connection settings
stages = Stages.load("path/to/stages.yaml")  # parse the pipeline definition
flow = Flow(stages, connections)
flow.run()
```
## 🔁 What is `Stages`?
`Stages` is the object that holds your pipeline definition, parsed from a YAML file.
Each stage is one of three kinds: `tap` (reads data in), `transform` (runs SQL over earlier stages, referenced by name), or `sink` (writes results out), as the example below shows.
### Example
```yaml
- name: Extract Data
  kind: tap
  connection:
    kind: Parquet
    locator: input.parquet

- name: Transform Data
  kind: transform
  query: SELECT * FROM "Extract Data" WHERE "amount" > 100

- name: Load Data
  kind: sink
  connection:
    kind: Parquet
    locator: output.parquet
```
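Saved as, say, `stages.yaml`, this pipeline runs with the same API shown under Usage. A minimal sketch, assuming an `input.parquet` file exists in the working directory:

```python
from ankaflow import ConnectionConfiguration, Stages, Flow

# Load the three-stage pipeline defined above.
stages = Stages.load("stages.yaml")
flow = Flow(stages, ConnectionConfiguration())

# Reads input.parquet, keeps rows with amount > 100, writes output.parquet.
flow.run()
```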
## 📖 Documentation
- [All docs](https://targetta.github.io/ankaflow/)
- [Pipeline specification](https://targetta.github.io/ankaflow/api/ankaflow.models/)
- [Live demo](https://targetta.github.io/ankaflow/demo/)
---