https://github.com/targetta/ankaflow
A YAML-based data pipeline framework that runs both locally and fully in-browser, designed for data engineers, ML teams, and SaaS developers who need flexible, SQL-powered pipelines.
- Host: GitHub
- URL: https://github.com/targetta/ankaflow
- Owner: targetta
- Created: 2025-05-04T22:00:10.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-09-21T22:16:05.000Z (4 months ago)
- Last Synced: 2025-09-21T23:35:46.012Z (4 months ago)
- Topics: bigquery, clickhouse, data-analysis, dataops, deltalake, duckdb, elt-pipeline, etl, etl-automation, motherduck, parquet, python, sql
- Language: Python
- Homepage: https://targetta.github.io/ankaflow/
- Size: 2.2 MB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 1
- Metadata Files:
  - Readme: README.md
# AnkaFlow
**Run your data pipelines in Python or the browser.**
AnkaFlow is a YAML + SQL-powered data pipeline engine that works in local Python, JupyterLite, or fully in-browser via Pyodide.
## 🚀 Features
- Run pipelines using DuckDB with SQL and optional Python
- Supports Parquet, REST APIs, BigQuery, and ClickHouse (server only)
- Browser-compatible: works in JupyterLite, GitHub Pages, VS Code Web, and more
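Because the browser build targets Pyodide, AnkaFlow can typically be loaded straight into a browser-side Python session with `micropip`. A minimal sketch, assuming a pure-Python `ankaflow` wheel is installable from PyPI:

```python
# Inside a JupyterLite or Pyodide console session (browser-side Python).
# Assumes a pure-Python ankaflow wheel is available on PyPI.
import micropip

await micropip.install("ankaflow")  # top-level await is supported in Pyodide

from ankaflow import Stages  # now importable like any locally installed package
```

In JupyterLite specifically, `%pip install ankaflow` is usually an equivalent shortcut for the `micropip` call above.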
## 📦 Install
```bash
# Server (quote the extra so shells like zsh don't expand the brackets)
pip install "ankaflow[server]"

# Development install from a checkout
pip install -e ".[dev,server]"
```
## 🛠 Usage
```bash
ankaflow /path/to/stages.yaml
```
```python
from ankaflow import (
    ConnectionConfiguration,
    Stages,
    Flow,
)

connections = ConnectionConfiguration()      # default connection settings
stages = Stages.load("path/to/stages.yaml")  # parse the pipeline definition
flow = Flow(stages, connections)
flow.run()
```
## 🔁 What is `Stages`?
`Stages` is the object that holds your pipeline definition, parsed from a YAML file.
Each stage is one of three kinds: `tap` (reads data in), `transform` (runs SQL over earlier stages, referenced by name), or `sink` (writes results out), as the example below shows.
### Example
```yaml
- name: Extract Data
  kind: tap
  connection:
    kind: Parquet
    locator: input.parquet

- name: Transform Data
  kind: transform
  query: SELECT * FROM "Extract Data" WHERE "amount" > 100

- name: Load Data
  kind: sink
  connection:
    kind: Parquet
    locator: output.parquet
```
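Saved as, say, `stages.yaml`, this pipeline runs with the same API shown under Usage. A minimal sketch, assuming an `input.parquet` file exists in the working directory:

```python
from ankaflow import ConnectionConfiguration, Stages, Flow

# Load the three-stage pipeline defined above.
stages = Stages.load("stages.yaml")
flow = Flow(stages, ConnectionConfiguration())

# Reads input.parquet, keeps rows with amount > 100, writes output.parquet.
flow.run()
```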
## 📖 Documentation
- [All docs](https://targetta.github.io/ankaflow/)
- [Pipeline specification](https://targetta.github.io/ankaflow/api/ankaflow.models/)
- [Live demo](https://targetta.github.io/ankaflow/demo/)
---