{"id":31737467,"url":"https://github.com/targetta/ankaflow","last_synced_at":"2025-10-09T09:18:53.226Z","repository":{"id":291559179,"uuid":"977747777","full_name":"targetta/ankaflow","owner":"targetta","description":"YAML-based data pipeline framework that runs both locally and fully in-browser, designed for data engineers, ML teams, and SaaS developers who need flexible, SQL-powered pipelines.","archived":false,"fork":false,"pushed_at":"2025-09-21T22:16:05.000Z","size":2312,"stargazers_count":1,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-21T23:35:46.012Z","etag":null,"topics":["bigquery","clickhouse","data-analysis","dataops","deltalake","duckdb","elt-pipeline","etl","etl-automation","motherduck","parquet","python","sql"],"latest_commit_sha":null,"homepage":"https://targetta.github.io/ankaflow/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/targetta.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-05-04T22:00:10.000Z","updated_at":"2025-09-21T22:16:08.000Z","dependencies_parsed_at":"2025-05-05T11:29:26.464Z","dependency_job_id":"084fc44a-20f3-4784-831a-519b661e4b2f","html_url":"https://github.com/targetta/ankaflow","commit_stats":null,"previous_names":["mudam/ankaflow","targetta/ankaflow"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/targetta/ankaflow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/targetta%2Fankaflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/targetta%2Fankaflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/targetta%2Fankaflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/targetta%2Fankaflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/targetta","download_url":"https://codeload.github.com/targetta/ankaflow/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/targetta%2Fankaflow/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279001122,"owners_count":26083021,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigquery","clickhouse","data-analysis","dataops","deltalake","duckdb","elt-pipeline","etl","etl-automation","motherduck","parquet","python","sql"],"created_at":"2025-10-09T09:18:51.442Z","updated_at":"2025-10-09T09:18:53.211Z","avatar_url":"https://github.com/targetta.png","language":"Python","readme":"# AnkaFlow\n\n**Run your data pipelines in Python or the browser.**  \nAnkaFlow is a YAML + SQL-powered data pipeline engine that works in local Python, JupyterLite, or fully in-browser via Pyodide.\n\n## 🚀 Features\n\n- Run pipelines using 
DuckDB with SQL and optional Python\n- Supports Parquet, REST APIs, BigQuery, ClickHouse (server only)\n- Browser-compatible: works in JupyterLite, GitHub Pages, VS Code Web, and more\n\n## 📦 Install\n\n```bash\n# Server\npip install ankaflow[server]\n\n# Dev\npip install -e .[dev,server]\n```\n\n## 🛠 Usage\n\n```bash\nankaflow /path/to/stages.yaml\n```\n\n```python\nfrom ankaflow import (\n    ConnectionConfiguration,\n    Stages,\n    Flow,\n)\n\nconnections = ConnectionConfiguration()\n\nstages = Stages.load(\"path/to/stages.yaml\")\nflow = Flow(stages, connections)\nflow.run()\n```\n\n## 🔁 What is `Stages`?\n\n`Stages` is the object that holds your pipeline definition parsed from a YAML file.  \nEach stage is one of: `tap`, `transform`, or `sink`.\n\n### Example\n\n```yaml\n- name: Extract Data\n  kind: tap\n  connection:\n    kind: Parquet\n    locator: input.parquet\n\n- name: Transform Data\n  kind: transform\n  query: SELECT * FROM \"Extract Data\" WHERE \"amount\" \u003e 100\n\n- name: Load Data\n  kind: sink\n  connection:\n    kind: Parquet\n    locator: output.parquet\n```\n\n## 📖 Documentation\n\n- [All docs](https://targetta.github.io/ankaflow/)\n- [Pipeline specification](https://targetta.github.io/ankaflow/api/ankaflow.models/)\n- [Live demo](https://targetta.github.io/ankaflow/demo/)\n\n---","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftargetta%2Fankaflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftargetta%2Fankaflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftargetta%2Fankaflow/lists"}