{"id":31097343,"url":"https://github.com/firelink-sh/evolve-py","last_synced_at":"2026-03-10T06:02:39.188Z","repository":{"id":310548244,"uuid":"1022279426","full_name":"firelink-sh/evolve-py","owner":"firelink-sh","description":"A highly efficient, composable, and lightweight ETL and data integration framework.","archived":false,"fork":false,"pushed_at":"2025-10-06T19:01:21.000Z","size":3432,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-06T20:41:02.205Z","etag":null,"topics":["analytics","arrow","big-data","data","data-engineering","data-integration","data-science","duckdb","elt","etl","ingestion","ingress","ml","olap","pipeline","polars","postgresql","python","s3"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/firelink-sh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-07-18T19:06:14.000Z","updated_at":"2025-10-06T19:01:24.000Z","dependencies_parsed_at":"2025-08-18T20:41:50.266Z","dependency_job_id":"ecc7534e-d309-404a-a0c1-0116aecacc21","html_url":"https://github.com/firelink-sh/evolve-py","commit_stats":null,"previous_names":["firelink-sh/evolve-py"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/firelink-sh/evolve-py","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firelink-sh%2Fevolve-py","tags_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firelink-sh%2Fevolve-py/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firelink-sh%2Fevolve-py/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firelink-sh%2Fevolve-py/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/firelink-sh","download_url":"https://codeload.github.com/firelink-sh/evolve-py/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/firelink-sh%2Fevolve-py/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30326878,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-10T05:25:20.737Z","status":"ssl_error","status_checked_at":"2026-03-10T05:25:17.430Z","response_time":106,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","arrow","big-data","data","data-engineering","data-integration","data-science","duckdb","elt","etl","ingestion","ingress","ml","olap","pipeline","polars","postgresql","python","s3"],"created_at":"2025-09-16T19:03:16.114Z","updated_at":"2026-03-10T06:02:39.180Z","avatar_url":"https://github.com/firelink-sh.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003cimg 
src=\"https://github.com/firelink-sh/evolve-py/blob/368bd3c6d1f520515a63b2f7b1340976a5c58b94/docs/assets/evolve-banner.png\" alt=\"evolve logo\" style=\"width:500px; height:auto\"\u003e\n\u003cp\u003e\n  \u003cem\u003eA highly efficient, composable, and lightweight ETL and data integration framework.\u003c/em\u003e\n\u003c/p\u003e\n\n\u003cbr\u003e\n\n[![CI](https://github.com/firelink-sh/evolve-py/actions/workflows/ci.yml/badge.svg)](https://github.com/firelink-sh/evolve-py/actions/workflows/ci.yml)\n[![Tests](https://github.com/firelink-sh/evolve-py/actions/workflows/tests.yml/badge.svg)](https://github.com/firelink-sh/evolve-py/actions/workflows/tests.yml)\n[![codecov](https://codecov.io/gh/firelink-sh/evolve-py/graph/badge.svg?token=OTFIM6UICZ)](https://codecov.io/gh/firelink-sh/evolve-py)\n\n\u003cbr\u003e\n\n\u003c/div\u003e\n\n\u003e evolve is currently in early development and frequently undergoes breaking changes\n\u003e to the core API and functionality. Expect a more stable version to be released in a couple\n\u003e of weeks.\n\nevolve is an **open-source** and **platform-agnostic** Python framework that enables your data teams to **efficiently integrate data** from a wide variety of **structured** or **unstructured** data sources into your **database**, **data warehouse**, or **data lake(house)**. It does so **blazingly fast** and with **minimal memory overhead** thanks to the Apache Arrow ecosystem.\n\nIt is **built for developers** with a **code-first** mindset. You will not find any low-code, clickops, or drag-and-drop shenanigans here.\nevolve offers you full control of how your data is read, parsed, handled in memory, transformed, and finally written to any destination you need.\n\n- **Composable** - Design data pipelines that fit your own stack, and add any extra (possibly proprietary) sources or targets you need, all within evolve's intuitive and lightweight framework.\n- **Blazing fast** - Zero-copy principles built on Apache Arrow give you extremely rapid in-memory operations, well suited to OLAP, and easy interoperability with DuckDB, Polars, Spark, DataFusion, and many other query engines.\n- **Customizable** - You choose the backend that you want to use. Do you prefer DataFrames? Use Polars! Or perhaps you prefer to work on data using SQL? Then use the DuckDB backend! It is completely up to you.\n- **Platform agnostic** - Run your ETL/ELT with evolve on your own infrastructure, with no vendor lock-in.\n\n\n## Architecture (alpha version)\n\n```mermaid\nflowchart TD\n    %% Sources (Connectors)\n    subgraph Sources\n        CSV[Local CSV Source]\n        JSON[HDFS JSON Source]\n        Parquet[S3 Parquet Source]\n        SQL[SQL Source]\n        Custom[Custom Source]\n    end\n\n    %% Intermediate Representation\n    subgraph Backend\n        Arrow[Apache Arrow / Polars / DuckDB / Custom]\n    end\n\n    %% Targets (Connectors)\n    subgraph Targets\n        S3[S3 object store]\n        Local[Local file system]\n        HDFS[Hadoop file system]\n        DW[Data Warehouse]\n        ML[ML Pipeline]\n        CustomOut[Custom Format]\n    end\n\n    %% Mapping logic\n    CSV --\u003e|Map to Arrow| Arrow\n    JSON --\u003e|Map to Arrow| Arrow\n    SQL --\u003e|Map to Arrow| Arrow\n    Custom --\u003e|Conditional Mapping| Arrow\n    Parquet --\u003e|Direct Mapping| S3\n\n    Arrow --\u003e S3\n    Arrow --\u003e Local\n    Arrow --\u003e HDFS\n    Arrow --\u003e DW\n    Arrow --\u003e ML\n    Arrow --\u003e Viz\n    Arrow --\u003e CustomOut\n```\n\n\n## Example usage\n\n```python\nimport evolve as ev\n\n# NOTE: DropNulls is not imported in this snippet; import it from wherever evolve exposes its transforms.\n\n# Pipelines are lazy: they only run when told to\npipeline = ev.Pipeline(\"parquet-ingestion\") \\\n    .with_source(ev.io.FixedWidthFile(...)) \\\n    .with_target(ev.io.ParquetFile(...)) \\\n    .with_transform(DropNulls(columns=(..., )))\n\npipeline.run()  # runs the ETL\n```\n\nYou can also configure a pipeline with YAML or JSON:\n\n```yml\nsource:\n  type: postgres\n  host: localhost\n  db: prod\n  user: admin\n  password: secret\n  schema: sales\n  tables: orders\n\ntransforms:\n  - type: drop_nulls\n    columns: [\"order_id\", \"amount\"]\n  - type: rename_columns\n    mapping:\n      order_id: id\n      amount: total\n  - type: filter_rows\n    condition: \"total \u003e 100\"\n\ntarget:\n  type: parquet\n  path: s3://prod/sales/orders.parquet\n```\n\n\n## License\n\nevolve is distributed under the terms of both the MIT License and the Apache License (version 2.0).\n\nSee LICENSE-APACHE and LICENSE-MIT for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffirelink-sh%2Fevolve-py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffirelink-sh%2Fevolve-py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffirelink-sh%2Fevolve-py/lists"}