Projects in Awesome Lists tagged with elt
A curated list of projects in awesome lists tagged with elt.
https://github.com/apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
airflow apache apache-airflow automation dag data-engineering data-integration data-orchestrator data-pipelines data-science elt etl machine-learning mlops orchestration python scheduler workflow workflow-engine workflow-orchestration
Last synced: 09 Sep 2025
https://github.com/airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
bigquery change-data-capture data data-analysis data-collection data-engineering data-integration data-pipeline elt etl java mssql mysql pipeline postgresql python redshift s3 self-hosted snowflake
Last synced: 09 Sep 2025
https://github.com/dbt-labs/dbt-core
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
analytics business-intelligence data-modeling dbt-viewpoint elt pypa slack
Last synced: 07 Jan 2026
https://github.com/apache/seatunnel
SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
apache batch cdc change-data-capture data-ingestion data-integration elt high-performance offline real-time streaming
Last synced: 12 May 2025
https://github.com/mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
artificial-intelligence data data-engineering data-integration data-pipelines data-science dbt elt etl machine-learning orchestration pipeline pipelines python reverse-etl spark sql transformation
Last synced: 13 May 2025
https://github.com/cloudquery/cloudquery
The developer-first cloud governance platform
airbyte attack-surface-management aws azure bigquery cspm data data-analysis data-collection data-engineering data-integration elt etl etl-framework gcp github-api go google kubernetes sql
Last synced: 14 May 2025
https://github.com/apache/flink-cdc
Flink CDC is a streaming data integration tool
batch cdc change-data-capture data-integration data-pipeline distributed elt etl flink kafka mysql paimon postgresql real-time schema-evolution
Last synced: 12 May 2025
https://github.com/rudderlabs/rudder-server
Privacy- and security-focused Segment alternative, in Golang and React
bigquery cdp customer-data customer-data-lake customer-data-pipeline customer-data-platform data-engineering data-integration data-pipeline data-synchronization data-warehouse elt etl event-streaming privacy redshift segment-alternative snowflake warehouse-management warehouse-native
Last synced: 13 May 2025
https://github.com/dlt-hub/dlt
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
data data-engineering data-lake data-loading data-warehouse elt extract load python transform
Last synced: 26 Mar 2025
https://github.com/tobikodata/sqlmesh
Scalable and efficient data transformation framework - backwards compatible with dbt.
dataengineering dataops dbt elt etl python sql transformation
Last synced: 21 Oct 2025
https://github.com/quarylabs/quary
Open-source BI for engineers
analytics big-data business-intelligence data-modeling elt
Last synced: 26 Mar 2025
https://github.com/meltano/meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
connectors data data-engineering data-pipelines dataops dataops-platform elt extract-data integration loaders meltano meltano-sdk open-source opensource pipelines singer tap taps target targets
Last synced: 12 May 2025
https://github.com/ucbepic/docetl
A system for agentic LLM-powered data processing and ETL
agents data data-pipelines elt etl llm python workflow
Last synced: 12 Oct 2025
https://github.com/dataform-co/dataform
Dataform is a framework for managing SQL based data operations in BigQuery
analytics business-intelligence data-engineering data-pipelines elt etl hacktoberfest
Last synced: 13 May 2025
https://github.com/kuwala-io/kuwala
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demographics data b) Points of interest from OpenStreetMap c) Google Popular Times
admin-boundaries data data-integration data-science dbt elt google-trends jupyter kuwala no-code open-data open-source population postgres pyspark python react react-flow scraping spatial-analysis
Last synced: 30 Mar 2025
https://github.com/raystack/optimus
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
airflow analytics analytics-engineering automation bigquery business-intelligence data-modelling data-pipelines data-transformation data-warehouse dataops elt etl golang workflows
Last synced: 16 May 2025
https://github.com/artie-labs/transfer
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.
apache-kafka bigquery cdc change-data-capture data-integration data-pipelines database debezium elt golang kafka redshift snowflake
Last synced: 28 Dec 2025
https://github.com/gouline/dbt-metabase
dbt + Metabase integration
analytics business-intelligence data data-modelling dbt elt metabase pypa python vizualisation
Last synced: 14 May 2025
https://github.com/Datavault-UK/automate-dv
A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
data-vault dataengineering datalake datavault datavault20 datawarehouse datawarehousing dbt elt etl metadata snowflake sql
Last synced: 13 May 2025
https://github.com/vmware/versatile-data-kit
One framework to develop, deploy and operate data workflows with Python and SQL.
analytics data data-engineer data-engineering data-engineering-pipeline data-lineage data-pipelines data-science data-structures data-warehouse database dataops elt etl pipeline python snowflake sql trino warehouse
Last synced: 15 May 2025
https://github.com/xorq-labs/xorq
multi-engine batch transformation framework
arrow dataframe elt machine-learning multi-engine python sklearn sql
Last synced: 05 Oct 2025
https://github.com/astronomer/astro-sdk
Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
airflow apache-airflow bigquery dags data-analysis data-science elt etl gcs pandas postgres python s3 snowflake sql sqlite workflows
Last synced: 13 Apr 2025
https://github.com/airbytehq/pyairbyte
PyAirbyte brings the power of Airbyte to every Python developer.
Last synced: 14 Dec 2025
https://github.com/cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
apache-iceberg apache-spark data-engineering data-ingestion data-integration data-lake data-pipeline data-transfer datalake delta elt etl incremental-updates lakehouse pipelines spark-sql sql upsert zeppelin-notebook
Last synced: 07 Apr 2025
https://github.com/umitkaanusta/reddit-detective
Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more
analysis analytics api data database elt etl graph graph-database neo4j network politics reddit social social-media social-network
Last synced: 06 Apr 2025
https://github.com/unytics/airbyte_serverless
Airbyte made simple (no UI, no database, no cluster)
airbyte bigquery data data-analysis data-engineering data-warehouse elt etl pipeline
Last synced: 16 May 2025
https://github.com/faros-ai/airbyte-connectors
Airbyte connectors (sources & destinations) + Airbyte CDK for JavaScript/TypeScript
airbyte airbyte-cdk airbyte-connectors airbyte-destinations airbyte-sources cdk cicd connectors elt etl faros javascript npm sdlc typescript
Last synced: 11 May 2025
https://github.com/transferia/transferia
Open Source Cloud Native Ingestion engine
bigdata cdc clickhouse elt go golang ingestion-platform kafka streaming
Last synced: 03 Apr 2025
https://github.com/yokawasa/databricks-notebooks
Collection of sample Databricks Spark notebooks (mostly for Azure Databricks)
azure azuredatabricks databricks elt python spark streaming
Last synced: 19 Jun 2025
https://github.com/ascrus/getl
A tool for developing and testing ETL and ELT processes for automating the capture, delivery and processing of information in data warehouses on the MicroFocus Vertica platform.
csv dsl elt etl excel hdfs hive impala json kafka sql unit-testing vertica xml
Last synced: 14 Jun 2025
https://github.com/zsvoboda/dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
bigquery csv database database-schemas elt etl excel json mysql parquet postgresql python python3 redshift snowflake sql sqlite xls xlsx
Last synced: 11 Sep 2025
https://github.com/airbytehq/airflow-summit-airbyte-2022
git push your data stack with Airbyte, Airflow, and dbt - 2022 Airflow Summit
airbyte airflow data-engineering elt
Last synced: 13 Aug 2025
https://github.com/datasphere-oss/datasphere-integration
A data-centric integration platform
data-bus data-integration data-interchange data-sharing elt esb etl kettle realtime-messaging
Last synced: 14 Jul 2025
https://github.com/mundipagg/amora-data-build-tool
Amora Data Build Tool enables analysts and engineers to transform data on the data warehouse (BigQuery) by writing Amora Models that describe the data schema using Python's "PEP484 - Type Hints" and select statements with SQLAlchemy. Amora is able to transform Python code into SQL data transformation jobs that run inside the warehouse.
analytics analytics-dashboard analytics-engineering bigquery business-intelligence data-engineering data-modeling datacleaning dataquality elt machine-learning python transformation
Last synced: 08 Sep 2025
https://github.com/andrewtavis/wikirepo
Python based Wikidata framework for easy dataframe extraction
analytics data-analysis data-science data-structures database demography economics elt etl geography open-source political-science python python3 repository social-sciences sociology statistics wikidata wikipedia
Last synced: 14 Apr 2025
https://github.com/mattiasthalen/arcane-insight
Arcane Insight is a data analytics project designed to harness the power of SQLMesh & DuckDB to collect, transform, and analyze data from Blizzard’s Hearthstone API. Focused on card statistics and attributes, this project reveals detailed insights into card mechanics, strengths, and trends to support BI and strategic analysis.
analytics-engineering data-engineering data-vault data-warehouse duckdb elt etl hearthstone medallion-architecture sqlmesh
Last synced: 16 Apr 2025
https://github.com/airbytehq/abctl
Airbyte's CLI for managing local Airbyte installations
Last synced: 23 Apr 2025
https://github.com/firebolt-db/dbt-firebolt
The dbt adapter for Firebolt
data-modeling data-warehouse dbt elt firebolt transformation
Last synced: 29 Apr 2025
https://github.com/montara-io/dbt-command-center
Never sift through endless dbt™ logs again. dbt Command Center is a free, open-source, local web application that provides a user-friendly interface to monitor and manage dbt runs.
analytics-engineering bigquery data-analysis data-catalog data-engineering data-lineage data-observability data-pipeline data-pipelines data-validation data-warehouse dataops dbt dbt-packages elt etl orchestration python redshift
Last synced: 05 May 2025
https://github.com/cloudquery/plugin-sdk
CloudQuery Go SDK for source and destination plugins
cloudquery data-integration elt
Last synced: 05 Apr 2025
https://github.com/riveryio/rivery_cli
Rivery CLI
data-pipeline data-pipelines data-science database database-management dataops dataops-platform dwh dwh-team elt etl rivery
Last synced: 11 Jul 2025
https://github.com/koltyakov/cq-source-sharepoint
🔌 CloudQuery SharePoint Source Plugin
cloudquery elt etl integration plugin sharepoint sync
Last synced: 11 Apr 2025
https://github.com/meltanolabs/singer-working-group
Working group for ongoing development and iteration of the Singer Spec, the de-facto protocol for open source data connectors. Please use "Issues" to create discussion items - or use "Discussions" for general questions.
data-integration dataops elt etl etl-pipeline singer
Last synced: 19 Feb 2025
https://github.com/childmindresearch/bids2table
Efficiently index large-scale BIDS neuroimaging datasets and derivatives
arrow bids data-pipeline elt etl neuroimaging parquet
Last synced: 30 Mar 2025
https://github.com/typedef-ai/fenic
Build reliable AI and agentic applications with DataFrames
agents ai arrow dataframe-library dataframes duckdb elt etl llm orchestration polars pyspark python rust
Last synced: 23 Jun 2025
https://github.com/meltanolabs/tap-dbt
Singer Tap for dbt API v2 built with the Meltano SDK
dbt dbt-cloud elt extract-data meltano-sdk singer-io singer-tap
Last synced: 19 Oct 2025
https://github.com/meltanolabs/target-snowflake
Singer Target for the Snowflake cloud Data Warehouse
elt meltano singer-sdk singer-target snowflake
Last synced: 07 Jan 2026
https://github.com/dataform-co/dataform-example-project
Example project on Dataform
data-analysis data-pipeline data-transformation elt sql sqlx
Last synced: 03 May 2025
https://github.com/scribe-org/scribe-server
Backend service for Scribe data downloads
api autosuggest backend data data-downloader data-pipeline dictionary education elt emoji go golang grammar language learning open-source translation wikidata wikipedia
Last synced: 30 Oct 2025
https://github.com/mahmoudparsian/data-warehousing
This repository is a place for the Data Warehousing course at the Information Systems & Analytics department, Santa Clara University.
business-intelligence data-analytics data-lake data-lakehouse data-mining data-modeling data-visualization data-warehouse data-warehousing database dimensional-modeling elt etl extract load snowflake-schema star-schema tableau transform
Last synced: 03 Jul 2025
https://github.com/edgarrmondragon/meltano-dogfood
Personal dogfood Meltano project
bigquery dbt dogfood elt evidence-dev meltano
Last synced: 14 Apr 2025
https://github.com/kushalkhadka7/dagster_clickhouse_dbt
dbt and ClickHouse test project with Dagster
clickhouse dagster datapipeline dbt elt
Last synced: 13 Apr 2025
https://github.com/renatoelho/apache-nifi-enriquecimento-cep
In this project, I dive into the world of Apache NiFi, exploring how to consume data from an API and save it directly to a database.
apache-nifi api elt etl mysql sql
Last synced: 04 Oct 2025
https://github.com/cre-dev/xml2db
A Python package to load complex XML files into a relational database
data-engineering data-loader database duckdb elt etl lxml mssql mysql postgresql python relational-databases sqlalchemy xml xmlschema xsd
Last synced: 15 Jul 2025
https://github.com/samber/ansible-role-airbyte
Ansible role for Airbyte
3rd-party airbyte ansible connector data data-analysis data-science data-visualization datawarehouse elt etl incremental integration pipeline replication reverse-etl role saas sync
Last synced: 12 Apr 2025
https://github.com/firelink-sh/evolve-py
A highly efficient, composable, and lightweight ETL and data integration framework
analytics arrow big-data data data-engineering data-integration data-science duckdb elt etl ingestion ingress ml olap pipeline polars postgresql python s3
Last synced: 16 Sep 2025
https://github.com/danhphan/trusted-data-pipeline
Building 3D Trusted Data Pipelines With Dagster, Dbt, and Duckdb
aws dagster data dbt duckdb elt engineering etl-pipeline s3
Last synced: 23 Jun 2025
https://github.com/reservoir-data/tap-confluence
Singer tap for the Confluence Content REST API
atlassian-confluence confluence-api elt meltano-sdk singer-io singer-tap
Last synced: 30 Dec 2025
https://github.com/umitkaanusta/smol-elt
a smol elt (not etl) pipeline for smol tasks
analytics automation aws aws-sns data data-engineering data-pipeline elt etl google-sheets pandas pipeline python spreadsheet web-scraping
Last synced: 04 Apr 2025
https://github.com/meltanolabs/tap-stackexchange
Singer tap for the StackExchange API
elt extract-data meltano-sdk singer-io singer-tap stackexchange
Last synced: 04 Aug 2025
https://github.com/teradata/dbt-teradata-utils
Teradata package that provides compatibility for dbt-utils
dbt elt sql teradata warehouses
Last synced: 11 Oct 2025
https://github.com/longnguyen010203/ecommerce-elt-pipeline
🌄📈📉 A data engineering project 🌈 that implements an ELT data pipeline using Dagster, Docker, dbt, Polars, Snowflake, and PostgreSQL. Data from the Kaggle website 🔥
dagster data data-engineering dbt docker docker-compose dockerfile elt elt-pipeline extract kaggle load polars postgresql raw-data relational-databases snowflake transform
Last synced: 04 Oct 2025
https://github.com/flow-php/etl-adapter-doctrine
PHP ETL Adapter: Doctrine
data-engineering data-processing dbal doctrine elt flow-php
Last synced: 02 Jul 2025
https://github.com/gansanay/dbt-teradata
dbt adapter for Teradata data warehouses
Last synced: 20 Mar 2025
https://github.com/rifa8/capstone-project-with-dynamic-dag
The project focuses on creating an ELT pipeline to consolidate data from diverse resources into a single source of truth in BigQuery. The heart of this project is the innovative use of Apache Airflow to design a dynamic Directed Acyclic Graph (DAG) that automates task generation based on predefined file configurations.
Last synced: 22 Mar 2025
https://github.com/taquynhnga2001/proptech-dagster
Build an ELT pipeline with Dagster and dbt to schedule loading of HDB resale transactions in Singapore into a Google BigQuery data warehouse, then create a Power BI dashboard to enhance insight exploration.
bigquery dagster data-integration data-orchestration data-warehouse dbt elt etl powerbi python
Last synced: 14 Apr 2025
https://github.com/dmarks84/coursework_project_banks-web-scraping-sql
Project for IBM Data Engineering & Python course on ETL & Big Data -- Scraped website data and made API calls for additional data; wrangled and transformed this data and loaded it into a SQL database.
apis beautifulsoup databases elt etl nosql numpy pandas pipelines python sql sqlite web-scraping
Last synced: 30 Dec 2025
https://github.com/archived-blueprints/postgresql-blueprints
Simplified blueprints for building data pipelines with PostgreSQL.
cli data-analysis data-engineering data-pipeline data-science database elt etl postgres postgresql
Last synced: 29 Jul 2025
https://github.com/salma-mamdoh/datawarehouse_project
Our project for the Data Warehouse course taken during the fall 2024 semester
analytics dashboard datawarehousing deploy elt kpis powerbi sql ssis
Last synced: 20 Feb 2025
https://github.com/andrewcstewart/files-gitpod
Meltano project file bundle for https://www.gitpod.io/
Last synced: 20 Jul 2025
https://github.com/meltanolabs/tap-intacct
Singer tap for the Sage Intacct API
Last synced: 11 Apr 2025
https://github.com/dataopstix/modelt
Modelt (mow·delt) is a modern data integration solution that connects data to data for advanced analytics.
airbyte airflow airflow-docker data data-analysis data-visualization database dbt elt etl etl-automation metabase metadata modern modern-dev modernization
Last synced: 28 Mar 2025
https://github.com/cloudquery/recipes
Real-world CloudQuery configuration examples
Last synced: 05 Mar 2025
https://github.com/bchaoss/trash-wheel-pipeline
dbt data pipeline for analyzing trash wheel collection data
analytics-engineering dbt duckdb elt motherduck sql tidytuesday
Last synced: 07 Oct 2025
https://github.com/davidkhala/etl
Collection of data Extract, Transform, Load
apache-beam dbt elt etl fivetran
Last synced: 13 Oct 2025
https://github.com/reservoir-data/tap-honeybadger
Singer tap for Honeybadger.io
Last synced: 02 Nov 2025
https://github.com/reservoir-data/tap-google-play
Singer tap for Google Play Reviews
elt google-play meltano singer-io singer-tap
Last synced: 30 Dec 2025
https://github.com/vbalalian/roman_coins_data_pipeline
Work-in-progress, learning-focused, end-to-end ELT data pipeline project.
airbyte api beautifulsoup cicd coins dagster data-engineering data-pipeline docker docker-compose duckdb elt fastapi minio postgresql python rest-api sql web-scraping webscraping
Last synced: 04 Mar 2025
https://github.com/dmarks84/coursework_coursework_project_automobile-sales-visualization
Project for IBM Data Science course on Visualization & Dashboards -- Analyzed historical sales data, performing EDA and setting up an interactive dashboard
communication dash dashboards data-modeling elt etl folium matplotlib numpy pandas pipelines plotly python scipy seaborn visualization
Last synced: 30 Dec 2025
https://github.com/archived-blueprints/amazonathena-blueprints
Simplified blueprints for building data pipelines with Amazon Athena.
amazon-athena athena cli data-analysis data-engineering data-science elt etl
Last synced: 29 Jul 2025
https://github.com/ankushkgupta2/databricks-poc
💻 📊 Proof of Concept (POC) Using Azure Databricks for Automated & Real-Time ETL, Generation of Visualizations, and Pipeline Integration for Various Pathogens
api azure backend blob clarity-lims database databricks dbfs elims elt etl graph-database json livetables metadata mpxv nextflow parquet poc yaml
Last synced: 21 Feb 2025
https://github.com/reservoir-data/tap-planetscaleapi
Singer Tap for the PlanetScale API
data-extraction elt meltano-sdk planetscale singer-sdk
Last synced: 27 Oct 2025
https://github.com/renatoelho/fluxo-elt
This is an ELT (Extract, Load, Transform) process that integrates a legacy system backed by a relational database (MySQL in the example) with a NoSQL database (Elasticsearch), without significant changes to the transferred data.
docker docker-compose dockerfile elasticsearch elt etl kibana nifi sql
Last synced: 31 Dec 2025
https://github.com/fisseha-estifanos/elt
A showcase repository that uses Airflow, dbt, and data warehouse systems to perform an ELT task.
airflow dag data-engineering dbt elt
Last synced: 22 Feb 2025
https://github.com/dmarks84/coursework_project_apache-airflow-kafka-on-toll-booth-data
Project for IBM Data Engineering & Python course on ETL & Big Data -- Read in live toll booth data, wrangled and transformed it, and wrote it into a SQL database
apache-airflow apache-kafka automation dags data-modeling databases eda elt etl mysql numpy pandas pipelines python sql
Last synced: 09 Apr 2025
https://github.com/raflyritonga/imdb-movie-elt
A containerized, orchestrated ELT pipeline for IMDb movie data
airflow data-engineering dbt docker elt elt-pipeline
Last synced: 17 Jun 2025
https://github.com/manoharvit/ecommerce-dive-deep-sales-analysis
In this project, we developed an ETL pipeline using Apache Airflow to process delivery data and track delayed shipments. The pipeline downloads data from an AWS S3 bucket, cleans it using Spark/Spark SQL to identify missing delivery deadlines, and uploads the cleaned dataset back to S3. This ensures efficient delivery performance tracking.
airflow airflow-dags ecommerce elt pyspark s3 s3-bucket spark sql
Last synced: 31 Jul 2025
https://github.com/mehassanhmood/bigdata-analytics
Retrieving data from different sources and bringing the preprocessed data to Power BI for visualizations
azuresql dataware elt etl-pipeline powerbi
Last synced: 29 Oct 2025
https://github.com/apache/airflow-publish
Publishing PyPI packages for Apache Airflow
airflow apache apache-airflow automation dag data-engineering data-integration data-orchestrator data-pipelines data-science elt etl machine-learning mlops orchestration python scheduler workflow workflow-engine workflow-orchestration
Last synced: 07 May 2025
https://github.com/romanow/data-migration-lib
Database migration library based on Spring Batch
database-migration elt spring-batch
Last synced: 18 Jul 2025