An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with elt

A curated list of projects in awesome lists tagged with elt.

https://github.com/airbytehq/airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

bigquery change-data-capture data data-analysis data-collection data-engineering data-integration data-pipeline elt etl java mssql mysql pipeline postgresql python redshift s3 self-hosted snowflake

Last synced: 09 Sep 2025

https://github.com/apache/doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

bigquery database dbt delta-lake elt etl hadoop hive hudi iceberg lakehouse olap query-engine real-time redshift snowflake spark sql

Last synced: 12 May 2025

https://github.com/dbt-labs/dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

analytics business-intelligence data-modeling dbt-viewpoint elt pypa slack

Last synced: 07 Jan 2026

https://github.com/apache/seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.

apache batch cdc change-data-capture data-ingestion data-integration elt high-performance offline real-time streaming

Last synced: 12 May 2025

https://github.com/dlt-hub/dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

data data-engineering data-lake data-loading data-warehouse elt extract load python transform

Last synced: 26 Mar 2025

https://github.com/tobikodata/sqlmesh

Scalable and efficient data transformation framework - backwards compatible with dbt.

dataengineering dataops dbt elt etl python sql transformation

Last synced: 21 Oct 2025

https://github.com/TobikoData/sqlmesh

Efficient data transformation and modeling framework that is backwards compatible with dbt.

dataengineering dataops dbt elt etl python sql transformation

Last synced: 26 Mar 2025

https://github.com/quarylabs/quary

Open-source BI for engineers

analytics big-data business-intelligence data-modeling elt

Last synced: 26 Mar 2025

https://github.com/meltano/meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

connectors data data-engineering data-pipelines dataops dataops-platform elt extract-data integration loaders meltano meltano-sdk open-source opensource pipelines singer tap taps target targets

Last synced: 12 May 2025

https://github.com/ucbepic/docetl

A system for agentic LLM-powered data processing and ETL

agents data data-pipelines elt etl llm python workflow

Last synced: 12 Oct 2025

https://github.com/dataform-co/dataform

Dataform is a framework for managing SQL-based data operations in BigQuery

analytics business-intelligence data-engineering data-pipelines elt etl hacktoberfest

Last synced: 13 May 2025

https://github.com/kuwala-io/kuwala

Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) Points of Interest from OpenStreetMap, c) Google Popular Times.

admin-boundaries data data-integration data-science dbt elt google-trends jupyter kuwala no-code open-data open-source population postgres pyspark python react react-flow scraping spatial-analysis

Last synced: 30 Mar 2025

https://github.com/raystack/optimus

Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.

airflow analytics analytics-engineering automation bigquery business-intelligence data-modelling data-pipelines data-transformation data-warehouse dataops elt etl golang workflows

Last synced: 16 May 2025

https://github.com/artie-labs/transfer

Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.

apache-kafka bigquery cdc change-data-capture data-integration data-pipelines database debezium elt golang kafka redshift snowflake

Last synced: 28 Dec 2025
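Artie Transfer's internals are not shown here. This is a minimal Python sketch of how change data capture keeps a replica in sync. It assumes Debezium-style change events with an `op` field (`c` create, `r` snapshot read, `u` update, `d` delete) and `before`/`after` row images, and it replays those events in log order. The in-memory dict stands in for a warehouse table, and `apply_change` and `replay` are hypothetical names.

```python
from typing import Any

Row = dict[str, Any]


def apply_change(table: dict[Any, Row], event: dict, key: str = "id") -> None:
    """Apply one Debezium-style change event to an in-memory replica keyed by `key`."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, update: upsert the new row image
        row = event["after"]
        table[row[key]] = row
    elif op == "d":
        # delete: drop the row identified by the old row image
        table.pop(event["before"][key], None)
    else:
        raise ValueError(f"unknown op: {op!r}")


def replay(events: list[dict], key: str = "id") -> dict[Any, Row]:
    """Replay events in log order; the resulting dict mirrors the source table."""
    table: dict[Any, Row] = {}
    for event in events:
        apply_change(table, event, key)
    return table
```

Ordering matters. Applying an update before its create, or a delete before its update, would leave the replica out of sync, which is why CDC pipelines preserve per-key log order.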

https://github.com/Datavault-UK/automate-dv

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)

data-vault dataengineering datalake datavault datavault20 datawarehouse datawarehousing dbt elt etl metadata snowflake sql

Last synced: 13 May 2025
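Data Vault 2.0 hubs and links are usually keyed by a hash of the normalized business key. Satellites detect changes through a hash of their descriptive attributes, often called a "hashdiff". The sketch below shows that convention in Python. It assumes MD5, a `||` delimiter and a `^^` null placeholder. AutomateDV's own normalization rules and defaults may differ, and `hash_key`/`hash_diff` are hypothetical names.

```python
import hashlib
from typing import Optional

NULL_PLACEHOLDER = "^^"  # assumption: marker substituted for missing key parts
DELIMITER = "||"         # assumption: separator between concatenated key parts


def hash_key(*business_keys: Optional[str]) -> str:
    """Derive a hub/link hash key from one or more business key parts."""
    parts = [
        NULL_PLACEHOLDER if k is None or k.strip() == "" else k.strip().upper()
        for k in business_keys
    ]
    return hashlib.md5(DELIMITER.join(parts).encode("utf-8")).hexdigest().upper()


def hash_diff(attributes: dict[str, Optional[str]]) -> str:
    """Hash descriptive attributes in a fixed (sorted) column order for change detection."""
    return hash_key(*(attributes[name] for name in sorted(attributes)))
```

Normalizing before hashing (trimming and upper-casing) makes `" cust-001 "` and `"CUST-001"` resolve to the same hub row.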

https://github.com/xorq-labs/xorq

multi-engine batch transformation framework

arrow dataframe elt machine-learning multi-engine python sklearn sql

Last synced: 05 Oct 2025

https://github.com/astronomer/astro-sdk

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

airflow apache-airflow bigquery dags data-analysis data-science elt etl gcs pandas postgres python s3 snowflake sql sqlite workflows

Last synced: 13 Apr 2025

https://github.com/airbytehq/pyairbyte

PyAirbyte brings the power of Airbyte to every Python developer.

data-engineering elt python

Last synced: 14 Dec 2025

https://github.com/datacoves/dbt-coves

CLI tool for dbt users that simplifies the creation of staging model files (YML and SQL)

analytics bigquery datacoves dbt elt etl jinja python redshift snowflake sql

Last synced: 15 May 2025
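A dbt staging model typically selects from a declared source and renames its columns into a consistent style. The following sketch generates that SQL from a column list. It assumes hypothetical source, table and column names. dbt-coves itself introspects the warehouse and also writes the accompanying YML, which this sketch does not cover.

```python
def staging_model_sql(source_name: str, table: str, columns: list[str]) -> str:
    """Render a dbt staging model that selects every column of a source, lower-casing names."""
    select_list = ",\n        ".join(f"{c} as {c.lower()}" for c in columns)
    # Quadruple braces in the f-string produce the literal {{ ... }} Jinja delimiters.
    return (
        "with raw_source as (\n"
        f"    select * from {{{{ source('{source_name}', '{table}') }}}}\n"
        "),\n\n"
        "final as (\n"
        "    select\n"
        f"        {select_list}\n"
        "    from raw_source\n"
        ")\n\n"
        "select * from final\n"
    )


if __name__ == "__main__":
    # Hypothetical source and columns, for illustration only.
    print(staging_model_sql("raw", "orders", ["ID", "Customer_Name"]))
```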

https://github.com/umitkaanusta/reddit-detective

Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more

analysis analytics api data database elt etl graph graph-database neo4j network politics reddit social social-media social-network

Last synced: 06 Apr 2025

https://github.com/unytics/airbyte_serverless

Airbyte made simple (no UI, no database, no cluster)

airbyte bigquery data data-analysis data-engineering data-warehouse elt etl pipeline

Last synced: 16 May 2025

https://github.com/faros-ai/airbyte-connectors

Airbyte connectors (sources & destinations) + Airbyte CDK for JavaScript/TypeScript

airbyte airbyte-cdk airbyte-connectors airbyte-destinations airbyte-sources cdk cicd connectors elt etl faros javascript npm sdlc typescript

Last synced: 11 May 2025

https://github.com/transferia/transferia

Open-source, cloud-native ingestion engine

bigdata cdc clickhouse elt go golang ingestion-platform kafka streaming

Last synced: 03 Apr 2025

https://github.com/yokawasa/databricks-notebooks

Collection of sample Databricks Spark notebooks (mostly for Azure Databricks)

azure azuredatabricks databricks elt python spark streaming

Last synced: 19 Jun 2025

https://github.com/ascrus/getl

A tool for developing and testing ETL and ELT processes for automating the capture, delivery and processing of information in data warehouses on the MicroFocus Vertica platform.

csv dsl elt etl excel hdfs hive impala json kafka sql unit-testing vertica xml

Last synced: 14 Jun 2025

https://github.com/zsvoboda/dbd

dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

bigquery csv database database-schemas elt etl excel json mysql parquet postgresql python python3 redshift snowflake sql sqlite xls xlsx

Last synced: 11 Sep 2025

https://github.com/airbytehq/airflow-summit-airbyte-2022

git push your data stack with Airbyte, Airflow, and dbt - 2022 Airflow Summit

airbyte airflow data-engineering elt

Last synced: 13 Aug 2025

https://github.com/mundipagg/amora-data-build-tool

Amora Data Build Tool enables analysts and engineers to transform data on the data warehouse (BigQuery) by writing Amora Models that describe the data schema using Python's "PEP484 - Type Hints" and select statements with SQLAlchemy. Amora is able to transform Python code into SQL data transformation jobs that run inside the warehouse.

analytics analytics-dashboard analytics-engineering bigquery business-intelligence data-engineering data-modeling datacleaning dataquality elt machine-learning python transformation

Last synced: 08 Sep 2025
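The following sketch illustrates deriving SQL from PEP 484 type hints. It renders a `CREATE TABLE` statement from a dataclass's annotations. This is not Amora's API. The `Payment` model, the `create_table_sql` helper and the BigQuery-style type mapping are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import get_type_hints

# Assumed mapping from Python types to BigQuery-style column types.
SQL_TYPES = {int: "INT64", float: "FLOAT64", str: "STRING", bool: "BOOL", date: "DATE"}


@dataclass
class Payment:  # hypothetical model, not Amora's API
    id: int
    amount: float
    currency: str
    paid_on: date


def create_table_sql(model: type, table: str) -> str:
    """Render a CREATE TABLE statement from the type annotations on `model`."""
    columns = [
        f"{name} {SQL_TYPES[py_type]}"
        for name, py_type in get_type_hints(model).items()
    ]
    return f"CREATE TABLE {table} (\n  " + ",\n  ".join(columns) + "\n)"


if __name__ == "__main__":
    print(create_table_sql(Payment, "payments"))
```

Annotations are returned in declaration order, so the generated columns follow the order of the class fields.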

https://github.com/mattiasthalen/arcane-insight

Arcane Insight is a data analytics project designed to harness the power of SQLMesh & DuckDB to collect, transform, and analyze data from Blizzard’s Hearthstone API. Focused on card statistics and attributes, this project reveals detailed insights into card mechanics, strengths, and trends to support BI and strategic analysis.

analytics-engineering data-engineering data-vault data-warehouse duckdb elt etl hearthstone medallion-architecture sqlmesh

Last synced: 16 Apr 2025

https://github.com/airbytehq/abctl

Airbyte's CLI for managing local Airbyte installations

elt go golang

Last synced: 23 Apr 2025

https://github.com/montara-io/dbt-command-center

Never sift through endless dbt™ logs again. dbt Command Center is a free, open-source, local web application that provides a user-friendly interface to monitor and manage dbt runs.

analytics-engineering bigquery data-analysis data-catalog data-engineering data-lineage data-observability data-pipeline data-pipelines data-validation data-warehouse dataops dbt dbt-packages elt etl orchestration python redshift

Last synced: 05 May 2025

https://github.com/cloudquery/plugin-sdk

CloudQuery Go SDK for source and destination plugins

cloudquery data-integration elt

Last synced: 05 Apr 2025

https://github.com/feluelle/finance-data-builder

Finance 🏦 Data Builder 🛠️ @ postgres 🐘

airflow dbt docker elt etl finance workflow

Last synced: 16 Jun 2025

https://github.com/teradata/dbt-teradata

dbt adapter for Teradata

dbt elt teradata warehouses

Last synced: 06 Mar 2025

https://github.com/koltyakov/cq-source-sharepoint

🔌 CloudQuery SharePoint Source Plugin

cloudquery elt etl integration plugin sharepoint sync

Last synced: 11 Apr 2025

https://github.com/meltanolabs/singer-working-group

Working group for ongoing development and iteration of the Singer Spec, the de-facto protocol for open source data connectors. Please use "Issues" to create discussion items - or use "Discussions" for general questions.

data-integration dataops elt etl etl-pipeline singer

Last synced: 19 Feb 2025
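The Singer Spec defines taps as programs that write newline-delimited JSON messages to stdout. `SCHEMA` describes a stream, `RECORD` carries one row, and `STATE` persists a bookmark so the next run can resume. The sketch below is an incremental tap in Python. It assumes a hypothetical `users` stream and an `updated_at` bookmark. The spec treats the `STATE` value as opaque, so the `bookmarks` layout here is only a common convention.

```python
import json
import sys
from typing import Iterable, TextIO

STREAM = "users"  # hypothetical stream name
SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "updated_at": {"type": "string", "format": "date-time"},
    },
}


def emit(message: dict, out: TextIO) -> None:
    """Write one Singer message as a single line of JSON."""
    out.write(json.dumps(message) + "\n")


def run_tap(records: Iterable[dict], state: dict, out: TextIO = sys.stdout) -> dict:
    """Emit SCHEMA, then RECORDs newer than the bookmark, then the updated STATE."""
    emit({"type": "SCHEMA", "stream": STREAM, "schema": SCHEMA, "key_properties": ["id"]}, out)
    bookmark = state.get("bookmarks", {}).get(STREAM, {}).get("updated_at", "")
    latest = bookmark
    for record in records:
        # Assumes uniformly formatted ISO-8601 UTC timestamps, which sort correctly as strings.
        if record["updated_at"] > bookmark:
            emit({"type": "RECORD", "stream": STREAM, "record": record}, out)
            latest = max(latest, record["updated_at"])
    new_state = {"bookmarks": {STREAM: {"updated_at": latest}}}
    emit({"type": "STATE", "value": new_state}, out)
    return new_state


if __name__ == "__main__":
    rows = [
        {"id": 1, "name": "Ada", "updated_at": "2025-01-01T00:00:00Z"},
        {"id": 2, "name": "Grace", "updated_at": "2025-02-01T00:00:00Z"},
    ]
    run_tap(rows, {"bookmarks": {STREAM: {"updated_at": "2025-01-15T00:00:00Z"}}})
```

A target reads these lines from stdin, so `tap | target` is the whole pipeline.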

https://github.com/childmindresearch/bids2table

Efficiently index large-scale BIDS neuroimaging datasets and derivatives

arrow bids data-pipeline elt etl neuroimaging parquet

Last synced: 30 Mar 2025
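BIDS encodes metadata in file names. A name consists of underscore-separated `key-label` entities, followed by a suffix and an extension, for example `sub-01_ses-02_task-rest_bold.nii.gz`. The sketch below covers the per-file parsing step an indexer needs and assumes well-formed names. It does not reproduce bids2table's actual API or its Arrow/Parquet output, and `parse_bids_name` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def parse_bids_name(path: str) -> dict[str, str]:
    """Split a BIDS filename into its key-value entities, suffix and extension."""
    name = PurePosixPath(path).name
    # BIDS labels contain no dots, so the first dot starts the (possibly compound) extension.
    stem, dot, ext = name.partition(".")
    *entity_tokens, suffix = stem.split("_")
    record: dict[str, str] = {}
    for token in entity_tokens:
        key, _, value = token.partition("-")
        record[key] = value
    record["suffix"] = suffix
    record["extension"] = dot + ext
    return record


if __name__ == "__main__":
    print(parse_bids_name("ds000001/sub-01/ses-02/func/sub-01_ses-02_task-rest_bold.nii.gz"))
```

Collecting these dicts for every file gives one row per file, ready to be written out as a columnar table.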

https://github.com/typedef-ai/fenic

Build reliable AI and agentic applications with DataFrames

agents ai arrow dataframe-library dataframes duckdb elt etl llm orchestration polars pyspark python rust

Last synced: 23 Jun 2025

https://github.com/meltanolabs/tap-dbt

Singer Tap for dbt API v2 built with the Meltano SDK

dbt dbt-cloud elt extract-data meltano-sdk singer-io singer-tap

Last synced: 19 Oct 2025

https://github.com/meltanolabs/target-snowflake

Singer Target for the Snowflake cloud Data Warehouse

elt meltano singer-sdk singer-target snowflake

Last synced: 07 Jan 2026

https://github.com/oresttokovenko/retailflow

End-to-End ELT data pipeline with Postgres, Airbyte, dbt, Dagster, Snowflake and Metabase

airbyte aws aws-lambda dagster dbt docker ecs elt lambda metabase postgres python snowflake sql terraform

Last synced: 13 Apr 2025

https://github.com/edgarrmondragon/meltano-dogfood

Personal dogfood Meltano project

bigquery dbt dogfood elt evidence-dev meltano

Last synced: 14 Apr 2025

https://github.com/kushalkhadka7/dagster_clickhouse_dbt

dbt and ClickHouse test project with Dagster

clickhouse dagster datapipeline dbt elt

Last synced: 13 Apr 2025

https://github.com/bayoadejare/airbyte_dbt_covid19

dbt transformations for Snowflake data warehouse.

airbyte covid-19 dbt elt snowflake sql

Last synced: 06 Oct 2025

https://github.com/renatoelho/apache-nifi-enriquecimento-cep

In this project, I dive into the world of Apache NiFi, exploring how to consume data from an API and save it directly to a database.

apache-nifi api elt etl mysql sql

Last synced: 04 Oct 2025

https://github.com/cre-dev/xml2db

A Python package to load complex XML files into a relational database

data-engineering data-loader database duckdb elt etl lxml mssql mysql postgresql python relational-databases sqlalchemy xml xmlschema xsd

Last synced: 15 Jul 2025
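A common way to load nested XML into a relational database is to map each repeating element to its own table, with a foreign key pointing to its parent. The sketch below uses only the standard library (`xml.etree.ElementTree` and `sqlite3`) and assumes a hypothetical invoice document. It is not xml2db's API, and `load` is a hypothetical name.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document: invoices with a repeating <line> child element.
XML = """
<invoices>
  <invoice number="INV-1" date="2025-01-03">
    <line sku="A1" qty="2"/>
    <line sku="B7" qty="1"/>
  </invoice>
  <invoice number="INV-2" date="2025-01-04">
    <line sku="A1" qty="5"/>
  </invoice>
</invoices>
"""


def load(xml_text: str, conn: sqlite3.Connection) -> None:
    """Flatten repeated <line> children into a child table keyed to their parent invoice."""
    conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, number TEXT, issued_on TEXT)")
    conn.execute(
        "CREATE TABLE invoice_line "
        "(invoice_id INTEGER REFERENCES invoice(id), sku TEXT, qty INTEGER)"
    )
    root = ET.fromstring(xml_text.strip())
    for inv in root.iter("invoice"):
        cur = conn.execute(
            "INSERT INTO invoice (number, issued_on) VALUES (?, ?)",
            (inv.get("number"), inv.get("date")),
        )
        # lastrowid is the surrogate key just assigned to the parent row.
        conn.executemany(
            "INSERT INTO invoice_line VALUES (?, ?, ?)",
            [(cur.lastrowid, ln.get("sku"), int(ln.get("qty"))) for ln in inv.iter("line")],
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(XML, conn)
    print(conn.execute("SELECT count(*) FROM invoice_line").fetchone())
```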

https://github.com/firelink-sh/evolve-py

A highly efficient, composable, and lightweight ETL and data integration framework

analytics arrow big-data data data-engineering data-integration data-science duckdb elt etl ingestion ingress ml olap pipeline polars postgresql python s3

Last synced: 16 Sep 2025

https://github.com/danhphan/trusted-data-pipeline

Building 3D Trusted Data Pipelines with Dagster, dbt, and DuckDB

aws dagster data dbt duckdb elt engineering etl-pipeline s3

Last synced: 23 Jun 2025

https://github.com/reservoir-data/tap-confluence

Singer tap for the Confluence Content REST API

atlassian-confluence confluence-api elt meltano-sdk singer-io singer-tap

Last synced: 30 Dec 2025

https://github.com/teradata/dbt-teradata-utils

Teradata package that provides compatibility for dbt-utils

dbt elt sql teradata warehouses

Last synced: 11 Oct 2025

https://github.com/longnguyen010203/ecommerce-elt-pipeline

🌄📈📉 A Data Engineering Project 🌈 that implements an ELT data pipeline using Dagster, Docker, dbt, Polars, Snowflake, and PostgreSQL. Data comes from the Kaggle website 🔥

dagster data data-engineering dbt docker docker-compose dockerfile elt elt-pipeline extract kaggle load polars postgresql raw-data relational-databases snowflake transform

Last synced: 04 Oct 2025

https://github.com/gansanay/dbt-teradata

dbt adapter for Teradata data warehouses

analytics data-modeling elt

Last synced: 20 Mar 2025

https://github.com/rifa8/capstone-project-with-dynamic-dag

The project focuses on creating an ELT pipeline to consolidate data from diverse resources into a single source of truth in BigQuery. The heart of this project is the innovative use of Apache Airflow to design a dynamic Directed Acyclic Graph (DAG) that automates task generation based on predefined file configurations.

dynamic-dag elt visualization

Last synced: 22 Mar 2025
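A config-driven ("dynamic") DAG works by turning a list of configuration entries into tasks plus dependency edges. The sketch below does this with Python's standard `graphlib` and assumes a hypothetical config format. In Airflow, each entry would become an operator and the edges would be declared with `>>`; that step is omitted here. A cyclic config makes `static_order()` raise `graphlib.CycleError`.

```python
from graphlib import TopologicalSorter

# Hypothetical file configuration; "source" is None for in-warehouse transforms.
CONFIG = [
    {"name": "orders", "source": "gs://bucket/orders.csv", "depends_on": []},
    {"name": "customers", "source": "gs://bucket/customers.csv", "depends_on": []},
    {"name": "sales_mart", "source": None, "depends_on": ["orders", "customers"]},
]


def build_graph(config: list[dict]) -> dict[str, set[str]]:
    """Map each generated task id to the set of task ids it depends on."""
    task_ids = {
        entry["name"]: ("load_" if entry["source"] else "transform_") + entry["name"]
        for entry in config
    }
    return {
        task_ids[entry["name"]]: {task_ids[dep] for dep in entry["depends_on"]}
        for entry in config
    }


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid run order; raises graphlib.CycleError on cyclic configs."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(build_graph(CONFIG)))
```

Adding a new source file then only means appending a config entry. The task code itself does not change.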

https://github.com/taquynhnga2001/proptech-dagster

Build an ELT pipeline with Dagster and dbt to schedule the loading of HDB resale transactions in Singapore into a Google BigQuery data warehouse, then create a Power BI dashboard to enhance insight exploration.

bigquery dagster data-integration data-orchestration data-warehouse dbt elt etl powerbi python

Last synced: 14 Apr 2025

https://github.com/dmarks84/coursework_project_banks-web-scraping-sql

Project for IBM Data Engineering & Python course on ETL & Big Data -- Scraped website data and made API calls for additional data; wrangled and transformed this data and loaded it into a SQL database.

apis beautifulsoup databases elt etl nosql numpy pandas pipelines python sql sqlite web-scraping

Last synced: 30 Dec 2025

https://github.com/archived-blueprints/postgresql-blueprints

Simplified blueprints for building data pipelines with PostgreSQL.

cli data-analysis data-engineering data-pipeline data-science database elt etl postgres postgresql

Last synced: 29 Jul 2025

https://github.com/salma-mamdoh/datawarehouse_project

Our project for the Data Warehouse course, taken during the fall 2024 semester

analytics dashboard datawarehousing deploy elt kpis powerbi sql ssis

Last synced: 20 Feb 2025

https://github.com/andrewcstewart/files-gitpod

Meltano project file bundle for https://www.gitpod.io/

elt etl gitpod meltano

Last synced: 20 Jul 2025

https://github.com/meltanolabs/tap-intacct

Singer tap for the Sage Intacct API

elt meltano singer-tap

Last synced: 11 Apr 2025

https://github.com/dataopstix/modelt

Modelt (mow·delt) is a modern data integration solution that connects data to data for advanced analytics.

airbyte airflow airflow-docker data data-analysis data-visualization database dbt elt etl etl-automation metabase metadata modern modern-dev modernization

Last synced: 28 Mar 2025

https://github.com/cloudquery/recipes

Real-world CloudQuery configuration examples

cloudquery elt

Last synced: 05 Mar 2025

https://github.com/bchaoss/trash-wheel-pipeline

dbt data pipeline for analyzing trash wheel collection data

analytics-engineering dbt duckdb elt motherduck sql tidytuesday

Last synced: 07 Oct 2025

https://github.com/davidkhala/etl

Collection of data Extract, Transform, Load

apache-beam dbt elt etl fivetran

Last synced: 13 Oct 2025

https://github.com/reservoir-data/tap-honeybadger

Singer tap for Honeybadger.io

elt meltano singer-tap

Last synced: 02 Nov 2025

https://github.com/reservoir-data/tap-google-play

Singer tap for Google Play Reviews

elt google-play meltano singer-io singer-tap

Last synced: 30 Dec 2025

https://github.com/dmarks84/coursework_coursework_project_automobile-sales-visualization

Project for IBM Data Science course on Visualization & Dashboards -- Analyzed historical sales data, performing EDA and setting up an interactive dashboard

communication dash dashboards data-modeling elt etl folium matplotlib numpy pandas pipelines plotly python scipy seaborn visualization

Last synced: 30 Dec 2025

https://github.com/archived-blueprints/amazonathena-blueprints

Simplified blueprints for building data pipelines with Amazon Athena.

amazon-athena athena cli data-analysis data-engineering data-science elt etl

Last synced: 29 Jul 2025

https://github.com/ankushkgupta2/databricks-poc

💻 📊 Proof of Concept (POC) Using Azure Databricks for Automated & Real-Time ETL, Generation of Visualizations, and Pipeline Integration for Various Pathogens

api azure backend blob clarity-lims database databricks dbfs elims elt etl graph-database json livetables metadata mpxv nextflow parquet poc yaml

Last synced: 21 Feb 2025

https://github.com/renatoelho/fluxo-elt

This is an ELT (Extract, Load, Transform) process that moves data from a legacy system's relational database (MySQL, in this example) into a NoSQL store (Elasticsearch) without significant changes to the transferred data.

docker docker-compose dockerfile elasticsearch elt etl kibana nifi sql

Last synced: 31 Dec 2025

https://github.com/fisseha-estifanos/elt

A showcase repository that uses Airflow, dbt, and data warehouse systems to perform an ELT task.

airflow dag data-engineering dbt elt

Last synced: 22 Feb 2025
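In ELT, raw data is loaded into the warehouse first and then transformed there with SQL; dbt plays this transformation role in projects like this one. The sketch below shows that ordering, with SQLite standing in for the warehouse. The `RAW_ORDERS` payload and the function names are hypothetical.

```python
import sqlite3

# Hypothetical raw payload, as an extractor might return it from an API.
RAW_ORDERS = [
    {"id": 1, "customer": " Alice ", "amount_cents": 1250, "status": "paid"},
    {"id": 2, "customer": "bob", "amount_cents": 800, "status": "refunded"},
    {"id": 3, "customer": "Alice", "amount_cents": 300, "status": "paid"},
]


def extract_and_load(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """E + L: land the records unmodified in a raw table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(id INTEGER, customer TEXT, amount_cents INTEGER, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_orders VALUES (:id, :customer, :amount_cents, :status)", rows
    )


def transform(conn: sqlite3.Connection) -> None:
    """T: build a cleaned model from the raw table with SQL inside the database."""
    conn.execute("DROP TABLE IF EXISTS customer_revenue")
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT lower(trim(customer)) AS customer,
               sum(amount_cents) / 100.0 AS revenue
        FROM raw_orders
        WHERE status = 'paid'
        GROUP BY lower(trim(customer))
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    extract_and_load(conn, RAW_ORDERS)
    transform(conn)
    print(conn.execute("SELECT * FROM customer_revenue ORDER BY customer").fetchall())
```

Because the raw table is kept untouched, the transform can be changed and rerun without extracting the data again.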

https://github.com/dmarks84/coursework_project_apache-airflow-kafka-on-toll-booth-data

Project for IBM Data Engineering & Python course on ETL & Big Data -- Read in live toll booth data, wrangled and transformed it, and wrote it into a SQL database

apache-airflow apache-kafka automation dags data-modeling databases eda elt etl mysql numpy pandas pipelines python sql

Last synced: 09 Apr 2025

https://github.com/raflyritonga/imdb-movie-elt

A containerized, orchestrated ELT pipeline for IMDb movie data

airflow data-engineering dbt docker elt elt-pipeline

Last synced: 17 Jun 2025

https://github.com/manoharvit/ecommerce-dive-deep-sales-analysis

In this project, we developed an ETL pipeline using Apache Airflow to process delivery data and track delayed shipments. The pipeline downloads data from an AWS S3 bucket, cleans it using Spark/Spark SQL to identify missed delivery deadlines, and uploads the cleaned dataset back to S3. This enables efficient tracking of delivery performance.

airflow airflow-dags ecommerce elt pyspark s3 s3-bucket spark sql

Last synced: 31 Jul 2025

https://github.com/mehassanhmood/bigdata-analytics

Retrieves data from different sources and brings the preprocessed data into Power BI for visualization

azuresql dataware elt etl-pipeline powerbi

Last synced: 29 Oct 2025

https://github.com/romanow/data-migration-lib

Database migration library based on Spring Batch

database-migration elt spring-batch

Last synced: 18 Jul 2025