An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with elt

A curated list of projects in awesome lists tagged with elt.

https://github.com/airbytehq/airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

bigquery change-data-capture data data-analysis data-collection data-engineering data-integration data-pipeline elt etl java mssql mysql pipeline postgresql python redshift s3 self-hosted snowflake

Last synced: 09 Sep 2025

https://github.com/apache/doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

bigquery database dbt delta-lake elt etl hadoop hive hudi iceberg lakehouse olap query-engine real-time redshift snowflake spark sql

Last synced: 30 Jan 2026

https://github.com/dbt-labs/dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

analytics business-intelligence data-modeling dbt-viewpoint elt pypa slack

Last synced: 09 Apr 2026

https://github.com/apache/seatunnel

SeaTunnel is a next-generation, high-performance, distributed tool for massive-scale data integration.

apache batch cdc change-data-capture data-ingestion data-integration elt high-performance offline real-time streaming

Last synced: 12 May 2025

https://github.com/cloudquery/cloudquery

Data pipelines for cloud config and security data. Build cloud asset inventory, CSPM, FinOps, and vulnerability management solutions. Extract from AWS, Azure, GCP, and 70+ cloud and SaaS sources.

airbyte attack-surface-management aws azure bigquery cspm data data-analysis data-collection data-engineering data-integration elt etl etl-framework gcp github-api go google kubernetes sql

Last synced: 16 May 2026

https://github.com/dlt-hub/dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

data data-engineering data-lake data-loading data-warehouse elt extract load python transform

Last synced: 01 Apr 2026

https://github.com/tobikodata/sqlmesh

Scalable and efficient data transformation framework - backwards compatible with dbt.

dataengineering dataops dbt elt etl python sql transformation

Last synced: 21 Jan 2026

https://github.com/TobikoData/sqlmesh

Efficient data transformation and modeling framework that is backwards compatible with dbt.

dataengineering dataops dbt elt etl python sql transformation

Last synced: 26 Mar 2025

https://github.com/quarylabs/quary

Open-source BI for engineers

analytics big-data business-intelligence data-modeling elt

Last synced: 24 Jan 2026

https://github.com/meltano/meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

connectors data data-engineering data-pipelines dataops dataops-platform elt extract-data integration loaders meltano meltano-sdk open-source opensource pipelines singer tap taps target targets

Last synced: 03 Feb 2026

https://github.com/ucbepic/docetl

A system for agentic LLM-powered data processing and ETL

agents data data-pipelines elt etl llm python workflow

Last synced: 12 Oct 2025

https://github.com/dataform-co/dataform

Dataform is a framework for managing SQL-based data operations in BigQuery

analytics business-intelligence data-engineering data-pipelines elt etl hacktoberfest

Last synced: 04 Feb 2026

https://github.com/artie-labs/transfer

Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.

apache-kafka bigquery cdc change-data-capture data-integration data-pipelines database debezium elt golang kafka redshift snowflake

Last synced: 30 Apr 2026

https://github.com/slingdata-io/sling-cli

Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.

elt etl extract load

Last synced: 06 Mar 2026

https://github.com/kuwala-io/kuwala

Kuwala is the no-code data platform for BI analysts and engineers that enables you to build powerful analytics workflows. We set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for use in data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, and c) Google Popular Times.

admin-boundaries data data-integration data-science dbt elt google-trends jupyter kuwala no-code open-data open-source population postgres pyspark python react react-flow scraping spatial-analysis

Last synced: 30 Mar 2025

https://github.com/raystack/optimus

Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.

airflow analytics analytics-engineering automation bigquery business-intelligence data-modelling data-pipelines data-transformation data-warehouse dataops elt etl golang workflows

Last synced: 16 May 2025

https://github.com/Datavault-UK/automate-dv

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)

data-vault dataengineering datalake datavault datavault20 datawarehouse datawarehousing dbt elt etl metadata snowflake sql

Last synced: 13 May 2025

https://github.com/datarecce/recce

The data-validation toolkit for enhanced dbt (data build tool) PR review

analytics-engineering data data-engineering data-validation dataops dbt elt

Last synced: 14 Apr 2026

https://github.com/xorq-labs/xorq

multi-engine batch transformation framework

arrow dataframe elt machine-learning multi-engine python sklearn sql

Last synced: 06 Mar 2026

https://github.com/astronomer/astro-sdk

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

airflow apache-airflow bigquery dags data-analysis data-science elt etl gcs pandas postgres python s3 snowflake sql sqlite workflows

Last synced: 13 Apr 2025

https://github.com/airbytehq/pyairbyte

PyAirbyte brings the power of Airbyte to every Python developer.

data-engineering elt python

Last synced: 03 Apr 2026

https://github.com/datacoves/dbt-coves

CLI tool for dbt users that simplifies the creation of staging model files (.yml and .sql)

analytics bigquery datacoves dbt elt etl jinja python redshift snowflake sql

Last synced: 15 May 2025

https://github.com/umitkaanusta/reddit-detective

Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more

analysis analytics api data database elt etl graph graph-database neo4j network politics reddit social social-media social-network

Last synced: 06 Apr 2025

https://github.com/unytics/airbyte_serverless

Airbyte made simple (no UI, no database, no cluster)

airbyte bigquery data data-analysis data-engineering data-warehouse elt etl pipeline

Last synced: 16 May 2025

https://github.com/faros-ai/airbyte-connectors

Airbyte connectors (sources & destinations) + Airbyte CDK for JavaScript/TypeScript

airbyte airbyte-cdk airbyte-connectors airbyte-destinations airbyte-sources cdk cicd connectors elt etl faros javascript npm sdlc typescript

Last synced: 11 May 2025

https://github.com/transferia/transferia

Open-source, cloud-native ingestion engine

bigdata cdc clickhouse elt go golang ingestion-platform kafka streaming

Last synced: 03 Apr 2025

https://github.com/yokawasa/databricks-notebooks

Collection of sample Databricks Spark notebooks (mostly for Azure Databricks)

azure azuredatabricks databricks elt python spark streaming

Last synced: 19 Jun 2025

https://github.com/codeforkjeff/dbt-sqlite

A SQLite adapter plugin for dbt (data build tool)

dbt elt etl sqlite

Last synced: 31 Jan 2026

https://github.com/zsvoboda/dbd

dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

bigquery csv database database-schemas elt etl excel json mysql parquet postgresql python python3 redshift snowflake sql sqlite xls xlsx

Last synced: 11 Sep 2025

https://github.com/ascrus/getl

A tool for developing and testing ETL and ELT processes that automate the capture, delivery, and processing of information in data warehouses on the Micro Focus Vertica platform.

csv dsl elt etl excel hdfs hive impala json kafka sql unit-testing vertica xml

Last synced: 14 Jun 2025

https://github.com/airbytehq/airflow-summit-airbyte-2022

git push your data stack with Airbyte, Airflow, and dbt - 2022 Airflow Summit

airbyte airflow data-engineering elt

Last synced: 13 Aug 2025

https://github.com/mundipagg/amora-data-build-tool

Amora Data Build Tool enables analysts and engineers to transform data in the data warehouse (BigQuery) by writing Amora Models that describe the data schema using Python type hints (PEP 484) and SELECT statements written with SQLAlchemy. Amora transforms Python code into SQL data transformation jobs that run inside the warehouse.

analytics analytics-dashboard analytics-engineering bigquery business-intelligence data-engineering data-modeling datacleaning dataquality elt machine-learning python transformation

Last synced: 08 Sep 2025

https://github.com/realdatadriven/etlx

ETL / ELT / Reverse ETL Framework powered by DuckDB, designed to seamlessly integrate and process data from diverse sources. It leverages Markdown as a configuration medium, where YAML blocks define metadata for each data source, and embedded SQL blocks specify the extraction, transformation, and loading logic.

data-engineering data-lake data-lakehouse data-quality data-quality-checks data-quality-monitoring data-science duckdb elt elt-pipeline etl etl-elt-pipelines etl-pipeline object-storage relational-databases report report-automation s3 s3-storage

Last synced: 30 Apr 2026
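
The etlx description above mentions using Markdown as the configuration medium, with YAML blocks holding per-source metadata and embedded SQL blocks holding the pipeline logic. The following is a minimal Python sketch of that general idea only: the document layout, the `name`/`source` keys, and the `parse_pipeline_doc` helper are hypothetical and do not reproduce etlx's actual syntax. To stay stdlib-only, it parses a flat `key: value` subset of YAML rather than using a real YAML library.

```python
import re

# Build the fence marker programmatically so this snippet nests safely in Markdown.
FENCE = "`" * 3

# A fenced block: opening fence + language tag, body (non-greedy), closing fence.
BLOCK_RE = re.compile(re.escape(FENCE) + r"(\w+)\n(.*?)\n" + re.escape(FENCE), re.DOTALL)


def parse_flat_yaml(text: str) -> dict[str, str]:
    """Parse a flat `key: value` subset of YAML (no nesting, lists, or quoting)."""
    meta = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta


def parse_pipeline_doc(markdown: str) -> list[dict]:
    """Pair each yaml metadata block with the sql blocks that follow it (hypothetical layout)."""
    steps: list[dict] = []
    for lang, body in BLOCK_RE.findall(markdown):
        if lang == "yaml":
            steps.append({"meta": parse_flat_yaml(body), "sql": []})
        elif lang == "sql" and steps:
            steps[-1]["sql"].append(body.strip())
    return steps


doc = f"""# Orders pipeline

{FENCE}yaml
name: orders
source: postgres
{FENCE}

{FENCE}sql
CREATE TABLE orders_raw AS SELECT * FROM src.orders;
{FENCE}
"""

for step in parse_pipeline_doc(doc):
    print(step["meta"]["name"], step["meta"]["source"], len(step["sql"]))
```

Keeping metadata and SQL side by side in one Markdown file means the configuration can also serve as human-readable documentation of the pipeline.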

https://github.com/mattiasthalen/arcane-insight

Arcane Insight is a data analytics project designed to harness the power of SQLMesh & DuckDB to collect, transform, and analyze data from Blizzard’s Hearthstone API. Focused on card statistics and attributes, this project reveals detailed insights into card mechanics, strengths, and trends to support BI and strategic analysis.

analytics-engineering data-engineering data-vault data-warehouse duckdb elt etl hearthstone medallion-architecture sqlmesh

Last synced: 11 Feb 2026

https://github.com/mattiasthalen/obsidian-insights

Personal project for setting up an open source data warehouse.

data-warehouse dlt duckdb elt motherduck sqlmesh unified-star-schema

Last synced: 11 Feb 2026

https://github.com/guidok91/spark-movies-etl

Spark data pipeline that processes movie ratings data.

apache-iceberg data-engineering data-pipeline elt etl pyspark spark uv

Last synced: 11 Mar 2026

https://github.com/airbytehq/abctl

Airbyte's CLI for managing local Airbyte installations

elt go golang

Last synced: 05 Mar 2026

https://github.com/datazip-inc/olake-ui

Frontend & BFF (Backend for Frontend) for Olake. This includes the UI code and the backend code that stores sync configurations and orchestrates syncs.

apache-iceberg change-data-capture data-engineering database elt elt-pipeline etl etl-pipeline hacktoberfest ui

Last synced: 23 Apr 2026

https://github.com/montara-io/dbt-command-center

Never sift through endless dbt™ logs again. dbt Command Center is a free, open-source, local web application that provides a user-friendly interface to monitor and manage dbt runs.

analytics-engineering bigquery data-analysis data-catalog data-engineering data-lineage data-observability data-pipeline data-pipelines data-validation data-warehouse dataops dbt dbt-packages elt etl orchestration python redshift

Last synced: 05 May 2025

https://github.com/cloudquery/plugin-sdk

CloudQuery Go SDK for source and destination plugins

cloudquery data-integration elt

Last synced: 05 Apr 2025

https://github.com/feluelle/finance-data-builder

Finance 🏦 Data Builder 🛠️ @ postgres 🐘

airflow dbt docker elt etl finance workflow

Last synced: 16 Jun 2025

https://github.com/teradata/dbt-teradata

dbt adapter for Teradata

dbt elt teradata warehouses

Last synced: 03 Mar 2026

https://github.com/koltyakov/cq-source-sharepoint

🔌 CloudQuery SharePoint Source Plugin

cloudquery elt etl integration plugin sharepoint sync

Last synced: 11 Apr 2025

https://github.com/meltanolabs/singer-working-group

Working group for ongoing development and iteration of the Singer Spec, the de facto protocol for open source data connectors. Please use "Issues" to create discussion items, or use "Discussions" for general questions.

data-integration dataops elt etl etl-pipeline singer

Last synced: 25 Feb 2026

https://github.com/typedef-ai/fenic

Build reliable AI and agentic applications with DataFrames

agents ai arrow dataframe-library dataframes duckdb elt etl llm orchestration polars pyspark python rust

Last synced: 23 Jun 2025

https://github.com/meltanolabs/tap-dbt

Singer Tap for dbt API v2 built with the Meltano SDK

dbt dbt-cloud elt extract-data meltano-sdk singer-io singer-tap

Last synced: 19 Oct 2025

https://github.com/oresttokovenko/retailflow

End-to-End ELT data pipeline with Postgres, Airbyte, dbt, Dagster, Snowflake and Metabase

airbyte aws aws-lambda dagster dbt docker ecs elt lambda metabase postgres python snowflake sql terraform

Last synced: 13 Apr 2025

https://github.com/meltanolabs/target-snowflake

Singer Target for the Snowflake cloud Data Warehouse

elt meltano singer-sdk singer-target snowflake

Last synced: 10 Mar 2026

https://github.com/edgarrmondragon/meltano-dogfood

Personal dogfood Meltano project

bigquery dbt dogfood elt evidence-dev meltano

Last synced: 14 Apr 2025

https://github.com/kushalkhadka7/dagster_clickhouse_dbt

dbt and ClickHouse test project with Dagster

clickhouse dagster datapipeline dbt elt

Last synced: 13 Apr 2025

https://github.com/renatoelho/apache-nifi-enriquecimento-cep

In this project, I dive into the world of Apache NiFi, exploring how to consume data from an API and save it directly to a database.

apache-nifi api elt etl mysql sql

Last synced: 04 Oct 2025

https://github.com/bayoadejare/airbyte_dbt_covid19

dbt transformations for Snowflake data warehouse.

airbyte covid-19 dbt elt snowflake sql

Last synced: 06 Oct 2025

https://github.com/cre-dev/xml2db

A Python package to load complex XML files into a relational database

data-engineering data-loader database duckdb elt etl lxml mssql mysql postgresql python relational-databases sqlalchemy xml xmlschema xsd

Last synced: 15 Jul 2025

https://github.com/longnguyen010203/ecommerce-elt-pipeline

🌄📈📉 A Data Engineering Project 🌈 that implements an ELT data pipeline using Dagster, Docker, dbt, Polars, Snowflake, and PostgreSQL. Data from the Kaggle website 🔥

dagster data data-engineering dbt docker docker-compose dockerfile elt elt-pipeline extract kaggle load polars postgresql raw-data relational-databases snowflake transform

Last synced: 27 Feb 2026

https://github.com/firelink-sh/evolve-py

A highly efficient, composable, and lightweight ETL and data integration framework.

analytics arrow big-data data data-engineering data-integration data-science duckdb elt etl ingestion ingress ml olap pipeline polars postgresql python s3

Last synced: 10 Mar 2026

https://github.com/danhphan/trusted-data-pipeline

Building 3D Trusted Data Pipelines With Dagster, dbt, and DuckDB

aws dagster data dbt duckdb elt engineering etl-pipeline s3

Last synced: 23 Jun 2025

https://github.com/voxmedia/tap-instagram

Singer Tap for the Instagram Graph API

elt instagram meltano singer singer-tap

Last synced: 01 Feb 2026

https://github.com/reservoir-data/tap-confluence

Singer tap for the Confluence Content REST API

atlassian-confluence confluence-api elt meltano-sdk singer-io singer-tap

Last synced: 30 Dec 2025

https://github.com/gansanay/dbt-teradata

dbt adapter for Teradata data warehouses

analytics data-modeling elt

Last synced: 20 Mar 2025

https://github.com/teradata/dbt-teradata-utils

Teradata package that provides compatibility for dbt-utils

dbt elt sql teradata warehouses

Last synced: 11 Oct 2025

https://github.com/salma-mamdoh/datawarehouse_project

Our project for the Data Warehouse course taken during the fall 2024 semester

analytics dashboard datawarehousing deploy elt kpis powerbi sql ssis

Last synced: 25 Jan 2026

https://github.com/archived-blueprints/postgresql-blueprints

Simplified blueprints for building data pipelines with PostgreSQL.

cli data-analysis data-engineering data-pipeline data-science database elt etl postgres postgresql

Last synced: 29 Jul 2025

https://github.com/andrewcstewart/files-gitpod

Meltano project file bundle for https://www.gitpod.io/

elt etl gitpod meltano

Last synced: 07 Feb 2026

https://github.com/davidkhala/etl

Collection of data Extract, Transform, Load

apache-beam dbt elt etl fivetran

Last synced: 17 Feb 2026

https://github.com/dataopstix/modelt

Modelt (mow·delt) is a modern data integration solution that connects data to data for advanced analytics.

airbyte airflow airflow-docker data data-analysis data-visualization database dbt elt etl etl-automation metabase metadata modern modern-dev modernization

Last synced: 28 Mar 2025

https://github.com/rifa8/capstone-project-with-dynamic-dag

The project focuses on creating an ELT pipeline to consolidate data from diverse resources into a single source of truth in BigQuery. The heart of this project is the innovative use of Apache Airflow to design a dynamic Directed Acyclic Graph (DAG) that automates task generation based on predefined file configurations.

dynamic-dag elt visualization

Last synced: 13 Apr 2026
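
The capstone project above describes a dynamic DAG whose tasks are generated from predefined file configurations. As a hedged illustration of that pattern only, the plain-Python sketch below builds one extract task and one load task per configured source, plus a single transform task that waits on every load, and orders the tasks with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It does not use Airflow, and the config keys (`name`, `format`) and task-naming scheme are hypothetical, not taken from the project.

```python
from graphlib import TopologicalSorter

# Hypothetical per-source configurations; in a real setup these might be
# read from YAML or JSON files on disk.
CONFIGS = [
    {"name": "customers", "format": "csv"},
    {"name": "orders", "format": "json"},
]


def build_dag(configs: list[dict]) -> dict[str, set[str]]:
    """Map each generated task name to the set of tasks it depends on."""
    graph: dict[str, set[str]] = {}
    for cfg in configs:
        extract = f"extract_{cfg['name']}_{cfg['format']}"
        load = f"load_{cfg['name']}"
        graph[extract] = set()      # extracts have no upstream dependencies
        graph[load] = {extract}     # each load waits for its own extract
    # One downstream transform task (the "T" in ELT) waits for every load.
    graph["transform_marts"] = {f"load_{cfg['name']}" for cfg in configs}
    return graph


dag = build_dag(CONFIGS)
order = list(TopologicalSorter(dag).static_order())
print(order)
```

Adding a source then only means adding a config entry; the task graph and a valid execution order are derived from it, which is the main appeal of config-driven DAGs.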

https://github.com/dmarks84/coursework_project_banks-web-scraping-sql

Project for IBM Data Engineering & Python course on ETL & Big Data -- Scraped website data and made API calls for additional data; wrangled and transformed the data, then loaded it into a SQL database.

apis beautifulsoup databases elt etl nosql numpy pandas pipelines python sql sqlite web-scraping

Last synced: 10 Apr 2026

https://github.com/meltanolabs/tap-intacct

Singer tap for the Sage Intacct API

elt meltano singer-tap

Last synced: 11 Apr 2025

https://github.com/victor-antoniassi/coinmarketcap_api_to_duckdb

This repository demonstrates the use of the dlt (data load tool) library to extract cryptocurrency data from the CoinMarketCap API and load it into a DuckDB database.

api-rest coinmarketcap-api crypto cryptocurrencies data-engineering data-pipeline dlt dlthub elt etl python

Last synced: 01 Jun 2026

https://github.com/taquynhnga2001/proptech-dagster

Build an ELT pipeline with Dagster and dbt to schedule loading HDB resale transactions in Singapore into the Google BigQuery data warehouse, then create a Power BI dashboard to enhance insight exploration.

bigquery dagster data-integration data-orchestration data-warehouse dbt elt etl powerbi python

Last synced: 14 Feb 2026

https://github.com/cloudquery/recipes

Real-world CloudQuery configuration examples

cloudquery elt

Last synced: 19 Mar 2026

https://github.com/souravroy-etl/duckle

Local-first ETL/ELT studio: a drag-and-drop visual pipeline designer that compiles to SQL and runs on DuckDB. Tiny desktop app, no servers, git-friendly workspaces.

data-engineering data-integration data-pipeline data-quality desktop-app drag-and-drop duckdb elt embedded etl local-first react rust sql tauri typescript vector-database

Last synced: 23 May 2026

https://github.com/ooemperor/go-db-etl

Tool for fetching data from multiple sources and inserting it into a single target database with history

csv elt elt-pipeline etl go golang json mssql mysql postgres psql sql-server

Last synced: 10 Feb 2026

https://github.com/dmarks84/coursework_coursework_project_automobile-sales-visualization

Project for IBM Data Science course on Visualization & Dashboards -- Analyzed historical sales data, performing EDA and setting up an interactive dashboard

communication dash dashboards data-modeling elt etl folium matplotlib numpy pandas pipelines plotly python scipy seaborn visualization

Last synced: 10 Apr 2026

https://github.com/shsiddhant/cricket-warehouse

A data warehouse for ball-by-ball cricket match data, designed for analytics and modeling.

cricket-data dbt elt postgresql python

Last synced: 04 Apr 2026