Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with elt
A curated list of projects in awesome lists tagged with elt .
https://github.com/apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
airflow apache apache-airflow automation dag data-engineering data-integration data-orchestrator data-pipelines data-science elt etl machine-learning mlops orchestration python scheduler workflow workflow-engine workflow-orchestration
Last synced: 01 Oct 2024
https://github.com/apache/incubator-airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
airflow apache apache-airflow automation dag data-engineering data-integration data-orchestrator data-pipelines data-science elt etl machine-learning mlops orchestration python scheduler workflow workflow-engine workflow-orchestration
Last synced: 05 Aug 2024
https://github.com/airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
bigquery change-data-capture data data-analysis data-collection data-engineering data-integration data-pipeline elt etl java mssql mysql pipeline postgresql python redshift s3 self-hosted snowflake
Last synced: 28 Sep 2024
https://github.com/fishtown-analytics/dbt
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
analytics business-intelligence data-modeling dbt-viewpoint elt pypa slack
Last synced: 12 Sep 2024
https://github.com/dbt-labs/dbt-core
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
analytics business-intelligence data-modeling dbt-viewpoint elt pypa slack
Last synced: 29 Sep 2024
https://github.com/apache/incubator-seatunnel
SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
apache batch cdc change-data-capture data-ingestion data-integration elt high-performance offline real-time streaming
Last synced: 17 Aug 2024
https://github.com/apache/seatunnel
SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
apache batch cdc change-data-capture data-ingestion data-integration elt high-performance offline real-time streaming
Last synced: 29 Sep 2024
https://github.com/mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
artificial-intelligence data data-engineering data-integration data-pipelines data-science dbt elt etl machine-learning orchestration pipeline pipelines python reverse-etl spark sql transformation
Last synced: 28 Sep 2024
https://github.com/kestra-io/kestra
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
data data-engineering data-integration data-orchestration data-orchestrator data-pipeline data-quality elt etl low-code orchestration pipeline reverse-etl scheduler workflow workflow-engine
Last synced: 29 Sep 2024
https://github.com/cloudquery/cloudquery
The open source high performance ELT framework powered by Apache Arrow
airbyte attack-surface-management aws azure bigquery cspm data data-analysis data-collection data-engineering data-integration elt etl etl-framework gcp github-api go google kubernetes sql
Last synced: 27 Sep 2024
https://github.com/apache/flink-cdc
Flink CDC is a streaming data integration tool
batch cdc change-data-capture data-integration data-pipeline distributed elt etl flink kafka mysql paimon postgresql real-time schema-evolution
Last synced: 30 Sep 2024
https://github.com/ververica/flink-cdc-connectors
Flink CDC is a streaming data integration tool
batch cdc change-data-capture data-integration data-pipeline distributed elt etl flink kafka mysql paimon postgresql real-time schema-evolution
Last synced: 10 Aug 2024
https://github.com/dlt-hub/dlt
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
data data-engineering data-lake data-loading data-warehouse elt extract load python transform
Last synced: 31 Jul 2024
https://github.com/meltano/meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
connectors data data-engineering data-pipelines dataops dataops-platform elt extract-data integration loaders meltano meltano-sdk open-source opensource pipelines singer tap taps target targets
Last synced: 01 Oct 2024
https://github.com/TobikoData/sqlmesh
Efficient data transformation and modeling framework that is backwards compatible with dbt.
dataengineering dataops dbt elt etl python sql transformation
Last synced: 31 Jul 2024
https://github.com/dataform-co/dataform
Dataform is a framework for managing SQL based data operations in BigQuery
analytics business-intelligence data-engineering data-pipelines elt etl hacktoberfest
Last synced: 29 Sep 2024
https://github.com/kuwala-io/kuwala
Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We are set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations together in one intuitive interface built with React Flow. In addition we provide third-party data into data science models and products with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demographics data b) Point of Interests from Open Street Map c) Google Popular Times
admin-boundaries data data-integration data-science dbt elt google-trends jupyter kuwala no-code open-data open-source population postgres pyspark python react react-flow scraping spatial-analysis
Last synced: 01 Aug 2024
https://github.com/raystack/optimus
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
airflow analytics analytics-engineering automation bigquery business-intelligence data-modelling data-pipelines data-transformation data-warehouse dataops elt etl golang workflows
Last synced: 29 Sep 2024
https://github.com/artie-labs/transfer
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real-time.
apache-kafka bigquery cdc change-data-capture data-integration data-pipelines database debezium elt golang kafka redshift snowflake
Last synced: 29 Sep 2024
https://github.com/quarylabs/quary
Transform data together. Model, test and deploy as a team.
analytics business-intelligence data-modeling elt
Last synced: 31 Jul 2024
https://github.com/Datavault-UK/automate-dv
A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
data-vault dataengineering datalake datavault datavault20 datawarehouse datawarehousing dbt elt etl metadata snowflake sql
Last synced: 03 Aug 2024
https://github.com/gouline/dbt-metabase
dbt + Metabase integration
analytics business-intelligence data data-modelling dbt elt metabase pypa python vizualisation
Last synced: 02 Aug 2024
https://github.com/vmware/versatile-data-kit
One framework to develop, deploy and operate data workflows with Python and SQL.
analytics data data-engineer data-engineering data-engineering-pipeline data-lineage data-pipelines data-science data-structures data-warehouse database dataops elt etl pipeline python snowflake sql trino warehouse
Last synced: 03 Aug 2024
https://github.com/astronomer/astro-sdk
Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
airflow apache-airflow bigquery dags data-analysis data-science elt etl gcs pandas postgres python s3 snowflake sql sqlite workflows
Last synced: 29 Sep 2024
https://github.com/cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
apache-iceberg apache-spark data-engineering data-ingestion data-integration data-lake data-pipeline data-transfer datalake delta elt etl incremental-updates lakehouse pipelines spark-sql sql upsert zeppelin-notebook
Last synced: 28 Sep 2024
https://github.com/umitkaanusta/reddit-detective
Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more
analysis analytics api data database elt etl graph graph-database neo4j network politics reddit social social-media social-network
Last synced: 29 Sep 2024
https://github.com/unytics/airbyte_serverless
Airbyte made simple (no UI, no database, no cluster)
airbyte bigquery data data-analysis data-engineering data-warehouse elt etl pipeline
Last synced: 29 Sep 2024
https://github.com/zsvoboda/dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
bigquery csv database database-schemas elt etl excel json mysql parquet postgresql python python3 redshift snowflake sql sqlite xls xlsx
Last synced: 28 Sep 2024
https://github.com/ascrus/getl
A tool for developing and testing ETL and ELT processes for automating the capture, delivery and processing of information in data warehouses on the MicroFocus Vertica platform.
csv dsl elt etl excel hdfs hive impala json kafka sql unit-testing vertica xml
Last synced: 01 Aug 2024
https://github.com/childmindresearch/bids2table
Efficiently index large-scale BIDS neuroimaging datasets and derivatives
arrow bids data-pipeline elt etl neuroimaging parquet
Last synced: 01 Aug 2024
https://github.com/shipyardapp/postgresql-blueprints
Simplified blueprints for building data pipelines with PostgreSQL.
cli data-analysis data-engineering data-pipeline data-science database elt etl postgres postgresql
Last synced: 13 Aug 2024
https://github.com/shipyardapp/amazonathena-blueprints
Simplified blueprints for building data pipelines with Amazon Athena.
amazon-athena athena cli data-analysis data-engineering data-science elt etl
Last synced: 13 Aug 2024
https://github.com/salvatoreamaddio/pipelinewebsite
This a console line application is an Ad-hoc Solution for a client who needed a way of extracting data from their own website and print them onto a spreadsheet.
csharp csharp-app csharp-code elt elt-pipeline excel-export web
Last synced: 28 Sep 2024
https://github.com/davidkhala/etl
Collection of data Extract, Transform, Load
apache-beam dbt elt etl fivetran
Last synced: 02 Oct 2024