awesome-data-engineering
A curated list of tools, frameworks, platforms, architectures, and learning resources for data engineering.
https://github.com/brandonhimpfen/awesome-data-engineering
-
Data Engineering on the Cloud
-
Data Ingestion & Integration
- Apache Kafka Connect
- Apache NiFi
- Airbyte - Open-source data integration platform for ELT pipelines.
- Fivetran
- Singer - Open-source standard for data extraction and loading.
- Debezium
-
Data Quality, Governance & Lineage
- Great Expectations
- Apache Atlas
- OpenLineage
- DataHub - Open-source metadata and data catalog.
- Amundsen
-
Data Transformation & Modeling
- dbt - SQL-based transformation and analytics engineering tool.
- Apache Spark - Large-scale data processing.
- Apache Beam
- Dask
- SQLMesh
-
Foundations & Concepts
- Data Engineering Explained
- Data Lake vs Data Warehouse
- Event-Driven Architecture - Real-time data systems.
- CAP Theorem - Trade-offs in distributed data systems.
-
Infrastructure & Platforms
-
Learning Resources
-
Guides
-
Tutorials
- Data Engineering Zoomcamp - Hands-on data engineering course.
- Apache Spark Documentation
-
-
NoSQL & Specialized Datastores
- MongoDB - Document-oriented NoSQL database.
- Apache HBase
- Amazon DynamoDB - Key-value store.
- Redis - In-memory data store for caching and streaming use cases.
-
Observability & Reliability
-
Query Engines & Analytics
- Trino
- Presto - High-performance distributed SQL engine.
- Spark SQL
- DuckDB - In-process analytical SQL engine.
- ClickHouse - Column-oriented OLAP database.
-
Related Awesome Lists
-
Storage, Warehousing & Lakehouses
- Amazon S3
- Google Cloud Storage
- BigQuery
- Delta Lake - Open-source storage layer enabling lakehouse architecture.
- Apache Iceberg - Large-scale analytic datasets.
- Apache Hudi
- Snowflake - Cloud-native data warehouse.
-
Streaming & Event Processing
- Apache Kafka
- Apache Pulsar - Cloud-native pub/sub and streaming platform.
- Apache Flink - Stream-first processing framework with low latency.
- Kafka Streams
- Apache Storm - Real-time computation system for stream processing.
-
Workflow Orchestration
- Apache Airflow
- Dagster
- Prefect
- Luigi
- Argo Workflows - Kubernetes-native workflow engine.