Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-iceberg
A curated list of resources on Apache Iceberg and its surrounding ecosystem
https://github.com/zriyansh/awesome-iceberg
Last synced: 2 days ago
-
📂 Additional Sections
-
2. Tutorials and Learning Resources
- Apache Iceberg Documentation
- Coursera: Data Engineering on Google Cloud
- LinkedIn Learning: Modern Data Warehousing with Apache Iceberg
- Udemy: Apache Iceberg Essentials
- Pluralsight: Advanced Data Engineering with Apache Iceberg
- Getting Started with Apache Iceberg
- Towards Data Science - Apache Iceberg Articles
- Iceberg vs. Delta Lake vs. Hudi
- Databricks Blog on Iceberg
- YouTube - Apache Iceberg Tutorials
- edureka! Data Engineering Tutorials
- Simplilearn - Iceberg Tutorials
- Confluent YouTube Channel
-
3. Open-source Projects
- Apache Iceberg - Core project repository.
- lakeFS - Git-like version control for data lakes.
- Great Expectations - Data validation framework.
- Delta Sharing - Open protocol for secure data sharing.
- Airbyte - Data integration platform.
- Amundsen - Data discovery and metadata engine.
- DataHub - Metadata platform for the modern data stack.
- Materialize - Streaming SQL database for real-time analytics.
-
-
🔑 Core Sections
-
1. Iceberg Fundamentals
- Introducing Apache Iceberg - Official Blog Post
- Iceberg: Table Format for Large Analytics Datasets - SlideShare Presentation
- Academic Papers on Iceberg
-
2. Key Iceberg Technologies
- Apache Iceberg - Core table format for managing large datasets.
- Delta Lake - Complementary storage layer providing ACID transactions.
- Apache Hudi - Alternative table format and storage layer with support for upserts and incremental processing.
- Apache Spark - Unified analytics engine for large-scale data processing.
- Trino - High-performance distributed SQL query engine.
- Hive - Data warehouse software for querying and managing large datasets.
- Hive Metastore - Central repository for metadata.
- Amundsen - Data discovery and metadata engine.
- DataHub - Metadata platform for the modern data stack.
- Apache Atlas - Governance and metadata framework.
- Presto - Distributed SQL query engine for big data.
-
5. BI and Analytics on Iceberg
-
3. ETL/ELT Tools for Iceberg
- Spark Streaming - Scalable stream processing.
- Airbyte - Open-source data integration platform.
- Meltano - Open-source data integration tool built on Singer.
- dbt (data build tool) - Transform data in your warehouse more effectively.
- Matillion - Data integration and transformation tool.
- Apache Kafka - Distributed event streaming platform.
- Apache Flink - Stream processing framework.
-
4. Data Orchestration
- Dagster - Data orchestrator for machine learning, analytics, and ETL.
- Prefect - Workflow management system.
- Apache Airflow - Platform to programmatically author, schedule, and monitor workflows.
-
6. ML and AI Workflows
- Databricks Machine Learning - Unified environment for ML.
- MLflow - Open-source platform for managing the ML lifecycle.
-
7. Monitoring and Observability
- Great Expectations - Data validation framework.
- OpenTelemetry - Observability framework for cloud-native software.
- Prometheus - Monitoring system and time series database.
- Grafana - Open-source analytics and monitoring platform.
-
8. Cost Optimization
- DuckDB - An in-process SQL OLAP Database Management System.
- ClickHouse - Fast open-source column-oriented database management system.
- Materialize - Streaming database for real-time applications.
-
-
🚀 Introduction
- Apache Iceberg - High-performance data management for your data lake.
-
🌟 Influential Personalities
-
1. Ryan Blue
2. Benjamin Haimowitz
3. Paige Nord
4. Robin Moffatt
5. James Nestor
6. Maxime Beauchemin
-
-
🤝 Contributing
-
Contribution Guidelines
-
-
📢 Acknowledgements
-
Contribution Guidelines
-