Projects in Awesome Lists tagged with apache-iceberg
A curated list of projects in awesome lists tagged with apache-iceberg.
https://github.com/matanolabs/matano
Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
alerting apache-iceberg aws aws-security big-data cloud cloud-native cloud-security cybersecurity detection-engineering dfir log-analytics log-management rust secops security security-tools serverless siem threat-hunting
Last synced: 14 May 2025
https://github.com/apache/incubator-xtable
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
apache-hudi apache-iceberg delta-lake
Last synced: 13 Apr 2025
https://github.com/cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
apache-iceberg apache-spark data-engineering data-ingestion data-integration data-lake data-pipeline data-transfer datalake delta elt etl incremental-updates lakehouse pipelines spark-sql sql upsert zeppelin-notebook
Last synced: 07 Apr 2025
https://github.com/dominikhei/Local-Data-LakeHouse
Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.
apache-iceberg data-lake data-lakehouse hive-metastore lakehouse minio trino
Last synced: 07 May 2026
https://github.com/lhbench/lhbench
Lakehouse storage system benchmark
apache-hudi apache-iceberg benchmark cidr database databricks delta-lake lakehouse
Last synced: 04 Apr 2025
https://github.com/dacort/modern-data-lake-storage-layers
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
amazon-emr apache-hudi apache-iceberg aws delta-lake hudi iceberg
Last synced: 31 Oct 2025
https://github.com/abeltavares/real-time-data-pipeline
📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.
apache-flink apache-iceberg apache-kafka apache-superset aws big-data data-engineering data-pipeline data-visualization docker etl lakehouse minio open-source real-time-data s3 sql-analytics streaming-analytics trino
Last synced: 13 Oct 2025
https://github.com/aws-solutions-library-samples/guidance-for-developing-data-and-ai-foundation-with-amazon-sagemaker
DAIVI is a reference solution with IaC modules to accelerate development of Data, Analytics, AI and Visualization applications on AWS using the next-generation Amazon SageMaker Unified Studio. The goal of the DAIVI solution is to provide engineers with sample infrastructure-as-code modules and application modules to build their data platforms.
apache-iceberg sagemaker sagemaker-studio terraform
Last synced: 31 May 2026
https://github.com/guidok91/spark-movies-etl
Spark data pipeline that processes movie ratings data.
apache-iceberg data-engineering data-pipeline elt etl pyspark spark uv
Last synced: 11 Mar 2026
https://github.com/datazip-inc/olake-ui
Frontend & BFF (Backend for frontend) for Olake. This includes the UI code and backend code for storing the configuration of sync and orchestrating it.
apache-iceberg change-data-capture data-engineering database elt elt-pipeline etl etl-pipeline hacktoberfest ui
Last synced: 23 Apr 2026
https://github.com/bodo-ai/denali
An open-source, community-driven REST catalog for Apache Iceberg!
apache-iceberg catalog go golang iceberg
Last synced: 06 Jul 2025
https://github.com/aws-samples/monitoring-apache-iceberg-table-metadata-layer
Sample code to collect Apache Iceberg metrics for table monitoring
apache-iceberg apache-spark aws aws-cloudwatch aws-glue aws-lambda data-quality monitoring pyiceberg sam-cli
Last synced: 29 Oct 2025
https://github.com/aws-samples/iceberg-streaming-examples
This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenarios using best practices. The code can be deployed into any Spark compatible engine like Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.
apache-iceberg apache-spark structured-streaming
Last synced: 29 Oct 2025
https://github.com/guidok91/spark-structured-streaming-kafka
Spark Structured Streaming data pipeline that processes movie ratings data in real-time.
apache-iceberg apache-kafka apache-spark data-engineering etl kafka pyspark real-time spark spark-structured-streaming streaming
Last synced: 11 Mar 2026
https://github.com/gordonmurray/apache_flink_and_iceberg
Using Apache Flink to write to S3 in Apache Iceberg format
apache-flink apache-iceberg parquet s3
Last synced: 12 Apr 2025
https://github.com/jesufemi-o/iceberg-integration-framework
A proof-of-concept (PoC) open framework to manage data ingestion into Apache Iceberg tables
apache-iceberg lakehouse-platform pyiceberg
Last synced: 06 Mar 2026
https://github.com/sidequery/dlt-iceberg
An Iceberg destination for DLT that supports REST catalogs
apache-iceberg data-engineering datalake dlt dlthub etl iceberg
Last synced: 09 Feb 2026
https://github.com/bahbosque/delta-to-iceberg-aws-glue
Tool to migrate Delta Lake tables to Apache Iceberg using AWS Glue and S3
apache-iceberg aws aws-glue-data-catalog data-lake delta-lake migration-tool open-source spark
Last synced: 03 Jul 2025
https://github.com/abeltavares/versioned-data-lakehouse
🌊 Git-like Version Control for Data with Nessie, Iceberg, and Spark
apache-iceberg apache-nessie apache-spark atomic-etl block-storage branch-based-development data-engineering data-lakehouse data-pipelines data-versioning dataops distributed-systems etl etl-pipeline git-for-data minio s3 spark-etl table-format time-travel
Last synced: 20 May 2026
https://github.com/ev2900/iceberg_emr_athena
Resources from a virtual tech talk / workshop - Set Up and Use Apache Iceberg Tables on Your Data Lake
apache-iceberg athena aws emr spark
Last synced: 01 Aug 2025
https://github.com/ev2900/iceberg_update_metadata_script
Python script that will update S3 file paths in Iceberg metadata files (metadata.json + AVRO)
apache-iceberg aws aws-glue glue iceberg python
Last synced: 13 Apr 2025
https://github.com/joewood/react-iceberg
React Components to visualize Apache Iceberg tables
apache-arrow apache-iceberg apache-spark avro devcontainer docker-compose minio reactjs s3
Last synced: 11 Apr 2026
https://github.com/hussein-awala/gdpr-compliant-lakehouse
This repository is a demonstration of how to handle GDPR export and delete requests in an Iceberg Lakehouse to make it GDPR-compliant.
apache-iceberg apache-spark datalake gdpr lakehouse
Last synced: 18 May 2026
https://github.com/j3-signalroom/apache_flink-kickstarter
Examples of Apache Flink® applications showcasing the DataStream API and Table API in Java and Python, featuring AWS, GitHub, Terraform, and Apache Iceberg.
apache-flink apache-iceberg aws-glue aws-parameter-store aws-s3 aws-secrets-manager flink flink-examples flink-kafka flink-stream-processing github-actions iceberg snowflake streamlit-dashboard terraform-cloud
Last synced: 16 Mar 2025
https://github.com/kameshsampath/polaris-spark-devbox
A development environment for experimenting with Apache Polaris and Iceberg
apache-iceberg apache-polaris apache-spark jupyter-notebooks
Last synced: 19 May 2026
https://github.com/ev2900/mongodb_streams_glue_iceberg
Process MongoDB change streams via AWS Glue with Iceberg to keep a copy of a collection in S3 up to date
apache-iceberg aws-glue glue mondodb mongodb-change-streams python
Last synced: 15 Oct 2025
https://github.com/riju18/apache-iceberg-kickstart
apache-iceberg datalake datalakehouse docker dremio minio nessie pysaprk python3 s3 sql zeppelin
Last synced: 27 Apr 2026
https://github.com/ev2900/emr_studio_iceberg
Apache Iceberg examples designed to be run on AWS Elastic Map Reduce (EMR) via EMR Studio or EMR Notebooks
apache-iceberg aws elastic-map-reduce emr iceberg
Last synced: 02 May 2026
https://github.com/ev2900/iceberg_glue_register_table
Example using the Iceberg register_table command with AWS Glue and Glue Data Catalog
apache-iceberg aws aws-glue aws-glue-data-catalog glue iceberg
Last synced: 04 May 2026
https://github.com/j3-signalroom/supercharge_streamlit-apache_flink
Engaging, interactive visualizations crafted with Streamlit, seamlessly powered by Apache Flink in batch mode to reveal deep insights from data.
apache-flink apache-iceberg aws-glue-data-catalog flink flink-sql iceberg kafka pyflink streamlit streamlit-dashboard
Last synced: 22 May 2026
https://github.com/yuhexiong/deploy-spark-iceberg-rest-minio-guide
apache-iceberg apache-spark iceberg-rest minio
Last synced: 15 Apr 2025
https://github.com/j3-signalroom/linux_flink_with_iceberg
Apache Flink Docker image with Apache Iceberg support for Linux (i.e., non-Mac M chip).
apache-flink apache-iceberg flink iceberg
Last synced: 18 Mar 2026
https://github.com/ac-gomes/spark-iceberg-hive
apache-iceberg apache-spark hive-metastore iceberg minio spark trino
Last synced: 07 Apr 2026
https://github.com/johnymontana/hands-on-havasu-geoparquet
Notebook to accompany the "Hands-On With Havasu & GeoParquet" livestream
apache-iceberg apache-sedona geoparquet parquet sedonadb
Last synced: 12 Oct 2025
https://github.com/ivanyu/icebreaker
A GUI for Apache Iceberg REST Catalog
apache-iceberg gui iceberg swing
Last synced: 05 Apr 2025
https://github.com/marcinthecloud/iceberg.rest
An Apache Iceberg REST Catalog explorer - view namespaces, tables, stats, metadata, schema evolution, and more.
apache-iceberg claude-code cloudflare cloudflare-workers iceberg iceberg-rest
Last synced: 01 May 2026
https://github.com/jiatangzhi/master_thesis
This project implements my master’s thesis on building a scalable, ACID-compliant data lakehouse architecture for IoT and industrial workloads, integrating AWS Glue, S3, Athena, and Grafana with Iceberg to evaluate Copy-on-Write vs Merge-on-Read performance.
apache-iceberg aws-glue aws-s3 batch-processing data-engineering data-lakehouse distributed-systems grafana iot-data mqtt open-table-format python3 schema-evolution spark
Last synced: 04 May 2026
https://github.com/dmschauer/wap-pattern-iceberg-pyspark-aws-glue
This repository shows how to implement the Write-Audit-Publish (WAP) pattern using Apache Spark and Apache Iceberg. It's aimed at Data Engineers who want to get started quickly.
apache-iceberg apache-spark aws aws-glue iceberg pyspark spark
Last synced: 08 Feb 2026
https://github.com/theades/serverless-data-lakehouse
This is an example project showing how to build a serverless data lakehouse on AWS using Terraform, Apache Iceberg and Spark.
apache-iceberg apache-spark aws data-engineering data-lakehouse terraform
Last synced: 09 Feb 2026
https://github.com/j3-signalroom/mac_flink_with_iceberg
Apache Flink Docker image with Apache Iceberg support for Mac M2, M3, or M4 chips.
apache-flink apache-iceberg flink iceberg
Last synced: 18 Mar 2026
https://github.com/jaehyeon-kim/open-dataml-stack
A curated collection of open source technologies and an accompanying CLI for experimenting with modern data architecture and MLOps.
apache-airflow apache-flink apache-iceberg apache-kafka apache-spark cli clickhouse data-engineering data-infrastructure data-lakehouse docker-compose mlflow mlops modern-data-stack openlineage openmetadata prometheus python stream-processing trino
Last synced: 05 Jun 2026
https://github.com/ardnaile/trabalho-eng-dados
Implementation of Apache Spark with Delta Lake and Apache Iceberg
apache-iceberg apache-spark delta-lake docker
Last synced: 06 May 2026
https://github.com/dmschauer/wap-pattern-pyspark-aws-glue
This repository shows how to implement the Write-Audit-Publish (WAP) pattern using Apache Spark and Apache Iceberg. It's aimed at Data Engineers who want to get started quickly.
apache-iceberg apache-spark aws aws-glue iceberg pyspark python
Last synced: 04 Feb 2026
https://github.com/dgroomes/iceberg-playground
📚 Learning and exploring Apache Iceberg
Last synced: 07 Feb 2026
https://github.com/subhamay-bhattacharyya/snowflake-de-azure-iceberg-tables
Terraform-based reference implementation for building Snowflake Data Engineering workloads on Azure using external Iceberg tables, Azure Blob Storage, and modular CI/CD pipelines with GitHub Actions.
adls-gen2 apache-iceberg azure ci-cd data-engineering github-actions infrastructure-as-code lakehouse snowflake terraform
Last synced: 02 May 2026
https://github.com/conduitio-labs/conduit-connector-s3-iceberg
apache-iceberg conduit iceberg java s3
Last synced: 23 Jun 2025
https://github.com/j3-signalroom/j3-techstack-lexicon
J3's techStack Lexicon.
apache-flink apache-iceberg flink iceberg terraform terraform-cloud
Last synced: 08 Jan 2026