https://github.com/ev2900/mongodb_streams_glue_iceberg
Process MongoDB change streams via AWS Glue with Iceberg to keep a copy of a collection in S3 up to date
- Host: GitHub
- URL: https://github.com/ev2900/mongodb_streams_glue_iceberg
- Owner: ev2900
- Created: 2023-05-24T13:40:54.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-03-30T01:10:07.000Z (6 months ago)
- Last Synced: 2025-04-10T03:13:44.061Z (6 months ago)
- Topics: apache-iceberg, aws-glue, glue, mondodb, mongodb-change-streams, python
- Language: Python
- Size: 27.3 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# MongoDB Streams AWS Glue
This repository contains a solution for a one-time copy of data from a MongoDB collection into an Apache Iceberg table in S3, and a solution that uses the MongoDB change stream to keep the Iceberg copy of the data up to date.
The architecture diagram below depicts the solution.
The repository is broken down into several sections. Each section has its own README that explains its components.
* [1_Sample_MongoDB_Data](https://github.com/ev2900/MongoDB_Streams_Glue_Iceberg/tree/main/1_Sample_MongoDB_Data) steps 1 and 2 in the architecture diagram
* [2_Glue_Iceberg_Initial_Load](https://github.com/ev2900/MongoDB_Streams_Glue_Iceberg/tree/main/2_Glue_Iceberg_Initial_Load) step 3 in the architecture diagram
* [3_Sample_MongoDB_Change_Stream_Data](https://github.com/ev2900/MongoDB_Streams_Glue_Iceberg/tree/main/3_Sample_MongoDB_Change_Stream_Data) steps 4 and 5 in the architecture diagram
* [4_Glue_Iceberg_Change_Stream](https://github.com/ev2900/MongoDB_Streams_Glue_Iceberg/tree/main/4_Glue_Iceberg_Change_Stream) step 6 in the architecture diagram
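
The idea behind the change stream sections (steps 4 to 6) can be sketched in plain Python. This is not the Glue job from this repository; it is a minimal illustration that assumes the standard MongoDB change event fields (`operationType`, `documentKey`, `fullDocument`, `updateDescription`) and uses an in-memory dictionary in place of the Iceberg table. The names `apply_change_event` and `copy_of_collection` are hypothetical.

```python
from copy import deepcopy


def apply_change_event(table, event):
    """Apply one MongoDB change stream event to a copy keyed by _id.

    `table` is a plain dict standing in for the Iceberg table (an assumption
    made so the sketch runs without Spark or AWS Glue).
    """
    op = event["operationType"]
    key = event["documentKey"]["_id"]

    if op in ("insert", "replace"):
        # Insert and replace events carry the complete new document.
        table[key] = deepcopy(event["fullDocument"])
    elif op == "update":
        # Update events carry a delta. This sketch handles top-level
        # fields only and does not resolve dotted paths to nested fields.
        doc = table.setdefault(key, {"_id": key})
        delta = event.get("updateDescription", {})
        doc.update(delta.get("updatedFields", {}))
        for field in delta.get("removedFields", []):
            doc.pop(field, None)
    elif op == "delete":
        # Delete events identify the document only by its key.
        table.pop(key, None)
    # Other event types (drop, rename, invalidate, ...) are ignored here.
    return table


# Hypothetical sample events, applied in the order they were emitted.
events = [
    {"operationType": "insert", "documentKey": {"_id": 1},
     "fullDocument": {"_id": 1, "name": "Ada", "city": "London"}},
    {"operationType": "insert", "documentKey": {"_id": 2},
     "fullDocument": {"_id": 2, "name": "Bob"}},
    {"operationType": "update", "documentKey": {"_id": 1},
     "updateDescription": {"updatedFields": {"city": "Paris"},
                           "removedFields": []}},
    {"operationType": "delete", "documentKey": {"_id": 2}},
]

copy_of_collection = {}
for event in events:
    apply_change_event(copy_of_collection, event)

print(copy_of_collection)
```

In a Spark job writing to Iceberg, the same upsert and delete logic would typically be expressed as a `MERGE INTO` statement keyed on `_id` rather than as dictionary operations.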