https://github.com/gakas14/dynamodb-stream-data-capture
Using DynamoDB Streams to capture item-level changes in a DynamoDB table, then using Kinesis Data Streams and Kinesis Data Firehose to save the changes into an S3 bucket.
- Host: GitHub
- URL: https://github.com/gakas14/dynamodb-stream-data-capture
- Owner: gakas14
- Created: 2024-01-27T05:21:37.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-27T07:38:11.000Z (over 1 year ago)
- Last Synced: 2024-12-31T12:32:46.139Z (9 months ago)
- Topics: dynamodb, dynamodb-streams, kinesis, kinesis-data-streams, kinesis-firehose, lambda-functions, s3-bucket
- Homepage:
- Size: 6.84 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Streaming Amazon DynamoDB data into a centralized data lake (S3)
DynamoDB Streams captures item-level changes in the DynamoDB table. Kinesis Data Streams and Kinesis Data Firehose then deliver the changes to an S3 bucket, and a Lambda function transforms the records before they are written to S3.
## Step 1: Create a DynamoDB table.
Create a customers table with customers_id as the partition key.
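A minimal boto3 sketch of this step, assuming on-demand capacity and the customers / customers_id names used in the PartiQL examples further down:
```
import boto3

# Assumes default credentials and region; adjust as needed.
dynamodb = boto3.client('dynamodb')

dynamodb.create_table(
    TableName='customers',
    AttributeDefinitions=[
        {'AttributeName': 'customers_id', 'AttributeType': 'N'}
    ],
    KeySchema=[
        {'AttributeName': 'customers_id', 'KeyType': 'HASH'}  # partition key
    ],
    BillingMode='PAY_PER_REQUEST'  # on-demand capacity is an assumption
)
```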
## Step 2: Create a Kinesis data stream
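A boto3 sketch of this step; the stream name and shard count below are assumptions, and a single shard is enough for this demo:
```
import boto3

kinesis = boto3.client('kinesis')

# Stream name is a placeholder; one shard keeps the demo simple.
kinesis.create_stream(StreamName='customers-ddb-stream', ShardCount=1)
```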
## Step 3: Create a Lambda function and an S3 bucket
#### Lambda function script
```
# This function appends a newline to each record coming from the Kinesis data stream
# so the records are written to S3 as line-delimited JSON.
import base64


def lambda_handler(event, context):
    output = []
    for record in event['records']:
        # Firehose hands each record to the function base64-encoded
        payload = base64.b64decode(record['data']).decode('utf-8')
        print('payload:', payload)

        # Append a newline so consecutive records do not run together in the S3 object
        row_w_newline = payload + "\n"

        # Re-encode the transformed payload; Firehose expects a base64 string
        data = base64.b64encode(row_w_newline.encode('utf-8')).decode('utf-8')

        output.append({
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': data
        })

    print('Processed {} records.'.format(len(event['records'])))
    return {'records': output}
```
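The Lambda script above handles the transform; the S3 bucket from this step can be created with boto3 as well. The bucket name is a placeholder and must be globally unique:
```
import boto3

s3 = boto3.client('s3')

# Placeholder bucket name; outside us-east-1 a CreateBucketConfiguration
# with a LocationConstraint is also required.
s3.create_bucket(Bucket='customers-ddb-changes-demo')
```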
## Step 4: Create the Kinesis Data Firehose delivery stream
#### Set the Kinesis data stream as the source and Amazon S3 as the destination.
#### Enable "Transform source records with AWS Lambda" and select the function created above.
#### Add the S3 bucket as the destination bucket.
#### Configure the buffer size and interval so data is delivered once it reaches 1 MB or after 60 seconds, whichever comes first (see the boto3 sketch below).
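A hedged boto3 sketch of the same configuration; every name, ARN, and role below is a placeholder, and the IAM role must allow Firehose to read the Kinesis stream, invoke the Lambda function, and write to the bucket:
```
import boto3

firehose = boto3.client('firehose')

firehose.create_delivery_stream(
    DeliveryStreamName='customers-ddb-firehose',
    DeliveryStreamType='KinesisStreamAsSource',
    KinesisStreamSourceConfiguration={
        'KinesisStreamARN': 'arn:aws:kinesis:us-east-1:123456789012:stream/customers-ddb-stream',
        'RoleARN': 'arn:aws:iam::123456789012:role/firehose-delivery-role'
    },
    ExtendedS3DestinationConfiguration={
        'RoleARN': 'arn:aws:iam::123456789012:role/firehose-delivery-role',
        'BucketARN': 'arn:aws:s3:::customers-ddb-changes-demo',
        # Deliver once 1 MB has accumulated or every 60 seconds
        'BufferingHints': {'SizeInMBs': 1, 'IntervalInSeconds': 60},
        # Transform source records with the Lambda function from Step 3
        'ProcessingConfiguration': {
            'Enabled': True,
            'Processors': [{
                'Type': 'Lambda',
                'Parameters': [{
                    'ParameterName': 'LambdaArn',
                    'ParameterValue': 'arn:aws:lambda:us-east-1:123456789012:function:firehose-newline-transform'
                }]
            }]
        }
    }
)
```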
## Step 5: Set up the DynamoDB stream with Kinesis
#### Turn on the Amazon Kinesis data stream from the DynamoDB table, as in the sketch below.
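A boto3 sketch of this step; the stream ARN is a placeholder for the data stream created in Step 2:
```
import boto3

dynamodb = boto3.client('dynamodb')

# Route the table's item-level changes into the Kinesis data stream
dynamodb.enable_kinesis_streaming_destination(
    TableName='customers',
    StreamArn='arn:aws:kinesis:us-east-1:123456789012:stream/customers-ddb-stream'
)
```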
## Step 6: Insert data into the table
#### PartiQL (SQL) queries for DynamoDB
```
INSERT INTO "customers" VALUE {'customers_id':1, 'name':'Baba Li', 'age':20,'gender':'M'}
INSERT INTO "customers" VALUE {'customers_id':2, 'name':'Lucky Bill', 'age':24,'gender':'M'}
INSERT INTO "customers" VALUE {'customers_id':3, 'name':'Mom Ma', 'age':50,'gender':'F'}
INSERT INTO "customers" VALUE {'customers_id':4, 'name':'Locker Su', 'age':30,'gender':'M'}
INSERT INTO "customers" VALUE {'customers_id':5, 'name':'Abdel ly', 'age':41,'gender':'F'}
INSERT INTO "customers" VALUE {'customers_id':6, 'name':'Abou Sar', 'age':35,'gender':'F'}

UPDATE customers SET age=26 WHERE customers_id=3

SELECT * FROM customers;
```
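The same statements can also be run programmatically with boto3's execute_statement API; a minimal sketch:
```
import boto3

dynamodb = boto3.client('dynamodb')

# Insert one of the rows above, apply the update, then read the table back.
dynamodb.execute_statement(
    Statement="INSERT INTO \"customers\" VALUE {'customers_id':1, 'name':'Baba Li', 'age':20, 'gender':'M'}"
)
dynamodb.execute_statement(Statement="UPDATE customers SET age=26 WHERE customers_id=3")

for item in dynamodb.execute_statement(Statement='SELECT * FROM customers')['Items']:
    print(item)
```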
## Step 7: Check the data in the S3 bucket
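To verify delivery without the console, a small boto3 sketch that lists the delivered objects and prints the transformed records (the bucket name is the placeholder used earlier):
```
import boto3

s3 = boto3.client('s3')
bucket = 'customers-ddb-changes-demo'  # placeholder bucket name

# Print every object Firehose has delivered so far
for obj in s3.list_objects_v2(Bucket=bucket).get('Contents', []):
    body = s3.get_object(Bucket=bucket, Key=obj['Key'])['Body'].read().decode('utf-8')
    print(obj['Key'])
    print(body)
```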