https://github.com/gakas14/dynamodb-stream-data-capture
Using DynamoDB Streams to capture item-level changes in a DynamoDB table, then using Kinesis Data Streams and Kinesis Data Firehose to save the changes into an S3 bucket.
- Host: GitHub
- URL: https://github.com/gakas14/dynamodb-stream-data-capture
- Owner: gakas14
- Created: 2024-01-27T05:21:37.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-27T07:38:11.000Z (over 1 year ago)
- Last Synced: 2024-12-31T12:32:46.139Z (9 months ago)
- Topics: dynamodb, dynamodb-streams, kinesis, kinesis-data-streams, kinesis-firehose, lambda-functions, s3-bucket
- Homepage:
- Size: 6.84 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Streaming Amazon DynamoDB data into a centralized data lake (S3)
DynamoDB Streams captures item-level changes in the DynamoDB table. Kinesis Data Streams and Kinesis Data Firehose then deliver the changes to an S3 bucket, and a Lambda function transforms the records before they are written to S3.
## Step 1: Create a DynamoDB table.
Create a customers table with customers_id as the partition key.
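A minimal boto3 sketch of this step, assuming on-demand capacity and the customers / customers_id names used in the PartiQL examples further down:
```
import boto3

# Assumes default credentials and region; adjust as needed.
dynamodb = boto3.client('dynamodb')

dynamodb.create_table(
    TableName='customers',
    AttributeDefinitions=[
        {'AttributeName': 'customers_id', 'AttributeType': 'N'}
    ],
    KeySchema=[
        {'AttributeName': 'customers_id', 'KeyType': 'HASH'}  # partition key
    ],
    BillingMode='PAY_PER_REQUEST'  # on-demand capacity is an assumption
)
```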
## Step 2: Create a Kinesis data stream
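A boto3 sketch of this step; the stream name and shard count below are assumptions, and a single shard is enough for this demo:
```
import boto3

kinesis = boto3.client('kinesis')

# Stream name is a placeholder; one shard keeps the demo simple.
kinesis.create_stream(StreamName='customers-ddb-stream', ShardCount=1)
```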
## Step 3: Create a Lambda function and an S3 bucket
#### Lambda function script
```
# This function appends a newline to each record coming from the Kinesis data stream
# so the records are written to S3 as line-delimited JSON.
import base64


def lambda_handler(event, context):
    output = []
    for record in event['records']:
        # Firehose hands each record to the function base64-encoded
        payload = base64.b64decode(record['data']).decode('utf-8')
        print('payload:', payload)

        # Append a newline so consecutive records do not run together in the S3 object
        row_w_newline = payload + "\n"

        # Re-encode the transformed payload; Firehose expects a base64 string
        data = base64.b64encode(row_w_newline.encode('utf-8')).decode('utf-8')

        output.append({
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': data
        })

    print('Processed {} records.'.format(len(event['records'])))
    return {'records': output}
```
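The Lambda script above handles the transform; the S3 bucket from this step can be created with boto3 as well. The bucket name is a placeholder and must be globally unique:
```
import boto3

s3 = boto3.client('s3')

# Placeholder bucket name; outside us-east-1 a CreateBucketConfiguration
# with a LocationConstraint is also required.
s3.create_bucket(Bucket='customers-ddb-changes-demo')
```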
## Step 4: Create the Kinesis Data Firehose delivery stream
#### Set the Kinesis data stream as the source and Amazon S3 as the destination.
#### Enable "Transform source records with AWS Lambda" and select the function created above.
#### Add the S3 bucket as the destination bucket.
#### Configure the buffer size and interval so data is delivered once it reaches 1 MB or after 60 seconds, whichever comes first (see the boto3 sketch below).
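A hedged boto3 sketch of the same configuration; every name, ARN, and role below is a placeholder, and the IAM role must allow Firehose to read the Kinesis stream, invoke the Lambda function, and write to the bucket:
```
import boto3

firehose = boto3.client('firehose')

firehose.create_delivery_stream(
    DeliveryStreamName='customers-ddb-firehose',
    DeliveryStreamType='KinesisStreamAsSource',
    KinesisStreamSourceConfiguration={
        'KinesisStreamARN': 'arn:aws:kinesis:us-east-1:123456789012:stream/customers-ddb-stream',
        'RoleARN': 'arn:aws:iam::123456789012:role/firehose-delivery-role'
    },
    ExtendedS3DestinationConfiguration={
        'RoleARN': 'arn:aws:iam::123456789012:role/firehose-delivery-role',
        'BucketARN': 'arn:aws:s3:::customers-ddb-changes-demo',
        # Deliver once 1 MB has accumulated or every 60 seconds
        'BufferingHints': {'SizeInMBs': 1, 'IntervalInSeconds': 60},
        # Transform source records with the Lambda function from Step 3
        'ProcessingConfiguration': {
            'Enabled': True,
            'Processors': [{
                'Type': 'Lambda',
                'Parameters': [{
                    'ParameterName': 'LambdaArn',
                    'ParameterValue': 'arn:aws:lambda:us-east-1:123456789012:function:firehose-newline-transform'
                }]
            }]
        }
    }
)
```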
## Step 5: Set up the DynamoDB stream with Kinesis
#### Turn on the Amazon Kinesis data stream from the DynamoDB table, as in the sketch below.
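A boto3 sketch of this step; the stream ARN is a placeholder for the data stream created in Step 2:
```
import boto3

dynamodb = boto3.client('dynamodb')

# Route the table's item-level changes into the Kinesis data stream
dynamodb.enable_kinesis_streaming_destination(
    TableName='customers',
    StreamArn='arn:aws:kinesis:us-east-1:123456789012:stream/customers-ddb-stream'
)
```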
## Step 6: Insert data into the table
#### PartiQL (SQL) queries for DynamoDB
```
INSERT INTO "customers" VALUE {'customers_id':1, 'name':'Baba Li', 'age':20,'gender':'M'}
INSERT INTO "customers" VALUE {'customers_id':2, 'name':'Lucky Bill', 'age':24,'gender':'M'}
INSERT INTO "customers" VALUE {'customers_id':3, 'name':'Mom Ma', 'age':50,'gender':'F'}
INSERT INTO "customers" VALUE {'customers_id':4, 'name':'Locker Su', 'age':30,'gender':'M'}
INSERT INTO "customers" VALUE {'customers_id':5, 'name':'Abdel ly', 'age':41,'gender':'F'}
INSERT INTO "customers" VALUE {'customers_id':6, 'name':'Abou Sar', 'age':35,'gender':'F'}

UPDATE customers SET age=26 WHERE customers_id=3

SELECT * FROM customers;
```
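The same statements can also be run programmatically with boto3's execute_statement API; a minimal sketch:
```
import boto3

dynamodb = boto3.client('dynamodb')

# Insert one of the rows above, apply the update, then read the table back.
dynamodb.execute_statement(
    Statement="INSERT INTO \"customers\" VALUE {'customers_id':1, 'name':'Baba Li', 'age':20, 'gender':'M'}"
)
dynamodb.execute_statement(Statement="UPDATE customers SET age=26 WHERE customers_id=3")

for item in dynamodb.execute_statement(Statement='SELECT * FROM customers')['Items']:
    print(item)
```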
## Step 7: Check the data in the S3 bucket
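To verify delivery without the console, a small boto3 sketch that lists the delivered objects and prints the transformed records (the bucket name is the placeholder used earlier):
```
import boto3

s3 = boto3.client('s3')
bucket = 'customers-ddb-changes-demo'  # placeholder bucket name

# Print every object Firehose has delivered so far
for obj in s3.list_objects_v2(Bucket=bucket).get('Contents', []):
    body = s3.get_object(Bucket=bucket, Key=obj['Key'])['Body'].read().decode('utf-8')
    print(obj['Key'])
    print(body)
```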