https://github.com/localstack-samples/serverless-data-processing-pipeline

API Gateway -> Lambda -> Kinesis Stream -> Lambda -> DynamoDB -> DynamoDB stream -> Lambda -> Postgres
https://github.com/localstack-samples/serverless-data-processing-pipeline

Last synced: 9 months ago
JSON representation

API Gateway -> Lambda -> Kinesis Stream -> Lambda -> DynamoDB -> DynamoDB stream -> Lambda -> Postgres

Host: GitHub
URL: https://github.com/localstack-samples/serverless-data-processing-pipeline
Owner: localstack-samples
License: mit
Created: 2024-05-17T12:33:00.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-10-23T13:32:45.000Z (about 1 year ago)
Last Synced: 2025-03-24T18:06:30.748Z (9 months ago)
Language: Go
Size: 43 KB
Stars: 2
Watchers: 16
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          # serverless-data-processing-pipeline

This is a sample CDK app that creates a *API Gateway -> Lambda -> Kinesis Stream -> Lambda -> DynamoDB -> DynamoDB stream -> Lambda -> CloudWatch Metrics* chain and then we benchmark the time it takes to complete this loop on a M1 max chip.

## Prerequisites

The following dependencies need to be available on your machine:

1. [Go](https://go.dev/doc/install).

1. [Localstack CLI](https://docs.localstack.cloud/getting-started/installation/).

1. [CDK CLI](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html).

1. [Watchman](https://facebook.github.io/watchman/docs/install).

1. [jq](https://jqlang.github.io/jq/download/).

1. [k6](https://k6.io/docs/get-started/installation/).

## Commands

 * `localstack start` start LocalStack with the Docker executor

 * `cdk bootstrap`                                    bootstrap cdk stack onto AWS/LocalStack

 * `cdk deploy`                                       deploy this stack to your default AWS account/region

 * `cdk diff`                                         compare deployed stack with current state

 * `cdk synth`                                        emits the synthesized CloudFormation template

 * `go test`                                          run unit tests

 * `watchman [upstream|midstream|downstream]`         watch and hot-reload lambda functions

 * `wait_requests `              wait until all requests are processed and export the latencies

## Configuration

* `USE_LOCALSTACK`   set to `true` if the stack is deployed to LocalStack

* `HOT_DEPLOY`       set to `true` if the hot-reloading feature is to be enabled

* `LAMBDA_DIST_PATH` directory where binaries for the hot-reloading feature are stored (optional)

* `LAMBDA_SRC_PATH`  directory where the src of the lambda functions is found

## Deploy

On LocalStack:

```bash

export PROVIDER_OVERRIDE_CLOUDWATCH=v1

export LAMBDA_EVENT_SOURCE_MAPPING=v2

localstack start -d

export USE_LOCALSTACK=true

export HOT_DEPLOY=true

cdklocal bootstrap

cdklocal deploy --require-approval=never

```

On AWS:

```

cdk bootstrap --profile aws

cdk deploy --require-approval=never --profile aws

```

## Sample Run

After deploying the stack, retrieve the method's endpoint by inspecting the CfnOutput outputs like in the following example:

```sh

localstack@macintosh serverless-data-processing-pipeline % USE_LOCALSTACK=true HOT_DEPLOY=true cdklocal deploy --require-approval=never                                                   

✨  Synthesis time: 3.72s

ServerlessDataProcessingPipelineStack:  start: Building dd5711540f04e06aa955d7f4862fc04e8cdea464cb590dae91ed2976bb78098e:current_account-current_region

ServerlessDataProcessingPipelineStack:  success: Built dd5711540f04e06aa955d7f4862fc04e8cdea464cb590dae91ed2976bb78098e:current_account-current_region

ServerlessDataProcessingPipelineStack:  start: Building 4c4836f6c768f4500c058ac6a02f2090830a58eb1a0e58d59a5c7ffadf208861:current_account-current_region

ServerlessDataProcessingPipelineStack:  success: Built 4c4836f6c768f4500c058ac6a02f2090830a58eb1a0e58d59a5c7ffadf208861:current_account-current_region

ServerlessDataProcessingPipelineStack:  start: Publishing dd5711540f04e06aa955d7f4862fc04e8cdea464cb590dae91ed2976bb78098e:current_account-current_region

ServerlessDataProcessingPipelineStack:  start: Publishing 4c4836f6c768f4500c058ac6a02f2090830a58eb1a0e58d59a5c7ffadf208861:current_account-current_region

ServerlessDataProcessingPipelineStack:  success: Published 4c4836f6c768f4500c058ac6a02f2090830a58eb1a0e58d59a5c7ffadf208861:current_account-current_region

ServerlessDataProcessingPipelineStack:  success: Published dd5711540f04e06aa955d7f4862fc04e8cdea464cb590dae91ed2976bb78098e:current_account-current_region

ServerlessDataProcessingPipelineStack: deploying... [1/1]

ServerlessDataProcessingPipelineStack: creating CloudFormation changeset...

 ✅  ServerlessDataProcessingPipelineStack

✨  Deployment time: 30.69s

Outputs:

ServerlessDataProcessingPipelineStack.ApiEndpoint4F160690 = https://tsyeuri986.execute-api.localhost.localstack.cloud:4566/prod/

ServerlessDataProcessingPipelineStack.ApiGatewayMethodEndpoint = https://tsyeuri986.execute-api.localhost.localstack.cloud:4566/prod/

ServerlessDataProcessingPipelineStack.DynamoDBTableName = ServerlessDataProcessingPipeline-DynamoDBTable59784FC0-072648f2

ServerlessDataProcessingPipelineStack.Environment = LocalStack

ServerlessDataProcessingPipelineStack.KinesisStreamName = KinesisStream

Stack ARN:

arn:aws:cloudformation:us-east-1:000000000000:stack/ServerlessDataProcessingPipelineStack/68a8d688

✨  Total time: 34.4s

localstack@macintosh serverless-data-processing-pipeline % export APIGW_ENDPOINT="https://tsyeuri986.execute-api.localhost.localstack.cloud:4566/prod/"

```

Followed by a sample request:

```sh

localstack@macintosh serverless-data-processing-pipeline % timestamp=$(awk 'BEGIN {srand(); print srand()}')

localstack@macintosh serverless-data-processing-pipeline % curl -XPOST -H "Content-Type: application/json" $APIGW_ENDPOINT -d "$(jq -n --arg ts "$timestamp" '{id: "1", message: "Hello World", timestamp: $ts | tonumber}')" -i

HTTP/2 200 

content-type: application/json

content-length: 21

date: Fri, 31 May 2024 18:07:54 GMT

server: hypercorn-h2

{"message":"success"}

```

## Stress Test

```sh

$ k6 run -e APIGW_ENDPOINT=$APIGW_ENDPOINT loadtest.js

          /\      |‾‾| /‾‾/   /‾‾/   

     /\  /  \     |  |/  /   /  /    

    /  \/    \    |     (   /   ‾‾\  

   /          \   |  |\  \ |  (‾)  | 

  / __________ \  |__| \__\ \_____/ .io

     execution: local

        script: loadtest.js

        output: -

     scenarios: (100.00%) 1 scenario, 10 max VUs, 1m30s max duration (incl. graceful stop):

              * default: 10 looping VUs for 1m0s (gracefulStop: 30s)

     ✓ status was 200

     ✓ transaction time OK

     checks.........................: 100.00% ✓ 3432      ✗ 0   

     data_received..................: 272 kB  4.5 kB/s

     data_sent......................: 235 kB  3.9 kB/s

     http_req_blocked...............: avg=484.25µs min=0s       med=1µs      max=87.94ms  p(90)=1µs      p(95)=1µs     

     http_req_connecting............: avg=4.97µs   min=0s       med=0s       max=940µs    p(90)=0s       p(95)=0s      

     http_req_duration..............: avg=350.92ms min=203.55ms med=339.86ms max=725.44ms p(90)=406.8ms  p(95)=488.96ms

       { expected_response:true }...: avg=350.92ms min=203.55ms med=339.86ms max=725.44ms p(90)=406.8ms  p(95)=488.96ms

     http_req_failed................: 0.00%   ✓ 0         ✗ 1658

     http_req_receiving.............: avg=41.04ms  min=31.15ms  med=40.83ms  max=55.17ms  p(90)=42.03ms  p(95)=42.94ms 

     http_req_sending...............: avg=64.99µs  min=12µs     med=42µs     max=2.06ms   p(90)=114.5µs  p(95)=150µs   

     http_req_tls_handshaking.......: avg=213.15µs min=0s       med=0s       max=41.6ms   p(90)=0s       p(95)=0s      

     http_req_waiting...............: avg=309.81ms min=162.9ms  med=298.6ms  max=689.61ms p(90)=366.07ms p(95)=441.45ms

     http_reqs......................: 1658    28.289592/s

     iteration_duration.............: avg=351.62ms min=225.77ms med=340.5ms  max=725.59ms p(90)=406.92ms p(95)=489.08ms

     iterations.....................: 1658    28.289592/s

     vus............................: 10      min=10      max=10

     vus_max........................: 10      min=10      max=10

running (1m00.7s), 00/10 VUs, 1658 complete and 0 interrupted iterations

default ✓ [======================================] 10 VUs  1m0s

```

And then let's wait until all requests have been processed by the `midstream` and `downstream` Lambda functions. Let's also save the timestamps that indicate how much time it took each request to flow through the entire pipeline.

```sh

$ ./wait_requests timestamps.json

Monitoring CloudWatch metrics for new datapoints...

No new datapoints added. Exiting.

Exporting CloudWatch metrics to timestamps.json...

```

## Compute Results

We need to see how much time it takes (based on percentiles, averages, etc) to run a request through the entire serverless pipeline while the large number of VUs (virtual users) hit it with never ending requests for an entire minute.

```python

import pandas as pd

# Load the data from the JSON file into a pandas DataFrame

data = pd.read_json('timestamps.json')

# Calculate the desired statistics

stats = data.describe(percentiles=[.90, .95, .99])

# Print the statistics

print(stats)

```

The output of that would be:

```text

count  93.000000

mean   40.905096

std    16.211245

min     6.285714

50%    47.187500

90%    56.971429

95%    57.614691

99%    58.549990

max    58.939394

```

The `count` param tells us the whole experiment ran for 93 seconds, but our `k6` test only ran for 60 seconds, so there was some backlogging that occurred. So given that there were `1658` total inbound requests, LocalStack managed to process about 17.8 requests/s. Or more specifically, the pipeline was able to run 17.8 times per second. And that's for 10 virtual users, so about 1.78 requests/s/VU.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/localstack-samples/serverless-data-processing-pipeline

Awesome Lists containing this project

README