{"id":18573131,"url":"https://github.com/localstack-samples/serverless-data-processing-pipeline","last_synced_at":"2026-02-02T08:04:08.763Z","repository":{"id":244442410,"uuid":"802052737","full_name":"localstack-samples/serverless-data-processing-pipeline","owner":"localstack-samples","description":"API Gateway -\u003e Lambda -\u003e Kinesis Stream -\u003e Lambda -\u003e DynamoDB -\u003e DynamoDB stream -\u003e Lambda -\u003e Postgres","archived":false,"fork":false,"pushed_at":"2024-10-23T13:32:45.000Z","size":44,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":16,"default_branch":"main","last_synced_at":"2025-04-10T20:46:23.042Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/localstack-samples.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-17T12:33:00.000Z","updated_at":"2024-10-23T13:32:49.000Z","dependencies_parsed_at":"2024-06-14T19:02:51.883Z","dependency_job_id":"f7fb6fa4-d56d-4df2-81ed-c4da69266b48","html_url":"https://github.com/localstack-samples/serverless-data-processing-pipeline","commit_stats":null,"previous_names":["localstack-samples/serverless-data-processing-pipeline"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/localstack-samples/serverless-data-processing-pipeline","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/localstack-samples%2Fserverless-data-processing-pipeline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/localstack-samples%2Fserverless-data-processing-pipeline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/localstack-samples%2Fserverless-data-processing-pipeline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/localstack-samples%2Fserverless-data-processing-pipeline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/localstack-samples","download_url":"https://codeload.github.com/localstack-samples/serverless-data-processing-pipeline/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/localstack-samples%2Fserverless-data-processing-pipeline/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29007387,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-02T06:37:10.400Z","status":"ssl_error","status_checked_at":"2026-02-02T06:37:09.383Z","response_time":58,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T23:08:07.405Z","updated_at":"2026-02-02T08:04:08.747Z","avatar_url":"https://github.com/localstack-samples.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# serverless-data-processing-pipeline\n\nThis is a sample CDK app that creates an *API Gateway -\u003e Lambda -\u003e Kinesis 
Stream -\u003e Lambda -\u003e DynamoDB -\u003e DynamoDB stream -\u003e Lambda -\u003e CloudWatch Metrics* chain; we then benchmark the time it takes to complete this loop on an M1 Max chip.\n\n## Prerequisites\n\nThe following dependencies need to be available on your machine:\n\n1. [Go](https://go.dev/doc/install).\n\n1. [LocalStack CLI](https://docs.localstack.cloud/getting-started/installation/).\n\n1. [CDK CLI](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html).\n\n1. [Watchman](https://facebook.github.io/watchman/docs/install).\n\n1. [jq](https://jqlang.github.io/jq/download/).\n\n1. [k6](https://k6.io/docs/get-started/installation/).\n\n## Commands\n\n * `localstack start`                                 start LocalStack with the Docker executor\n * `cdk bootstrap`                                    bootstrap the CDK stack onto AWS/LocalStack\n * `cdk deploy`                                       deploy this stack to your default AWS account/region\n * `cdk diff`                                         compare deployed stack with current state\n * `cdk synth`                                        emit the synthesized CloudFormation template\n * `go test`                                          run unit tests\n * `watchman [upstream|midstream|downstream]`         watch and hot-reload Lambda functions\n * `wait_requests \u003clatencies_file.json\u003e`              wait until all requests are processed and export the latencies\n\n## Configuration\n\n* `USE_LOCALSTACK`   set to `true` if the stack is deployed to LocalStack\n* `HOT_DEPLOY`       set to `true` to enable the hot-reloading feature\n* `LAMBDA_DIST_PATH` directory where binaries for the hot-reloading feature are stored (optional)\n* `LAMBDA_SRC_PATH`  directory containing the source of the Lambda functions\n\n## Deploy\n\nOn LocalStack:\n\n```bash\nexport PROVIDER_OVERRIDE_CLOUDWATCH=v1\nexport LAMBDA_EVENT_SOURCE_MAPPING=v2\nlocalstack start -d\n\nexport USE_LOCALSTACK=true\nexport HOT_DEPLOY=true\ncdklocal 
bootstrap\ncdklocal deploy --require-approval=never\n```\n\nOn AWS:\n```\ncdk bootstrap --profile aws\ncdk deploy --require-approval=never --profile aws\n```\n\n## Sample Run\n\nAfter deploying the stack, retrieve the method's endpoint by inspecting the CfnOutput outputs like in the following example:\n\n```sh\nlocalstack@macintosh serverless-data-processing-pipeline % USE_LOCALSTACK=true HOT_DEPLOY=true cdklocal deploy --require-approval=never                                                   \n\n✨  Synthesis time: 3.72s\n\nServerlessDataProcessingPipelineStack:  start: Building dd5711540f04e06aa955d7f4862fc04e8cdea464cb590dae91ed2976bb78098e:current_account-current_region\nServerlessDataProcessingPipelineStack:  success: Built dd5711540f04e06aa955d7f4862fc04e8cdea464cb590dae91ed2976bb78098e:current_account-current_region\nServerlessDataProcessingPipelineStack:  start: Building 4c4836f6c768f4500c058ac6a02f2090830a58eb1a0e58d59a5c7ffadf208861:current_account-current_region\nServerlessDataProcessingPipelineStack:  success: Built 4c4836f6c768f4500c058ac6a02f2090830a58eb1a0e58d59a5c7ffadf208861:current_account-current_region\nServerlessDataProcessingPipelineStack:  start: Publishing dd5711540f04e06aa955d7f4862fc04e8cdea464cb590dae91ed2976bb78098e:current_account-current_region\nServerlessDataProcessingPipelineStack:  start: Publishing 4c4836f6c768f4500c058ac6a02f2090830a58eb1a0e58d59a5c7ffadf208861:current_account-current_region\nServerlessDataProcessingPipelineStack:  success: Published 4c4836f6c768f4500c058ac6a02f2090830a58eb1a0e58d59a5c7ffadf208861:current_account-current_region\nServerlessDataProcessingPipelineStack:  success: Published dd5711540f04e06aa955d7f4862fc04e8cdea464cb590dae91ed2976bb78098e:current_account-current_region\nServerlessDataProcessingPipelineStack: deploying... 
[1/1]\nServerlessDataProcessingPipelineStack: creating CloudFormation changeset...\n\n ✅  ServerlessDataProcessingPipelineStack\n\n✨  Deployment time: 30.69s\n\nOutputs:\nServerlessDataProcessingPipelineStack.ApiEndpoint4F160690 = https://tsyeuri986.execute-api.localhost.localstack.cloud:4566/prod/\nServerlessDataProcessingPipelineStack.ApiGatewayMethodEndpoint = https://tsyeuri986.execute-api.localhost.localstack.cloud:4566/prod/\nServerlessDataProcessingPipelineStack.DynamoDBTableName = ServerlessDataProcessingPipeline-DynamoDBTable59784FC0-072648f2\nServerlessDataProcessingPipelineStack.Environment = LocalStack\nServerlessDataProcessingPipelineStack.KinesisStreamName = KinesisStream\nStack ARN:\narn:aws:cloudformation:us-east-1:000000000000:stack/ServerlessDataProcessingPipelineStack/68a8d688\n\n✨  Total time: 34.4s\n\nlocalstack@macintosh serverless-data-processing-pipeline % export APIGW_ENDPOINT=\"https://tsyeuri986.execute-api.localhost.localstack.cloud:4566/prod/\"\n```\n\nFollowed by a sample request:\n\n```sh\nlocalstack@macintosh serverless-data-processing-pipeline % timestamp=$(awk 'BEGIN {srand(); print srand()}')\nlocalstack@macintosh serverless-data-processing-pipeline % curl -XPOST -H \"Content-Type: application/json\" $APIGW_ENDPOINT -d \"$(jq -n --arg ts \"$timestamp\" '{id: \"1\", message: \"Hello World\", timestamp: $ts | tonumber}')\" -i\nHTTP/2 200 \ncontent-type: application/json\ncontent-length: 21\ndate: Fri, 31 May 2024 18:07:54 GMT\nserver: hypercorn-h2\n\n{\"message\":\"success\"}\n```\n\n## Stress Test\n\n```sh\n$ k6 run -e APIGW_ENDPOINT=$APIGW_ENDPOINT loadtest.js\n\n\n          /\\      |‾‾| /‾‾/   /‾‾/   \n     /\\  /  \\     |  |/  /   /  /    \n    /  \\/    \\    |     (   /   ‾‾\\  \n   /          \\   |  |\\  \\ |  (‾)  | \n  / __________ \\  |__| \\__\\ \\_____/ .io\n\n     execution: local\n        script: loadtest.js\n        output: -\n\n     scenarios: (100.00%) 1 scenario, 10 max VUs, 1m30s max duration (incl. 
graceful stop):\n              * default: 10 looping VUs for 1m0s (gracefulStop: 30s)\n\n\n     ✓ status was 200\n     ✓ transaction time OK\n\n     checks.........................: 100.00% ✓ 3432      ✗ 0   \n     data_received..................: 272 kB  4.5 kB/s\n     data_sent......................: 235 kB  3.9 kB/s\n     http_req_blocked...............: avg=484.25µs min=0s       med=1µs      max=87.94ms  p(90)=1µs      p(95)=1µs     \n     http_req_connecting............: avg=4.97µs   min=0s       med=0s       max=940µs    p(90)=0s       p(95)=0s      \n     http_req_duration..............: avg=350.92ms min=203.55ms med=339.86ms max=725.44ms p(90)=406.8ms  p(95)=488.96ms\n       { expected_response:true }...: avg=350.92ms min=203.55ms med=339.86ms max=725.44ms p(90)=406.8ms  p(95)=488.96ms\n     http_req_failed................: 0.00%   ✓ 0         ✗ 1658\n     http_req_receiving.............: avg=41.04ms  min=31.15ms  med=40.83ms  max=55.17ms  p(90)=42.03ms  p(95)=42.94ms \n     http_req_sending...............: avg=64.99µs  min=12µs     med=42µs     max=2.06ms   p(90)=114.5µs  p(95)=150µs   \n     http_req_tls_handshaking.......: avg=213.15µs min=0s       med=0s       max=41.6ms   p(90)=0s       p(95)=0s      \n     http_req_waiting...............: avg=309.81ms min=162.9ms  med=298.6ms  max=689.61ms p(90)=366.07ms p(95)=441.45ms\n     http_reqs......................: 1658    28.289592/s\n     iteration_duration.............: avg=351.62ms min=225.77ms med=340.5ms  max=725.59ms p(90)=406.92ms p(95)=489.08ms\n     iterations.....................: 1658    28.289592/s\n     vus............................: 10      min=10      max=10\n     vus_max........................: 10      min=10      max=10\n\n\nrunning (1m00.7s), 00/10 VUs, 1658 complete and 0 interrupted iterations\ndefault ✓ [======================================] 10 VUs  1m0s\n```\n\nAnd then let's wait until all requests have been processed by the `midstream` and `downstream` Lambda functions. 
Let's also save the timestamps that indicate how long each request took to flow through the entire pipeline.\n\n```sh\n$ ./wait_requests timestamps.json\nMonitoring CloudWatch metrics for new datapoints...\nNo new datapoints added. Exiting.\nExporting CloudWatch metrics to timestamps.json...\n```\n\n## Compute Results\n\nWe want to know how long (in terms of percentiles, averages, etc.) a request takes to flow through the entire serverless pipeline while 10 VUs (virtual users) hit it with continuous requests for a full minute.\n\n```python\nimport pandas as pd\n\n# Load the data from the JSON file into a pandas DataFrame\ndata = pd.read_json('timestamps.json')\n\n# Calculate the desired statistics\nstats = data.describe(percentiles=[.90, .95, .99])\n\n# Print the statistics\nprint(stats)\n```\n\nThis prints:\n\n```text\ncount  93.000000\nmean   40.905096\nstd    16.211245\nmin     6.285714\n50%    47.187500\n90%    56.971429\n95%    57.614691\n99%    58.549990\nmax    58.939394\n```\n\nThe `count` row tells us the whole experiment ran for 93 seconds, whereas our `k6` test ran for only 60 seconds, so a backlog built up. Given `1658` total inbound requests, LocalStack processed about 17.8 requests/s (1658 / 93); in other words, the full pipeline completed about 17.8 times per second. With 10 virtual users, that is about 1.78 requests/s per VU.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flocalstack-samples%2Fserverless-data-processing-pipeline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flocalstack-samples%2Fserverless-data-processing-pipeline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flocalstack-samples%2Fserverless-data-processing-pipeline/lists"}