{"id":30874215,"url":"https://github.com/localstack-samples/sample-chaos-serverless-multi-region-failover","last_synced_at":"2025-09-08T00:33:15.213Z","repository":{"id":207222236,"uuid":"709536919","full_name":"localstack-samples/sample-chaos-serverless-multi-region-failover","owner":"localstack-samples","description":"Demonstrates chaos engineering in a multi-region serverless application using API Gateway, Lambda, DynamoDB, and Route53. Test resilience, automated failover, and data integrity with LocalStack's Chaos API.","archived":false,"fork":false,"pushed_at":"2025-08-11T11:37:48.000Z","size":21530,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":13,"default_branch":"main","last_synced_at":"2025-08-11T13:04:38.793Z","etag":null,"topics":["chaos-engineering","chaos-testing","dynamodb","failover","lambda","localstack-sample-app","multi-region","resiliency-engineering","route53","serverless"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/localstack-samples.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-10-24T22:06:21.000Z","updated_at":"2025-08-11T11:37:51.000Z","dependencies_parsed_at":null,"dependency_job_id":"2ca8a8b7-d5f5-4fc5-a957-6761496800c9","html_url":"https://github.com/localstack-samples/sample-chaos-serverless-multi-region-failover","commit_stats":null,"previous_names":["localstack-samples/samples-chaos-engineering"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/localstac
k-samples/sample-chaos-serverless-multi-region-failover","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/localstack-samples%2Fsample-chaos-serverless-multi-region-failover","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/localstack-samples%2Fsample-chaos-serverless-multi-region-failover/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/localstack-samples%2Fsample-chaos-serverless-multi-region-failover/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/localstack-samples%2Fsample-chaos-serverless-multi-region-failover/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/localstack-samples","download_url":"https://codeload.github.com/localstack-samples/sample-chaos-serverless-multi-region-failover/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/localstack-samples%2Fsample-chaos-serverless-multi-region-failover/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274117170,"owners_count":25225100,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-07T02:00:09.463Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chaos-engineering","chaos-testing","dynamodb","failover","lambda","localstack-sample-app","multi-region","resiliency-engineering","
route53","serverless"],"created_at":"2025-09-08T00:32:42.716Z","updated_at":"2025-09-08T00:33:15.147Z","avatar_url":"https://github.com/localstack-samples.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Chaos Testing a Serverless App with LocalStack\n\n| Key          | Value                                                                                      |\n| ------------ | ------------------------------------------------------------------------------------------ |\n| Environment  | LocalStack                                                                                 |\n| Services     | API Gateway, Lambda, DynamoDB, SNS, SQS, Route53                                           |\n| Integrations | AWS CLI, Maven, pytest                                                                     |\n| Categories   | Chaos Engineering, Serverless, Multi-Region                                                |\n| Level        | Advanced                                                                                   |\n| Use Case     | Chaos Engineering, Serverless, Multi-Region                                                |\n| GitHub       | [Repository link](https://github.com/localstack-samples/sample-chaos-serverless-multi-region-failover)                 |\n\n## Introduction\n\nThis sample demonstrates how to test resiliency in serverless applications by applying chaos engineering principles with LocalStack's [Chaos API](https://docs.localstack.cloud/aws/capabilities/chaos-engineering/chaos-api/). The application features a multi-region product management system that gracefully handles service outages through automated failover mechanisms and message queuing. To test this sample, we demonstrate how to use the Chaos API to inject controlled failures into your infrastructure and validate that your application responds appropriately. 
We will show how Route53 health checks automatically redirect traffic between regions during outages and how SNS/SQS messaging ensures no data is lost when services are unavailable.\n\n\u003e [!NOTE]\n\u003e This sample demonstrates LocalStack's new Chaos API, which replaces the previous FIS (Fault Injection Simulator) functionality in this sample application. Chaos API provides more comprehensive local fault injection testing for cloud-native applications and is available in [LocalStack Enterprise](https://localstack.cloud/enterprise/).\n\n## Architecture\n\nThe following diagram shows the architecture that this sample application builds and deploys:\n\n![Architecture Diagram](images/architecture.png)\n\n\u003e [!NOTE]\n\u003e The above architecture diagram is a simplified view of the application. The actual architecture is more complex and includes additional services and components, distributed across multiple regions.\n\n**Primary Region (us-east-1):**\n\n- [API Gateway](https://docs.localstack.cloud/aws/services/apigateway/) with product management and health check endpoints\n- [Lambda Functions](https://docs.localstack.cloud/aws/services/lambda/) for product CRUD operations and health monitoring\n- [DynamoDB](https://docs.localstack.cloud/aws/services/dynamodb/) table for product storage with streams enabled\n- [SNS Topic](https://docs.localstack.cloud/aws/services/sns/) for publishing failed requests during outages\n- [SQS Queue](https://docs.localstack.cloud/aws/services/sqs/) for buffering requests when DynamoDB is unavailable\n\n**Secondary Region (us-west-1):**\n\n- Identical service stack for failover scenarios\n- DynamoDB table synchronized via streams and Lambda replication\n- Independent health check endpoint for Route53 monitoring\n\n**Cross-Region Components:**\n\n- [Route53](https://docs.localstack.cloud/aws/services/route53/) hosted zone with health checks and failover routing policies\n- DNS-based traffic routing with automatic failover 
capabilities\n\n## Prerequisites\n\n- [`LOCALSTACK_AUTH_TOKEN`](https://docs.localstack.cloud/getting-started/auth-token/)\n- [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/install/)\n- [AWS CLI](https://docs.localstack.cloud/user-guide/integrations/aws-cli/) with the [`awslocal` wrapper](https://docs.localstack.cloud/user-guide/integrations/aws-cli/#localstack-aws-cli-awslocal)\n- [Maven 3.8.5+](https://maven.apache.org/install.html) \u0026 [Java 17](https://www.java.com/en/download/help/download_options.html)\n- [Python 3.11+](https://www.python.org/downloads/)\n- [`make`](https://www.gnu.org/software/make/) (**optional**, but recommended for running the sample application)\n- [`dig`](https://linux.die.net/man/1/dig) command-line DNS lookup utility\n\n## Installation\n\nTo run the sample application, you need to install the required dependencies.\n\nFirst, clone the repository:\n\n```shell\ngit clone https://github.com/localstack-samples/sample-chaos-serverless-multi-region-failover.git\n```\n\nThen, navigate to the project directory:\n\n```shell\ncd sample-chaos-serverless-multi-region-failover\n```\n\nInstall the project dependencies by running the following command:\n\n```shell\nmake install\n```\n\nThis will:\n\n- Build the Java Lambda functions and package them into JAR files\n- Install Python test dependencies for the integration test suite\n\n## Deployment\n\nStart LocalStack using Docker Compose with the `LOCALSTACK_AUTH_TOKEN` pre-configured:\n\n```shell\nLOCALSTACK_AUTH_TOKEN=\u003cyour-auth-token\u003e docker compose up\n```\n\nThe infrastructure will be automatically deployed using LocalStack's [Initialization Hooks](https://docs.localstack.cloud/aws/capabilities/config/initialization-hooks/). 
The deployment creates:\n\n- DynamoDB tables in both `us-east-1` and `us-west-1` regions\n- Lambda functions for product management and health checks\n- API Gateway endpoints with custom domain configurations\n- SNS topics and SQS queues for message buffering\n- DynamoDB streams with replication Lambda triggers\n\nTo deploy additional chaos engineering scenarios, run:\n\n```shell\nmake deploy\n```\n\nThis executes the solution scripts:\n\n```shell\n./solutions/dynamodb-outage.sh    # Sets up DynamoDB outage handling\n./solutions/route53-failover.sh   # Configures Route53 DNS failover\n```\n\n## Testing\n\nThe sample application provides comprehensive test coverage for both chaos engineering scenarios.\n\n### Running All Tests\n\nExecute the complete test suite:\n\n```shell\nmake test\n```\n\nThis runs:\n\n- DynamoDB outage resilience tests\n- Route53 DNS failover validation\n- End-to-end integration scenarios\n\n### Manual Testing\n\nTest normal product operations:\n\n```shell\ncurl --location 'http://12345.execute-api.localhost.localstack.cloud:4566/dev/productApi' \\\n  --header 'Content-Type: application/json' \\\n  --data '{\n    \"id\": \"prod-2024\",\n    \"name\": \"Test Product\",\n    \"price\": \"29.99\",\n    \"description\": \"A product for testing chaos scenarios\"\n  }'\n```\n\nExpected response: `Product added/updated successfully.`\n\n### DNS Resolution Testing\n\nVerify Route53 failover configuration:\n\n```shell\ndig @localhost test.hello-localstack.com CNAME\n```\n\nThis should resolve to the primary API Gateway endpoint initially, then switch to the secondary during outages.\n\n## Use Cases\n\n### Chaos Engineering\n\nThis sample demonstrates comprehensive chaos engineering practices by using LocalStack's Chaos API to inject controlled failures into your infrastructure. 
The chaos testing validates that your application can gracefully handle service outages without data loss.\n\nThe application includes sophisticated error handling for database outages. When DynamoDB becomes unavailable, the Lambda functions:\n\n1. Catch `DynamoDbException` errors from AWS SDK calls\n2. Return user-friendly error messages instead of failing completely  \n3. Publish failed requests to SNS for later processing\n4. Use SQS dead letter queues and retry mechanisms\n5. Automatically process queued items when services recover\n\nTo simulate a DynamoDB outage:\n\n```shell\ncurl -X POST 'http://localhost:4566/_localstack/chaos/faults' \\\n  -H 'Content-Type: application/json' \\\n  -d '[{\"service\": \"dynamodb\", \"region\": \"us-east-1\"}]'\n```\n\nDuring the outage, product creation requests are gracefully handled:\n\n```shell\ncurl --location 'http://12345.execute-api.localhost.localstack.cloud:4566/dev/productApi' \\\n  --data '{\"id\": \"prod-outage\", \"name\": \"Outage Test\", \"price\": \"19.99\", \"description\": \"Testing resilience\"}'\n```\n\nExpected response: `A DynamoDB error occurred. Message sent to queue.`\n\nThe message is automatically processed when you clear the outage:\n\n```shell\ncurl -X DELETE 'http://localhost:4566/_localstack/chaos/faults' \\\n  -H 'Content-Type: application/json' \\\n  -d '[]'\n```\n\nQuery the DynamoDB table to see the product:\n\n```shell\nawslocal dynamodb scan --table-name Products\n```\n\nThe key chaos engineering patterns used in this sample are:\n\n- Using LocalStack Chaos API for controlled service failures\n- Monitoring application behavior during failure scenarios\n- Ensuring systems return to normal operation after faults clear\n- Limiting failures to specific services and regions\n- Validating resilience through repeatable test scenarios\n\n### Route53 DNS Failover\n\nThe sample showcases advanced DNS failover capabilities using Route53 health checks and routing policies. 
This ensures high availability by automatically redirecting traffic from failed regions to healthy alternatives.\n\nThe Route53 setup includes:\n\n1. Monitoring primary region endpoints every 10 seconds\n2. Primary and secondary CNAME records with different priorities\n3. Services deployed across `us-east-1` (primary) and `us-west-1` (secondary)\n4. DNS resolution changes based on health check status\n5. Traffic automatically returns to primary when healthy\n\nVerify initial DNS resolution points to primary:\n\n```shell\ndig @localhost test.hello-localstack.com CNAME\n# Expected: 12345.execute-api.localhost.localstack.cloud\n```\n\nInject chaos into the primary region:\n\n```shell\ncurl -X POST 'http://localhost:4566/_localstack/chaos/faults' \\\n  -H 'Content-Type: application/json' \\\n  -d '[\n    {\"service\": \"apigateway\", \"region\": \"us-east-1\"},\n    {\"service\": \"lambda\", \"region\": \"us-east-1\"}\n  ]'\n```\n\nWait for health check failures and verify the failover:\n\n```shell\ndig @localhost test.hello-localstack.com CNAME  \n# Expected: 67890.execute-api.localhost.localstack.cloud\n```\n\nClear the chaos to test failback:\n\n```shell\ncurl -X DELETE 'http://localhost:4566/_localstack/chaos/faults' \\\n  -H 'Content-Type: application/json' \\\n  -d '[]'\n```\n\n## Troubleshooting\n\n| Issue | Resolution |\n|-------|------------|\n| DNS resolution returns NXDOMAIN | Ensure LocalStack is running with DNS enabled (port 53). Verify hosted zone exists with `awslocal route53 list-hosted-zones` |\n| Health checks always report unhealthy | Check that API Gateway endpoints respond with HTTP 200. Verify Lambda functions are deployed and working: `awslocal lambda list-functions` |\n| Failover not triggering after chaos injection | Wait at least 25 seconds for health check failure threshold. 
Check chaos faults are active: `curl --location --request GET 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults'` |\n| Products not appearing in DynamoDB after recovery | Verify SQS queue processing with `awslocal sqs receive-message`. Check Lambda function logs for processing errors |\n\n## Learn More\n\n- [LocalStack Chaos Engineering](https://docs.localstack.cloud/chaos-engineering/) (**recommended**)\n- [Route53 Health Checks and Failover](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover.html)  \n- [Chaos Engineering Principles](https://principlesofchaos.org/)\n- [Testing resilience in cloud applications with LocalStack](https://blog.localstack.cloud/tags/Chaos%20Engineering/)\n- [AWS Lambda Best Practices](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html)\n- [DynamoDB Streams and Triggers](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html)\n- [SNS/SQS Messaging Patterns](https://docs.aws.amazon.com/sns/latest/dg/sns-sqs-as-subscriber.html)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flocalstack-samples%2Fsample-chaos-serverless-multi-region-failover","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flocalstack-samples%2Fsample-chaos-serverless-multi-region-failover","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flocalstack-samples%2Fsample-chaos-serverless-multi-region-failover/lists"}