{"id":24630684,"url":"https://github.com/erwan-simon/aws-emr-serverless-local-execution-demo","last_synced_at":"2026-04-12T21:33:13.125Z","repository":{"id":272260666,"uuid":"915671043","full_name":"erwan-simon/aws-emr-serverless-local-execution-demo","owner":"erwan-simon","description":"Demonstration of how you can use Docker to execute Pyspark code locally while having the exact same environment as in EMR Serverless","archived":false,"fork":false,"pushed_at":"2025-01-14T16:01:13.000Z","size":13,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-25T07:12:30.403Z","etag":null,"topics":["aws","docker","emr-serverless","local","local-execution","terraform"],"latest_commit_sha":null,"homepage":"","language":"HCL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/erwan-simon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-12T13:54:47.000Z","updated_at":"2025-01-24T10:32:58.000Z","dependencies_parsed_at":"2025-01-13T10:27:25.627Z","dependency_job_id":"f19435e2-04af-4f21-958d-a63e9b2b33a1","html_url":"https://github.com/erwan-simon/aws-emr-serverless-local-execution-demo","commit_stats":null,"previous_names":["erwan-simon/emr-serverless-local-execution-demo"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erwan-simon%2Faws-emr-serverless-local-execution-demo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erwan-simon%2Faws-emr-serverless-local
-execution-demo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erwan-simon%2Faws-emr-serverless-local-execution-demo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/erwan-simon%2Faws-emr-serverless-local-execution-demo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/erwan-simon","download_url":"https://codeload.github.com/erwan-simon/aws-emr-serverless-local-execution-demo/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244560388,"owners_count":20472218,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","docker","emr-serverless","local","local-execution","terraform"],"created_at":"2025-01-25T07:12:37.554Z","updated_at":"2026-04-12T21:33:08.104Z","avatar_url":"https://github.com/erwan-simon.png","language":"HCL","funding_links":[],"categories":[],"sub_categories":[],"readme":"# EMR Serverless local execution\n\nThis repository aims to demonstrate the usage of an EMR Serverless Docker image locally.\n\n## Prerequisites\n\n* Docker (tested with version `27.4.0`)\n* Terraform (tested with version `v1.5.7`)\n* AWS CLI (tested with version `2.22.23`)\n* an AWS account with working credentials and relevant permissions\n* A deployed network stack on an AWS account (a VPC with at least one private subnet with access to internet, or with relevant VPC endpoints set up). 
You will find an example [in this GitHub repository](https://github.com/erwan-simon/aws-network-stack)\n\n## Repository content\n\n* [python/](./python)\n    * [python/code/](./python/code/) : directory with the PySpark code of the processing task\n    * [python/local_test.ipynb](./python/local_test.ipynb) : Jupyter notebook for testing your code locally\n    * [python/requirements.txt](./python/requirements.txt) : file containing your processing task's dependencies\n* [Dockerfile](./Dockerfile) : Dockerfile containing the image definition used both locally and in your AWS EMR Serverless Application\n* [terraform/](./terraform/) : Terraform stack that creates the ECR, pushes the Docker image to it and links the image to your EMR Serverless Application\n* [Makefile](./Makefile) : Makefile containing every useful command for this demo (feel free to read it to see exactly which commands it runs)\n\n## Repository usage\n\nFirst ensure that your AWS credentials are correctly set up:\n```bash\naws sts get-caller-identity\n```\n\nThen you need to build the Docker image:\n```bash\nmake build_docker_image\n```\n\nThen you can build the AWS resources and push the Docker image to the created ECR:\n```bash\nmake build_emr_serverless_application VPC_NAME=${NAME_OF_YOUR_EXISTING_VPC}\n```\n\nThen you can test the local execution of your code:\n```bash\nmake run_docker_container\n```\n\nIn another terminal, fetch the token from the logs:\n```bash\ncat logs/stderr | grep token\n```\nYou should see something like `http://127.0.0.1:8888/tree?token=4b59bd747747b234ab93eb9788ada5f91a73e`. 
Paste this link (with the token) into your browser.\n\nLaunch the `local_test.ipynb` notebook in your Docker container from the Jupyter interface and run the first cell, which will execute your task code.\n\nAfter executing the code, you can check in AWS Athena, from your AWS console, that the table `test_database.my_table` exists and contains 3 rows:\n\n| name | age |\n| --- | --- |\n| Alice | 34 |\n| Bob | 45 |\n| Cathy | 29 |\n\nIf needed, you can modify your code directly in the Docker container **OR** with your IDE in your [python/code/](python/code/) directory. Changes are automatically synchronized both ways, which allows faster local development iterations.\n\nFinally, when you are satisfied with your code, you can rebuild your final Docker image (using `make build_docker_image`), push it again (using `make build_emr_serverless_application VPC_NAME=${NAME_OF_YOUR_EXISTING_VPC}`) and run your EMR Serverless job in AWS:\n```bash\nmake run_emr_serverless_job\n```\nThis command will also print the URL of your job run in your EMR Studio, which you can paste into your browser to watch the execution of your job directly in your AWS console.\n\n## Clean up\n\nYou can delete the created AWS resources with the following command:\n```bash\nmake destroy_emr_serverless_application VPC_NAME=${NAME_OF_YOUR_EXISTING_VPC}\n```\n\n**Do not forget to stop your EMR Serverless application and to empty your S3 bucket BEFORE executing this command, otherwise it will NOT work.**\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ferwan-simon%2Faws-emr-serverless-local-execution-demo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ferwan-simon%2Faws-emr-serverless-local-execution-demo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ferwan-simon%2Faws-emr-serverless-local-execution-demo/lists"}