{"id":17694762,"url":"https://github.com/avojak/aws-hadoop-cluster","last_synced_at":"2026-05-05T08:31:31.404Z","repository":{"id":74884584,"uuid":"371759994","full_name":"avojak/aws-hadoop-cluster","owner":"avojak","description":"Infrastructure and configuration-as-code for standing up a Hadoop cluster in AWS","archived":false,"fork":false,"pushed_at":"2021-05-28T17:36:15.000Z","size":13,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-10-09T05:34:32.430Z","etag":null,"topics":["ansible","aws","aws-ec2","configuration-as-code","hadoop","hadoop-cluster","infrastructure-as-code","terraform"],"latest_commit_sha":null,"homepage":"","language":"Jinja","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/avojak.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-05-28T16:30:18.000Z","updated_at":"2021-05-28T17:37:34.000Z","dependencies_parsed_at":null,"dependency_job_id":"abb47086-fd9a-4c38-a571-879c344b4541","html_url":"https://github.com/avojak/aws-hadoop-cluster","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/avojak/aws-hadoop-cluster","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avojak%2Faws-hadoop-cluster","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avojak%2Faws-hadoop-cluster/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avojak%2Faws-hadoop-cluster/releases","manifests_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/avojak%2Faws-hadoop-cluster/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/avojak","download_url":"https://codeload.github.com/avojak/aws-hadoop-cluster/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/avojak%2Faws-hadoop-cluster/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32641995,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-04T10:08:07.713Z","status":"online","status_checked_at":"2026-05-05T02:00:06.033Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ansible","aws","aws-ec2","configuration-as-code","hadoop","hadoop-cluster","infrastructure-as-code","terraform"],"created_at":"2024-10-24T13:49:38.069Z","updated_at":"2026-05-05T08:31:31.371Z","avatar_url":"https://github.com/avojak.png","language":"Jinja","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AWS Hadoop Cluster\n\nThis is the infrastructure and configuration-as-code for setting up a Hadoop cluster in AWS.\n\n**Note: This is not 'production-ready', and should not be used as such. 
Some shortcuts were taken for simplicity that would introduce security concerns in a production environment.**\n\n## Prerequisites\n\n- An EC2 keypair created in the AWS console\n    - You will need the keypair name for the Terraform plan; without it you won't be able to access the instances later on!\n    - Store the private and public keys in the Ansible vault\n\n## Terraform\n\nThe EC2 instances are currently spec'd as follows:\n\n- NameNode\n    - m5.xlarge\n    - 64 GB gp3 root block device\n- 3x DataNodes\n    - m5.xlarge\n    - 64 GB gp3 root block device\n\n**Note: The security group must be modified before the NameNode and DataNodes can communicate (it currently allows only SSH).**\n\nUsage:\n\n```bash\n$ make plan\n$ make apply\n```\n\n## Ansible\n\nReplace the stubs in the hosts file with the real IP address or hostname of each node, as produced by the Terraform step.\n\nNo Elastic IPs are configured automatically. Adding them would be a helpful step, so that you can restart the instances without having to reconfigure Hadoop every time.\n\nUsage:\n\n```bash\n$ make install\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Favojak%2Faws-hadoop-cluster","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Favojak%2Faws-hadoop-cluster","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Favojak%2Faws-hadoop-cluster/lists"}