Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/avojak/aws-hadoop-cluster
Infrastructure and configuration-as-code for standing up a Hadoop cluster in AWS
- Host: GitHub
- URL: https://github.com/avojak/aws-hadoop-cluster
- Owner: avojak
- Created: 2021-05-28T16:30:18.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2021-05-28T17:36:15.000Z (over 3 years ago)
- Last Synced: 2025-02-06T03:11:45.620Z (2 days ago)
- Topics: ansible, aws, aws-ec2, configuration-as-code, hadoop, hadoop-cluster, infrastructure-as-code, terraform
- Language: Jinja
- Homepage:
- Size: 12.7 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# AWS Hadoop Cluster
This is the infrastructure and configuration-as-code for setting up a Hadoop cluster in AWS.
**Note: This is not 'production-ready' and should not be used as such. It takes some shortcuts for simplicity that would introduce security concerns in a production environment.**
## Prerequisites
- An EC2 keypair created in the AWS console
  - You will need the keypair name for the Terraform plan; otherwise, you won't be able to access the instances later on!
  - Store the private and public keys in the Ansible vault

## Terraform
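Before running `make plan`, the keypair from the prerequisites can also be created from the command line instead of the AWS console. The sketch below assumes the AWS CLI (configured for the target region) and `ssh-keygen` are installed; the function name `create_keypair`, the `DRY_RUN` switch, and any key name you pass in are hypothetical and not part of this repository.

```bash
#!/usr/bin/env bash
# Sketch: create the EC2 keypair from the CLI and derive its public key.
# Assumes the AWS CLI (configured for the target region) and ssh-keygen are installed.
# Set DRY_RUN=1 to print the commands instead of running them.
set -euo pipefail

create_keypair() {
  local name="$1"   # hypothetical example name; use the one the Terraform plan expects
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "aws ec2 create-key-pair --key-name ${name} --query KeyMaterial --output text > ${name}.pem"
    echo "ssh-keygen -y -f ${name}.pem > ${name}.pub"
    return 0
  fi
  # AWS only returns the private key at creation time, so save it immediately
  aws ec2 create-key-pair --key-name "$name" \
    --query 'KeyMaterial' --output text > "${name}.pem"
  chmod 400 "${name}.pem"
  # Recover the matching public key from the private key
  ssh-keygen -y -f "${name}.pem" > "${name}.pub"
}
```

For example, `DRY_RUN=1 create_keypair hadoop-cluster` previews the calls and `create_keypair hadoop-cluster` performs them. The resulting `.pem` and `.pub` files can then be encrypted with `ansible-vault encrypt`; this README does not say how the playbooks read the vaulted keys, so check the repository's Ansible variables before wiring them in.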
The EC2 instances are currently spec'd as follows:
- NameNode
  - m5.xlarge
  - 64 GB gp3 root block device
- 3x DataNodes
  - m5.xlarge
  - 64 GB gp3 root block device

**Note: The security group must be modified before the NameNode and DataNodes can communicate with each other (currently it only allows SSH).**
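Until the Terraform security group is updated, the missing rule can be added by hand once `make apply` has created the instances. A minimal sketch using the AWS CLI, assuming all four nodes share a single security group whose ID you supply (the `sg-...` value is a placeholder, and the function name is hypothetical): it adds a self-referencing ingress rule, so any member of the group can reach any other member on any protocol and port. A tighter rule would open only the Hadoop ports, but those depend on the Hadoop version and configuration, so they are not guessed here.

```bash
#!/usr/bin/env bash
# Sketch: allow all traffic between instances that share one security group,
# so the NameNode and DataNodes can reach each other. Existing SSH rules are untouched.
# Assumption: every node uses the same group; pass its ID (e.g. from the EC2 console).
# Set DRY_RUN=1 to print the AWS CLI call instead of running it.
set -euo pipefail

allow_intra_cluster_traffic() {
  local group_id="$1"
  local -a cmd
  # "--protocol all" opens every protocol and port; "--source-group" restricts
  # the rule to traffic coming from members of the same group
  cmd=(aws ec2 authorize-security-group-ingress
    --group-id "$group_id" --protocol all --source-group "$group_id")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

A rule added outside Terraform may be reverted or reported as drift on the next `make apply`, so the durable fix is an equivalent ingress rule in the Terraform security group itself.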
Usage:
```bash
$ make plan
$ make apply
```

## Ansible
Replace the stubs in the hosts file with the real IP address or hostname of each node created in the Terraform step.
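Filling in the hosts file can be scripted once the node addresses are known. A minimal sketch that prints an INI-style Ansible inventory for one NameNode and any number of DataNodes; the function name and the group names `namenode` and `datanodes` are hypothetical, so rename them to match the groups the repository's hosts file actually uses.

```bash
#!/usr/bin/env bash
# Sketch: print an INI-style Ansible inventory.
# First argument: NameNode address; remaining arguments: DataNode addresses.
# The group names below are hypothetical placeholders.
set -euo pipefail

render_inventory() {
  local namenode="$1"
  shift
  echo "[namenode]"
  echo "$namenode"
  echo
  echo "[datanodes]"
  local node
  for node in "$@"; do
    echo "$node"
  done
}
```

For example, `render_inventory 203.0.113.10 203.0.113.11 203.0.113.12 203.0.113.13` prints the NameNode group followed by three DataNodes (the addresses are documentation examples). Compare the output with the existing hosts file before overwriting it, since the real file may carry extra per-host variables.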
No Elastic IPs are configured automatically. Adding them would be a helpful step: an instance's default public IP address changes whenever it is stopped and started, so without Elastic IPs Hadoop has to be reconfigured every time.
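Until that step exists, an Elastic IP can be attached to each instance manually. A minimal sketch using the AWS CLI, assuming you supply an instance ID from the EC2 console (the ID and the function name are placeholders, not part of this repository):

```bash
#!/usr/bin/env bash
# Sketch: give one instance a static public address with an Elastic IP.
# Assumes the AWS CLI is configured; the instance ID is supplied by the caller.
# Set DRY_RUN=1 to print the commands instead of running them.
set -euo pipefail

attach_elastic_ip() {
  local instance_id="$1"
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "aws ec2 allocate-address --domain vpc --query AllocationId --output text"
    echo "aws ec2 associate-address --instance-id ${instance_id} --allocation-id <allocation-id>"
    return 0
  fi
  local alloc_id
  # Allocate a new VPC address and keep only its allocation ID
  alloc_id=$(aws ec2 allocate-address --domain vpc \
    --query 'AllocationId' --output text)
  # Bind the address to the instance; it stays attached across stop/start cycles
  aws ec2 associate-address --instance-id "$instance_id" \
    --allocation-id "$alloc_id"
}
```

Run it once per node. AWS bills for allocated Elastic IPs, so release ones you no longer need with `aws ec2 release-address --allocation-id <allocation-id>`. The same result could be managed declaratively in Terraform with the `aws_eip` resource, which would keep the addresses in the plan alongside the instances.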
Usage:
```bash
$ make install
```