https://github.com/chrisammon3000/emr-managed-scaling-cluster
EMR cluster with managed autoscaling policy
https://github.com/chrisammon3000/emr-managed-scaling-cluster
aws cloudformation emr makefile
Last synced: about 1 month ago
JSON representation
EMR cluster with managed autoscaling policy
- Host: GitHub
- URL: https://github.com/chrisammon3000/emr-managed-scaling-cluster
- Owner: chrisammon3000
- License: mit
- Created: 2022-12-06T01:11:34.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2022-12-06T04:17:15.000Z (almost 3 years ago)
- Last Synced: 2025-03-16T03:30:37.662Z (7 months ago)
- Topics: aws, cloudformation, emr, makefile
- Language: Makefile
- Homepage:
- Size: 89.8 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# emr-managed-scaling-cluster
Easy CloudFormation deployment for an EMR cluster with managed autoscaling.
## Table of Contents
- [emr-managed-scaling-cluster](#emr-managed-scaling-cluster)
* [Table of Contents](#table-of-contents)
* [Description](#description)
+ [EMR Configuration](#emr-configuration)
- [EMR Version](#emr-version)
- [Applications](#applications)
* [Quickstart](#quickstart)
* [Installation](#installation)
+ [Prerequisites](#prerequisites)
+ [Environment Variables](#environment-variables)
+ [AWS Credentials](#aws-credentials)
* [Usage](#usage)
+ [EMR Version](#emr-version-1)
+ [Network Configuration](#network-configuration)
+ [AWS Deployment](#aws-deployment)
+ [Makefile Usage](#makefile-usage)
* [Troubleshooting](#troubleshooting)
* [References & Links](#references---links)
* [Authors](#authors)Table of contents generated with markdown-toc
## Description
This project uses CloudFormation templates to deploy an EMR cluster with managed autoscaling policy. This can be used to quickly deploy a cluster for development or data analysis.### EMR Configuration
#### EMR Version
The default EMR version is `emr-6.5.0`. This can be changed by setting the `EMR_VERSION` environment variable in `.env`.#### Applications
The following applications are installed by default:
* Hadoop
* Hive
* Tez
* Hue
* Spark
* Livy
* JupyterHub
* JupyterEnterpriseGateway## Quickstart
1. Configure your AWS credentials.
2. Add environment variables to `.env`.
3. Run `make deploy` to deploy the cluster.## Installation
Follow the steps to configure the deployment environment.### Prerequisites
* AWSCLI
* jq### Environment Variables
Sensitive environment variables containing secrets like passwords and API keys must be exported to the environment first.
Create a `.env` file in the project root.
```bash
STAGE=dev
APP_NAME=emr-managed-scaling
AWS_REGION=us-east-1
EMR_VERSION=emr-6.5.0
SUBNET_ID=
```***Important:*** *Always use a `.env` file or AWS SSM Parameter Store or Secrets Manager for sensitive variables like credentials and API keys. Never hard-code them, including when developing. AWS will quarantine an account if any credentials get accidentally exposed and this will cause problems.*
***Make sure that `.env` is listed in `.gitignore`***
### AWS Credentials
Valid AWS credentials must be available to AWS CLI and SAM CLI. The easiest way to do this is running `aws configure`, or by adding them to `~/.aws/credentials` and exporting the `AWS_PROFILE` variable to the environment.For more information visit the documentation page:
[Configuration and credential file settings](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html)## Usage
### EMR Version
The EMR version can be set using the EMR_VERSION environment variable in `.env`. The default is `emr-6.5.0`. For a list of available versions please see the [docs](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-components.html).### Network Configuration
The only required variable for network configuration is the SUBNET_ID variable which must be present in `.env`.### AWS Deployment
Once an AWS profile is configured and environment variables are available, the application can be deployed using `make`.
```bash
make deploy
```### Makefile Usage
```bash
# Deploy all layers
make deploy# Delete all layers
make delete
```## Troubleshooting
* Check your AWS credentials in `~/.aws/credentials`
* Check that the environment variables are available to the services that need them
* Check that the correct environment or interpreter is being used for Python## References & Links
- [AWS EMR workshop](https://catalog.us-east-1.prod.workshops.aws/workshops/c86bd131-f6bf-4e8f-b798-58fd450d3c44/en-US)
- [EMR documentation](https://docs.aws.amazon.com/emr/index.html)
- [CloudFormation EMR documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-elasticmapreduce-cluster.html)## Authors
**Primary Contact:** Gregory Lindsey ([@abk7777](https://github.com/abk7777))