https://github.com/kevinndungu-source/amazon_emr_project_resources

Explore and replicate Amazon EMR (Elastic MapReduce) setup and utilization for big data processing and analytics tasks, featuring comprehensive demonstrations from VPC creation to Spark job execution.

aws-ec2 bigdata bigdatainfrastructure datamanagement dataprocessing emr-cluster juypter-notebook pyspark python


# Amazon Elastic MapReduce (EMR) Demonstration
This repository holds the resources used in the Amazon EMR on EC2 cluster project.

![Amazon-EMR](https://github.com/kevinndungu-source/EMR_Demonstration_Resources/assets/114335263/3633eded-d2b0-4a21-884a-5ef71a42cb96)

---

## Project Descriptions

### 1. Amazon EMR Demonstration
- **Overview:** This project demonstrates how to set up and utilize Amazon EMR (Elastic MapReduce) for big data processing and analytics tasks.

The documentation file covers:
- **VPC creation**: Creating an Amazon VPC.
- **S3 bucket creation**: Creating an Amazon Simple Storage Service (S3) bucket.
- **IAM role creation**: Creating an IAM role in the AWS Management Console.
- **EMR cluster creation**: Creating an Amazon EMR on EC2 cluster.
- **EMR Studio creation**: Creating an Amazon EMR Studio.
- **EMR Workspace creation**: Creating an Amazon EMR Studio Workspace.
- **Spark job execution**: Running a Spark job from an Amazon EMR Studio notebook.
- **Resource cleanup**: Cleaning up the resources created during the demonstration.

[Documentation.pdf](https://drive.google.com/file/d/1zxrx1NdSQPI7zsVkzujVUDtXZfiq8G71/view?usp=drive_link): Detailed documentation of the entire Amazon EMR demonstration.
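
The documentation walks through these steps in the AWS Management Console. For readers who prefer scripting, the cluster creation and cleanup steps could be sketched with boto3 roughly as follows. This is not taken from the project files: the release label, instance types, node counts, application list, region, and default role names are all illustrative assumptions, and the function names are hypothetical.

```python
from typing import Any, Dict


def build_cluster_request(name: str, log_uri: str, subnet_id: str,
                          core_nodes: int = 2) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR `run_job_flow` call."""
    return {
        "Name": name,
        "LogUri": log_uri,                    # S3 location for cluster logs
        "ReleaseLabel": "emr-6.15.0",         # assumed release, adjust as needed
        "Applications": [{"Name": "Spark"}, {"Name": "Livy"}],  # assumed app set
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_nodes},
            ],
            "Ec2SubnetId": subnet_id,         # subnet inside the VPC created earlier
            "KeepJobFlowAliveWhenNoSteps": True,  # keep running for notebook use
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default instance profile
        "ServiceRole": "EMR_DefaultRole",      # assumed default service role
    }


def launch_cluster(request: Dict[str, Any], region: str = "us-east-1") -> str:
    """Submit the request and return the new cluster ID (needs AWS credentials)."""
    import boto3  # imported lazily so the request builder works without boto3

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


def terminate_cluster(cluster_id: str, region: str = "us-east-1") -> None:
    """Terminate the cluster as part of resource cleanup."""
    import boto3

    emr = boto3.client("emr", region_name=region)
    emr.terminate_job_flows(JobFlowIds=[cluster_id])
```

Building the request as a plain dictionary keeps it easy to inspect or adjust before anything is launched. A call such as `launch_cluster(build_cluster_request("emr-demo", "s3://<your-bucket>/logs/", "<your-subnet-id>"))` would start a billable cluster, so pair it with `terminate_cluster` when you are done.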

### 2. Dataset and Code Files
- **Description:** This repository contains the dataset and code files used in the Amazon EMR demonstration project:
- **dataset_en_dev.json**: Dataset file used in the demonstration.
- **reviews.py**: Python script used in the demonstration.
- **reviews.ipynb**: Jupyter notebook used in the demonstration.
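
As a rough orientation before opening `reviews.py` or `reviews.ipynb`, a first look at the dataset with PySpark might resemble the sketch below. It does not reproduce the project's script: the helper names and the `reviews-summary` application name are hypothetical, and the code deliberately avoids assuming any column names in `dataset_en_dev.json`.

```python
def s3_uri(bucket: str, key: str) -> str:
    """Join a bucket name and an object key into an s3:// URI."""
    return f"s3://{bucket.strip('/')}/{key.lstrip('/')}"


def summarize_dataset(path: str) -> int:
    """Load a JSON dataset with Spark, print its inferred schema, return the row count."""
    # Imported lazily so the URI helper stays usable where PySpark is not installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("reviews-summary").getOrCreate()
    # By default spark.read.json expects one JSON object per line (JSON Lines).
    df = spark.read.json(path)
    df.printSchema()
    return df.count()
```

In an EMR Studio notebook attached to the cluster, calling `summarize_dataset(s3_uri("<your-bucket>", "dataset_en_dev.json"))` would read the file from the S3 bucket you uploaded it to; the bucket name is a placeholder.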

---

## Usage
1. Clone this repository to your local machine.
2. Explore the project folders and files to understand each demonstration.
3. Follow the instructions provided in the transcripts and documentation to replicate the demonstrations in your own AWS environment.

---