Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/captainirs/hadoop-yarn-k8s
A sandbox for running a Hadoop-YARN cluster on Kubernetes
https://github.com/captainirs/hadoop-yarn-k8s
hadoop kubernetes spark yarn
Last synced: 4 days ago
JSON representation
A sandbox for running a Hadoop-YARN cluster on Kubernetes
- Host: GitHub
- URL: https://github.com/captainirs/hadoop-yarn-k8s
- Owner: CaptainIRS
- License: mit
- Created: 2022-10-15T07:10:55.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-08T19:10:05.000Z (9 months ago)
- Last Synced: 2024-04-09T14:32:24.747Z (7 months ago)
- Topics: hadoop, kubernetes, spark, yarn
- Language: Makefile
- Homepage:
- Size: 28.3 KB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Hadoop-YARN-k8s Sandbox
This is a sandbox for running a Hadoop YARN cluster on Kubernetes (using Minikube).
The sandbox can be started with a single command and will bring up a Hadoop YARN cluster with 2 datanodes, 1 namenode and 1 resource manager.
The various web interfaces for the cluster are proxied and exposed on the host machine automatically and can be accessed via the URLs listed below.
> [!WARNING]
> The sandbox is intended to be used for testing and development purposes only.---
### Prerequisites
* Minikube
* GNU Make
* Docker [`buildx` CLI plugin](https://github.com/docker/buildx?tab=readme-ov-file#installing)### System Requirements
* Minikube should have at least 8GB of memory and 4 CPUs for the sandbox to run properly (This can be changed in the `Makefile`).---
### Running
* Run `make deploy` to deploy the system.
* Run `make clean` to bring down the system (All data will be lost!)#### Running spark jobs
* Run `make spark_exec` to exec into the spark pod.
* The `work` directory is mounted as `/work` in the spark pod. You can copy your spark job to this directory and run it using `spark-submit`. (Use `--master yarn` to run the job on the YARN cluster.)
* Or you can enter the spark shell using `spark-shell --master yarn` and run your spark jobs interactively.#### Managing the cluster or running MapReduce tasks
* Run `make shell` to exec into the `dfsadmin` pod.
* You can run HDFS commands using `hdfs dfs` or run MapReduce jobs using `yarn jar`.---
### Important URLs
* Datanodes:
* datanode-0: http://datanode-0.datanode.hadoop.svc.localho.st:9864
* datanode-1: http://datanode-1.datanode.hadoop.svc.localho.st:9864
* Node Managers:
* datanode-0: http://datanode-0.datanode.hadoop.svc.localho.st:8042
* datanode-1: http://datanode-1.datanode.hadoop.svc.localho.st:8042
* Namenode: http://namenode-0.namenode.hadoop.svc.localho.st:9870
* Resource Manager: http://resourcemanager-0.resourcemanager.hadoop.svc.localho.st:8089
* Yarn UI 2: http://resourcemanager-0.resourcemanager.hadoop.svc.localho.st:8089/ui2/
* Yarn Timeline Server: http://resourcemanager-0.resourcemanager.hadoop.svc.localho.st:8188
* Mapreduce Job History Server: http://resourcemanager-0.resourcemanager.hadoop.svc.localho.st:19888
* Spark History Server: http://spark.hadoop.svc.localho.st:18080---
### Screenshots
| ![DataNode](https://i.imgur.com/PcLl4f4.png) | ![NameNode](https://i.imgur.com/RSJHLyp.png) |
|:--:|:--:|
| Hadoop Data Node | Hadoop Name Node |
| ![NodeManager](https://i.imgur.com/gR65OSv.png) | ![ResourceManager](https://i.imgur.com/CJ4LXjY.png) |
| YARN Node Manager | YARN Resource Manager |
| ![Spark History Server](https://i.imgur.com/JTQwd0d.png) | ![Spark UI](https://i.imgur.com/PaB6Jze.png) |
| Spark History Server | Spark UI |---
### License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.