Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sergiosim/debian-hadoop
A vagrant project with a debian box running hadoop
https://github.com/sergiosim/debian-hadoop
Last synced: 6 days ago
JSON representation
A vagrant project with a debian box running hadoop
- Host: GitHub
- URL: https://github.com/sergiosim/debian-hadoop
- Owner: SergioSim
- Created: 2022-10-12T06:22:54.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2024-04-16T14:32:50.000Z (7 months ago)
- Last Synced: 2024-10-17T03:49:49.161Z (21 days ago)
- Language: Shell
- Size: 537 KB
- Stars: 1
- Watchers: 2
- Forks: 8
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Debian-Hadoop
An Debian vagrant box with Hadoop
## Prerequisites
- Install a git client (for example [git SCM](https://git-scm.com/download/win)) and clone this repository: `git clone https://github.com/SergioSim/debian-hadoop.git`
- Install [virtualbox 7.0.10](https://www.virtualbox.org/wiki/Downloads)
with the [extension pack 7.0.10](https://download.virtualbox.org/virtualbox/7.0.10/Oracle_VM_VirtualBox_Extension_Pack-7.0.10.vbox-extpack)
- Install [vagrant](https://www.vagrantup.com/)## Getting started
**Install & start the vagrant box using VirtualBox:**
Navigate to the debian-hadoop directory and run the following command:
```bash
$ vagrant up
```**Install & start the vagrant box using Docker:**
```bash
$ vagrant up --provider docker
```**Connect to the vagrant box (VM):**
```bash
$ vagrant ssh
```## Useful vagrant commands
Check the vagrant box (VM) status:
```bash
$ vagrant status
```
> You can also check the status of all vagrant boxes you have installed with `vagrant global-status`.Power-off the vagrant box:
```bash
$ vagrant halt
```Remove the vagrant box from disk:
```bash
$ vagrant destroy
```Start jupyter notebook:
```bash
$ jupyter notebook --ip=0.0.0.0
```## Health check
Once you have installed, started and connected to the vagrant box.
Check if Hadoop processes (HDFS & YARN) are running:
```bash
vagrant@bullseye:~$ jps
```The command should output something like:
```
4416 DataNode
4978 NodeManager
4580 SecondaryNameNode
5351 Jps
4873 ResourceManager
4287 NameNode
```> If no Hadoop processes show up, try to start Hadoop:
```bash
vagrant@bullseye:~$ start-dfs.sh
vagrant@bullseye:~$ start-yarn.sh
```## Troubleshooting
**Symptom: the `vagrant up` command succeeded but Hadoop processes didn't start.**
> If you are running for the first time:
Try to reinstall the vagrant box:
```bash
vagrant up --provision
```> Else, if it was working previously - try to start Hadoop.
**Symptom: the `vagrant ssh` command fails with `Permission denied (publickey)`.**
1. Get the location of your vagrant ssh private key: `vagrant ssh-config`
Then copy the path of the IdentityFile
2. Run the following command instead of `vagrant ssh` to connect to the VM:
```bash
ssh -i path/to/identity/file/private_key -p 2222 [email protected]
```## Updating configuration files
To apply changes done in `config/vagrant` - run the `prerequisites` provisioner:
```bash
vagrant provision --provision-with prerequisites
```To apply changes done in `config/hadoop` - run the `install_hadoop` provisioner:
```bash
vagrant provision --provision-with install_hadoop
```> Warning: this will format HDFS (removing all files on HDFS)