https://github.com/kibatic/docker-single-node-hadoop

This Docker image creates a single-node Hadoop installation with YARN enabled.

Host: GitHub
URL: https://github.com/kibatic/docker-single-node-hadoop
Owner: kibatic
Created: 2015-11-27T10:56:09.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2021-11-22T07:32:24.000Z (over 3 years ago)
Last Synced: 2024-11-30T00:44:38.473Z (7 months ago)
Language: Shell
Size: 21.5 KB
Stars: 6
Watchers: 4
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md


Docker for a single node hadoop installation
============================================

⚠️ [Deprecated] I'm not using this package anymore. If someone wants to maintain it, or to fork it and maintain the fork, please contact me (@plv on Twitter). I can add a link to your package here.

This repository builds a single-node Hadoop instance, following the documentation at http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html and the Spark documentation.

Features:

* hadoop / hdfs

* yarn

* spark

* hive

* zeppelin

State of the project:

* Hadoop, YARN, Spark, Hive, Zeppelin: running, not optimized. Any feedback is welcome.

Quickstart
----------

### Clone the project

```bash

git clone https://github.com/kibatic/docker-single-node-hadoop.git

```

### Create the container

```bash
# build the image
docker-compose build
# start the container in the background
docker-compose up -d
```

### Zeppelin notebook

Zeppelin is available at http://localhost:8002
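
The notebook may not answer immediately after `docker-compose up -d`. A minimal sketch for checking whether something is listening on the port, using only the Python standard library; `is_port_open` is a hypothetical helper, not part of this repository, and the host and port are taken from the URL above.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # connection refused, timed out, or host unreachable
        return False
    sock.close()
    return True
```

For example, `is_port_open("localhost", 8002)` should return `True` once the Zeppelin port accepts connections. This only checks the TCP port, not that the web interface is fully loaded.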

### Run a basic map reduce example

Some Python MapReduce examples are provided in the /example directory inside the container.

```bash
# open a shell inside the running container
docker exec -ti dockersinglenodehadoop_hsn_1 bash
cd /example
# copy the sample text file into HDFS
hdfs dfs -mkdir /input
hdfs dfs -put fichier.txt /input
# run the streaming job with the Python mapper and reducer
hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar -input /input -output /output -mapper /example/mapper.py -reducer /example/reducer.py
# print the result
hdfs dfs -cat /output/part-00000
```
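
The contents of `mapper.py` and `reducer.py` are not shown here. As a rough illustration only, a word-count pair for Hadoop Streaming might look like the sketch below. The function names `map_lines`, `reduce_sorted` and `local_wordcount` are hypothetical, and this is not necessarily what the scripts in `/example` contain. Hadoop Streaming feeds lines to the mapper on stdin and, by default, treats output lines as tab-separated key/value pairs; the reducer receives its input sorted by key.

```python
import itertools


def map_lines(lines):
    """Mapper step: emit one "word<TAB>1" record per word."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_sorted(records):
    """Reducer step: sum the counts of consecutive records sharing a key.

    Assumes the records are already sorted by key, as they are when
    Hadoop Streaming hands them to the reducer.
    """
    parsed = (record.rstrip("\n").split("\t", 1) for record in records)
    for word, group in itertools.groupby(parsed, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


def local_wordcount(lines):
    """Simulate map -> sort -> reduce locally, without a cluster."""
    return list(reduce_sorted(sorted(map_lines(lines))))
```

In a real streaming script, each step would read from `sys.stdin` and print its records to stdout. The `local_wordcount` helper is only there to try the logic on a few lines before submitting a job.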

### Run the same basic map reduce with Spark

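```bash
# open a shell inside the running container
docker exec -ti dockersinglenodehadoop_hsn_1 bash
cd /example
# copy the sample text file into HDFS
# (skip these two commands if /input was already filled by the previous example)
hdfs dfs -mkdir /input
hdfs dfs -put fichier.txt /input
# run pyspark
pyspark
```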

```python
# load the file from HDFS
lines = sc.textFile("/input/fichier.txt")
lines.collect()

# mapping
def split_words(line):
    return line.split()

def create_pair(word):
    return (word, 1)

pairs = lines.flatMap(split_words).map(create_pair)

# reducing
def sum_counts(a, b):
    return a + b

wordcount = pairs.reduceByKey(sum_counts)

# display the result
wordcount.collect()
```
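
To sanity-check the result of the Spark job on a small file, the same three steps can be reproduced in plain Python. This is only a local sketch of what `flatMap`, `map` and `reduceByKey` compute in the example above; `wordcount_local` is a hypothetical helper, not part of Spark or of this repository.

```python
from collections import defaultdict


def wordcount_local(lines):
    # flatMap(split_words): one flat list of words from all lines
    words = [word for line in lines for word in line.split()]
    # map(create_pair): pair each word with a count of 1
    pairs = [(word, 1) for word in words]
    # reduceByKey(sum_counts): add up the counts for each distinct word
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)
```

Since `wordcount.collect()` returns a list of `(word, count)` tuples, wrapping it in `dict(...)` should give a value comparable to the one returned by `wordcount_local` for the same input lines.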

Features
--------

### Start and stop the container

```

docker-compose start

docker-compose stop

```

### The /data volume in the ./data_docker directory

The ./data_docker directory on the host is shared with the container, where it is mounted as /data.

This directory contains:

* /data/hdfs for HDFS files

* /data/yarn for YARN files

* /data/transfert for easy file transfer between the host and the container.
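
As an illustration of the transfer directory, a file copied into `./data_docker/transfert` on the host should appear under `/data/transfert` in the container. A minimal sketch, assuming the `./data_docker` location described above; `stage_for_container` is a hypothetical helper, not part of this repository.

```python
import os
import shutil


def stage_for_container(src, host_data_dir="./data_docker"):
    """Copy src into the shared transfer directory on the host.

    Returns the path at which the file should be visible inside the
    container, assuming host_data_dir is mounted as /data.
    """
    transfer_dir = os.path.join(host_data_dir, "transfert")
    # create the transfer directory if it does not exist yet
    os.makedirs(transfer_dir, exist_ok=True)
    shutil.copy(src, transfer_dir)
    return "/data/transfert/" + os.path.basename(src)
```

The returned path can then be used from a shell inside the container, for example as the source of an `hdfs dfs -put` command.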

Run the container manually from the Dockerfile
----------------------------------------------

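Start the container and open a shell inside it

```bash
# build the image and start a container named hsn
docker build -t hsn .
docker run --rm --name hsn hsn
# open a shell inside the container
# (from a second terminal if docker run stays in the foreground)
docker exec -ti hsn bash
```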

Stop the container

```bash

docker stop hsn

```
