https://github.com/hardikvasa/hadoop-mapreduce-examples-python

All the Hadoop Mapreduce examples in python!

Host: GitHub
URL: https://github.com/hardikvasa/hadoop-mapreduce-examples-python
Owner: hardikvasa
License: mit
Created: 2015-04-06T04:56:04.000Z (almost 11 years ago)
Default Branch: master
Last Pushed: 2015-05-08T00:13:10.000Z (over 10 years ago)
Last Synced: 2025-03-25T05:34:11.225Z (10 months ago)
Language: Python
Size: 173 KB
Stars: 14
Watchers: 3
Forks: 8
Open Issues: 1
Metadata Files:
- Readme: README.md

# Hadoop Mapreduce Examples in Python
A couple of MapReduce examples in Python, with documentation on running them!

## Steps for running the code

**Folder Structure**

The files are assumed to be stored at the following locations on a Linux system. This is only an illustration; in practice the locations do not matter, as long as the paths in the commands below are adjusted to match.

* Hadoop installed in: /usr/local
* words.txt (sample word file on which the mapreduce jobs are run): /usr/local
* mapper.py (mapper file) and reducer.py (reducer file): /usr/local
* words.txt in hdfs: /wordcount
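
For orientation, here is a minimal sketch of what a word count mapper and reducer can look like under Hadoop streaming. It is not the repository's actual `mapper.py` or `reducer.py`; the function names `map_lines` and `reduce_records` are hypothetical, and both steps are kept in one file only so the sketch can be tried locally. In a real job each step would live in its own script and read from `sys.stdin`.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper step (hypothetical name): emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_records(sorted_records):
    """Reducer step (hypothetical name): sum counts of adjacent equal words.

    Hadoop streaming sorts mapper output by key before the reducer reads it,
    so records for the same word arrive next to each other.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Local stand-in for the shuffle phase: map, sort, then reduce.
    sample = ["the quick brown fox", "the lazy dog"]
    for record in reduce_records(sorted(map_lines(sample))):
        print(record)
```

Running the file directly prints one tab-separated `word count` line per distinct word, which is the same shape of record a streaming reducer writes to its output.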

**Creating Files**

`touch words.txt`

This creates an empty file; add some sample text to it before running the job.

**Making Directory in hdfs**

`hadoop fs -mkdir -p /wordcount`

**Copying test file from local directory to hdfs**

`hadoop fs -copyFromLocal /usr/local/words.txt /wordcount`

**Checking the file listing on hdfs**

`hadoop fs -ls /wordcount`

**Running the mapreduce job**

`/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar -file /usr/local/mapper.py -mapper mapper.py -file /usr/local/reducer.py -reducer reducer.py -input /wordcount/words.txt -output /wordcount/output`
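
The command above is long, so it can help to see its parts separately. The sketch below assembles the same argument list in Python, using only the options that appear in the command. The helper name `build_streaming_command` and its parameters are hypothetical, and the paths are the example paths from this README.

```python
import os
import shlex


def build_streaming_command(hadoop_home, streaming_jar, mapper, reducer,
                            input_path, output_path):
    """Hypothetical helper: assemble the streaming command as an argument list."""
    return [
        os.path.join(hadoop_home, "bin", "hadoop"), "jar", streaming_jar,
        # -file ships the local script to the cluster; -mapper and -reducer
        # then refer to it by its base name.
        "-file", mapper, "-mapper", os.path.basename(mapper),
        "-file", reducer, "-reducer", os.path.basename(reducer),
        "-input", input_path,
        "-output", output_path,
    ]


command = build_streaming_command(
    hadoop_home="/usr/local/hadoop",
    streaming_jar="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar",
    mapper="/usr/local/mapper.py",
    reducer="/usr/local/reducer.py",
    input_path="/wordcount/words.txt",
    output_path="/wordcount/output",
)

# Print the command as a single shell line (shlex.join needs Python 3.8+).
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run(command, check=True)` on a machine where Hadoop is installed at those paths.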

**Printing the output**

`hadoop fs -cat /wordcount/output/part-00000`
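
The printed output can also be post-processed. The sketch below assumes each output line has the form `word<TAB>count`, which is the default key/value layout for streaming output; whether the repository's reducer emits exactly that format is an assumption. The function names `parse_counts` and `top_words` are hypothetical.

```python
def parse_counts(lines):
    """Hypothetical helper: turn 'word<TAB>count' lines into a dict."""
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        word, _, count = line.partition("\t")
        # Sum in case the same word appears in more than one part file.
        counts[word] = counts.get(word, 0) + int(count)
    return counts


def top_words(counts, n=3):
    """Return the n most frequent words, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


if __name__ == "__main__":
    # Sample lines standing in for the contents of part-00000.
    sample_output = ["dog\t1\n", "the\t2\n", "fox\t1\n"]
    for word, count in top_words(parse_counts(sample_output)):
        print(word, count)
```

To use it on real output, the lines could come from `sys.stdin` with the `hadoop fs -cat` command piped into the script.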

**Removing the output folder from hdfs**

`hadoop fs -rm -r hdfs:///wordcount/output`

**User friendly list of files and sizes in a directory**

`ls -lh`

**Giving full permissions to a folder if required**

`chmod -R 777 /usr/local/hadoop_store`
