https://github.com/hardikvasa/hadoop-mapreduce-examples-python
All the Hadoop Mapreduce examples in python!
- Host: GitHub
- URL: https://github.com/hardikvasa/hadoop-mapreduce-examples-python
- Owner: hardikvasa
- License: mit
- Created: 2015-04-06T04:56:04.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2015-05-08T00:13:10.000Z (over 10 years ago)
- Last Synced: 2025-03-25T05:34:11.225Z (10 months ago)
- Language: Python
- Size: 173 KB
- Stars: 14
- Watchers: 3
- Forks: 8
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# Hadoop MapReduce Examples in Python
A couple of MapReduce examples in Python, with documentation on running them.
## Steps for running the code
**Folder Structure**
The files are assumed to be stored at the locations below on a Linux system. These paths are only an illustration; any location works as long as the commands are adjusted to match.
* Hadoop installed in: /usr/local
* words.txt (sample word file on which the MapReduce jobs are run): /usr/local
* mapper.py (mapper file) and reducer.py (reducer file): /usr/local
* words.txt in HDFS: /wordcount
**Creating the sample file**
`touch words.txt`
This creates an empty file; add some text to it before running the job.
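Since a word count over an empty file produces no output, it can help to fill the sample file with a few lines of text. The following is a minimal sketch, not part of the repository: the function name `write_sample_words` and the sample sentences are made up for illustration, and the file is written to the current directory rather than `/usr/local`.

```python
from pathlib import Path

# Hypothetical sample text; any plain-text content works for a word count.
SAMPLE_LINES = [
    "hadoop streaming runs python scripts",
    "python scripts read from standard input",
    "hadoop writes results to hdfs",
]


def write_sample_words(path, lines=SAMPLE_LINES):
    """Write one entry per line to `path` and return the path written."""
    path = Path(path)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


# Written to the current directory here; the steps in this README
# assume the file ends up at /usr/local/words.txt.
written = write_sample_words("words.txt")
print(f"wrote {written} ({written.stat().st_size} bytes)")
```

Any other way of putting text into `words.txt` (an editor, `echo`, an existing corpus) works just as well.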
**Making a directory in HDFS**
`hadoop fs -mkdir -p /wordcount`
**Copying the test file from the local directory to HDFS**
`hadoop fs -copyFromLocal /usr/local/words.txt /wordcount`
**Listing the files in HDFS**
`hadoop fs -ls /wordcount`
**Running the MapReduce job**
`/usr/local/hadoop/bin/hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar -file /usr/local/mapper.py -mapper mapper.py -file /usr/local/reducer.py -reducer reducer.py -input /wordcount/words.txt -output /wordcount/output`
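The contents of `mapper.py` and `reducer.py` are not shown in these steps. As a rough illustration of what a streaming word-count job computes, the sketch below runs the three phases (map, sort, reduce) in a single Python process. The function names are hypothetical and this is not necessarily how the repository's scripts are written; real streaming scripts would read lines from `sys.stdin` and print tab-separated `word<TAB>count` lines instead of passing Python tuples.

```python
from itertools import groupby


def map_words(lines):
    """Mapper step: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(sorted_pairs):
    """Reducer step: sum the counts of consecutive pairs that share a key.

    Like a streaming reducer, this relies on its input being sorted by key.
    """
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort -> reduce locally, without Hadoop."""
    mapped = map_words(lines)
    # sorted() stands in for the sort Hadoop performs between the two phases.
    shuffled = sorted(mapped, key=lambda pair: pair[0])
    return list(reduce_counts(shuffled))


for word, count in run_local(["to be or not to be", "to do"]):
    print(f"{word}\t{count}")
```

Testing the logic locally like this before submitting the job is usually faster than debugging failures on the cluster.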
**Print the output**
`hadoop fs -cat /wordcount/output/part-00000`
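Assuming the reducer writes one `word<TAB>count` pair per line (the usual convention for streaming word counts, though it depends on the reducer actually used), the printed output can be post-processed with a few lines of Python. The helper names `parse_counts` and `top_words` and the sample data are illustrative only.

```python
def parse_counts(lines):
    """Turn 'word<TAB>count' lines into a dict, skipping blank lines."""
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        word, _, count = line.partition("\t")
        counts[word] = int(count.strip())
    return counts


def top_words(counts, n=3):
    """Return the n most frequent words; ties are broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Made-up lines in the assumed output format.
sample = ["be\t2\n", "do\t1\n", "to\t3\n"]
print(top_words(parse_counts(sample), n=2))
```

To use it on real output, the lines could be read from `sys.stdin` with the `hadoop fs -cat` command above piped into the script.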
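**Remove the output folder from HDFS**
The job fails if the output directory already exists, so remove it before re-running. `-rm -r` replaces the deprecated `-rmr`, and `hadoop fs` replaces the deprecated `hadoop dfs`.
`hadoop fs -rm -r /wordcount/output`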
**Human-readable list of files and sizes in a local directory**
`ls -lh`
**Giving full permissions to a folder if required**
`chmod 777 -R /usr/local/hadoop_store`