https://github.com/lucidfrontier45/mapredpy

Simple one file MapReduce template in Python

mapred.py
===
This is a one-file Python MapReduce template.
You can use it to prototype your Hadoop jobs.

How to use?
---
Subclass `_BaseMapper` and `_BaseReducer` and implement their `map` and `reduce` methods, respectively.
Optionally, you can put initialization tasks in the `__init__` method of your Mapper and Reducer classes.
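As a rough illustration of the subclassing pattern, here is a word-count sketch. It is not the actual contents of mapred.py: the two base classes below are hypothetical stand-ins so the example runs on its own, and it assumes that `map` receives one input line and yields `(key, value)` pairs, while `reduce` receives a key plus an iterable of its values. The real signatures in mapred.py may differ.

```python
from typing import Iterable, Iterator, Tuple


# Hypothetical stand-ins for the base classes in mapred.py;
# the real signatures may differ.
class _BaseMapper:
    def map(self, line: str) -> Iterator[Tuple[str, str]]:
        raise NotImplementedError


class _BaseReducer:
    def reduce(self, key: str, values: Iterable[str]) -> Iterator[Tuple[str, str]]:
        raise NotImplementedError


class WordCountMapper(_BaseMapper):
    def __init__(self):
        # Optional initialization, e.g. loading a stop-word list once per task.
        self.stop_words = {"a", "the"}

    def map(self, line):
        # Emit (word, "1") for every word that is not a stop word.
        for word in line.lower().split():
            if word not in self.stop_words:
                yield word, "1"


class WordCountReducer(_BaseReducer):
    def reduce(self, key, values):
        # Sum all counts collected for a single word.
        yield key, str(sum(int(v) for v in values))
```

Putting the stop-word set in `__init__` means it is built once per mapper instance rather than once per input line.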

To run locally for debugging, with the output printed to stdout:

`cat | ./mapred.py -m mapper | sort | ./mapred.py -m reducer`
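The local pipeline works because `sort` places lines with the same key next to each other before they reach the reducer. The sketch below shows one way a single file can act as either stage depending on `-m`. It is a hypothetical word-count driver, not the code in mapred.py, and it assumes tab-separated `key\tvalue` lines between the stages:

```python
import argparse
import itertools
import sys
from typing import List, Optional, TextIO


# Hypothetical word-count stages; mapred.py's own structure may differ.
def run_mapper(stdin: TextIO, stdout: TextIO) -> None:
    # Read raw input lines and emit one "key\tvalue" line per word.
    for line in stdin:
        for word in line.split():
            stdout.write(f"{word}\t1\n")


def run_reducer(stdin: TextIO, stdout: TextIO) -> None:
    # Input is already sorted, so equal keys are adjacent and groupby
    # sees each key's values as one consecutive run.
    pairs = (line.rstrip("\n").split("\t", 1) for line in stdin if line.strip())
    for key, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        stdout.write(f"{key}\t{total}\n")


def main(argv: Optional[List[str]] = None,
         stdin: Optional[TextIO] = None,
         stdout: Optional[TextIO] = None) -> None:
    # "-m mapper" or "-m reducer" selects which stage this process runs.
    parser = argparse.ArgumentParser(description="one-file MapReduce job")
    parser.add_argument("-m", dest="mode", choices=["mapper", "reducer"], required=True)
    args = parser.parse_args(argv)
    stage = run_mapper if args.mode == "mapper" else run_reducer
    stage(stdin if stdin is not None else sys.stdin,
          stdout if stdout is not None else sys.stdout)


if __name__ == "__main__":
    main()
```

Passing the streams in explicitly keeps both stages testable in-process, while the script still reads stdin and writes stdout when run from the shell.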

To run on a Hadoop cluster, with the output written to HDFS:

`hadoop jar <hadoop-streaming jar> --files mapred.py -mapper './mapred.py -m mapper' -reducer './mapred.py -m reducer' -input <input path> -output <output path>`

Note
---
Running under PyPy can give a significant performance gain. Just change the first line (the shebang) of mapred.py so that it invokes pypy.
PyPy must then also be installed on every machine in your cluster.