Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/lucidfrontier45/mapredpy
Simple one file MapReduce template in Python
- Host: GitHub
- URL: https://github.com/lucidfrontier45/mapredpy
- Owner: lucidfrontier45
- Created: 2014-09-08T08:27:08.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2014-09-08T14:52:39.000Z (about 10 years ago)
- Last Synced: 2023-03-12T04:46:43.596Z (over 1 year ago)
- Language: Python
- Size: 125 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
mapred.py
===
This is a one-file Python MapReduce template.
You can use it to prototype your Hadoop job.

How to use?
---
Subclass `_BaseMapper` and `_BaseReducer` and implement their `map` and `reduce` methods.
Optionally, you can add initialization tasks to the `__init__` of your Mapper and Reducer classes.

To run locally for debugging, with output shown on stdout:
`cat | ./mapred.py -m mapper | sort | ./mapred.py -m reducer`

To run on a Hadoop cluster, with output written to HDFS:
`hadoop jar --files mapred.py -mapper './mapred.py -m mapper' -reducer './mapred.py -m reducer' -input -output `
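
The subclassing step described above can be illustrated with a small word count. This is only a sketch: the README does not show the real signatures of `map` and `reduce`, so the `_BaseMapper` and `_BaseReducer` classes below are hypothetical stand-ins that assume tab-separated key/value lines, and `WordCountMapper`, `WordCountReducer`, and `run_local` are illustrative names that are not part of mapred.py. `run_local` imitates the `mapper | sort | reducer` pipeline in memory so the example runs without Hadoop.

```python
import io
from itertools import groupby


class _BaseMapper:
    """Hypothetical stand-in for the template's mapper base class."""

    def map(self, line):
        raise NotImplementedError

    def run(self, in_stream, out_stream):
        # Emit one tab-separated "key<TAB>value" line per pair yielded by map().
        for line in in_stream:
            for key, value in self.map(line.rstrip("\n")):
                out_stream.write(f"{key}\t{value}\n")


class _BaseReducer:
    """Hypothetical stand-in for the template's reducer base class."""

    def reduce(self, key, values):
        raise NotImplementedError

    def run(self, in_stream, out_stream):
        # Input must already be sorted by key, as `sort` or Hadoop guarantees.
        pairs = (line.rstrip("\n").split("\t", 1) for line in in_stream)
        for key, group in groupby(pairs, key=lambda kv: kv[0]):
            for out_key, out_value in self.reduce(key, (v for _, v in group)):
                out_stream.write(f"{out_key}\t{out_value}\n")


class WordCountMapper(_BaseMapper):
    def __init__(self):
        # Optional initialization task: configure case folding once.
        self.lowercase = True

    def map(self, line):
        if self.lowercase:
            line = line.lower()
        for word in line.split():
            yield word, 1


class WordCountReducer(_BaseReducer):
    def reduce(self, key, values):
        # Values arrive as strings because they were read from a text stream.
        yield key, sum(int(v) for v in values)


def run_local(text):
    """Imitate `mapper | sort | reducer` in memory and return the output."""
    mapped = io.StringIO()
    WordCountMapper().run(io.StringIO(text), mapped)
    shuffled = "".join(sorted(mapped.getvalue().splitlines(keepends=True)))
    reduced = io.StringIO()
    WordCountReducer().run(io.StringIO(shuffled), reduced)
    return reduced.getvalue()


if __name__ == "__main__":
    print(run_local("a b a\nb a c\n"), end="")
```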
Note
---
Using PyPy can give you a significant performance gain. Just change the first line of mapred.py so that the script runs under PyPy.
You also need to install PyPy on all the machines in your cluster.