https://github.com/lucidfrontier45/mapredpy

Simple one file MapReduce template in Python

Host: GitHub
URL: https://github.com/lucidfrontier45/mapredpy
Owner: lucidfrontier45
Created: 2014-09-08T08:27:08.000Z (almost 11 years ago)
Default Branch: master
Last Pushed: 2014-09-08T14:52:39.000Z (almost 11 years ago)
Last Synced: 2024-12-31T23:12:06.209Z (6 months ago)
Language: Python
Size: 125 KB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

mapred.py
===
This is a one-file Python MapReduce template.
You can use it to prototype your Hadoop jobs.

How to use?
---
Subclass `_BaseMapper` and `_BaseReducer` and implement their `map` and `reduce` methods.
Optionally, you can add initialization tasks to the `__init__` of your Mapper and Reducer classes.
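
As a rough illustration of the subclassing pattern, here is a minimal word-count sketch. The two base classes below are stand-ins written for this example, since the real `_BaseMapper` and `_BaseReducer` live in mapred.py. The method signatures (`map(line)` yielding key-value pairs, `reduce(key, values)` yielding results) and the names `WordCountMapper` and `WordCountReducer` are assumptions, so check mapred.py for the actual interface.

```python
class _BaseMapper(object):
    """Stand-in for the base class in mapred.py (interface is assumed)."""

    def map(self, line):
        raise NotImplementedError


class _BaseReducer(object):
    """Stand-in for the base class in mapred.py (interface is assumed)."""

    def reduce(self, key, values):
        raise NotImplementedError


class WordCountMapper(_BaseMapper):
    def __init__(self):
        # Optional initialization: punctuation to strip from each token.
        self.strip_chars = ".,;:!?\"'"

    def map(self, line):
        # Emit (word, 1) for every non-empty token on the line.
        for token in line.split():
            word = token.strip(self.strip_chars).lower()
            if word:
                yield word, 1


class WordCountReducer(_BaseReducer):
    def reduce(self, key, values):
        # Sum all counts collected for a single key.
        yield key, sum(values)
```

In the real template you would subclass the classes defined in mapred.py instead of redefining them.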

To run locally for debugging, with the output written to stdout:

`cat | ./mapred.py -m mapper | sort | ./mapred.py -m reducer`
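
The `sort` step sits between the two stages so that all records with the same key reach the reducer together. The sketch below mimics the same three stages in plain Python to show why that ordering matters. The tab-separated `key\tvalue` record format is the usual Hadoop Streaming convention but is an assumption about mapred.py, and `map_line`, `reduce_group`, and `run_local` are hypothetical names used only for illustration.

```python
from itertools import groupby


def map_line(line):
    # Mapper stage: emit one "word<TAB>1" record per token.
    for word in line.split():
        yield "%s\t1" % word


def reduce_group(key, values):
    # Reducer stage: sum the counts for one key.
    return "%s\t%d" % (key, sum(int(v) for v in values))


def run_local(lines):
    # Stage 1: run the mapper over every input line.
    mapped = [record for line in lines for record in map_line(line)]
    # Stage 2: sort records so that equal keys become adjacent.
    mapped.sort()
    # Stage 3: group adjacent records by key and reduce each group.
    pairs = (record.split("\t", 1) for record in mapped)
    return [
        reduce_group(key, (value for _, value in group))
        for key, group in groupby(pairs, key=lambda kv: kv[0])
    ]
```

Without the sort, `groupby` would see the same key in several separate runs and emit several partial counts for it.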

To run on a Hadoop cluster, with the output written to HDFS (supply the streaming jar and the input and output paths for your cluster):

`hadoop jar --files mapred.py -mapper './mapred.py -m mapper' -reducer './mapred.py -m reducer' -input -output `

Note
---
Using PyPy can give you a significant performance gain. Just change the first line (the shebang) of mapred.py to point to pypy.
You also need to install PyPy on all the machines in your cluster.
