glue-mapreduce
==============

[![Build Status](https://travis-ci.org/airtoxin/glue-mapreduce.svg?branch=master)](https://travis-ci.org/airtoxin/glue-mapreduce)
[![Coverage Status](https://img.shields.io/coveralls/airtoxin/glue-mapreduce.svg)](https://coveralls.io/r/airtoxin/glue-mapreduce)
[![Dependency Status](https://gemnasium.com/airtoxin/glue-mapreduce.svg)](https://gemnasium.com/airtoxin/glue-mapreduce)
[![Code Climate](https://codeclimate.com/github/airtoxin/glue-mapreduce/badges/gpa.svg)](https://codeclimate.com/github/airtoxin/glue-mapreduce)

A node.js Map-Reduce library with a "write once, run anywhere" concept for the Hadoop framework.

## Motivation
Scalable computing is increasingly important for analysing and aggregating big data, and Map-Reduce is a scalable computation model for distributed platforms such as Hadoop and other services.

Thinking about scalable programming matters if your jobs are to keep running stably as the data grows day by day. Yet a very common request is: "Map-Reduce is overkill for the current data size (MB~GB order), but we still want to ensure scalability."

I solve these problems by running Map-Reduce aggregation locally, in a form that stays portable to Hadoop platforms.

## Install
`npm i glue-mapreduce`

TODO: publish

## How to use
### Script
__Important__: This script runs either locally or on Hadoop, so it should contain only the core of your Map-Reduce algorithm. Do not put any other logic in it.

```javascript
var fs = require('fs');
var mr = new (require('glue-mapreduce'))();

// "local" input data: must be iterable
var data = fs.readFileSync('somefile.txt').toString().split('\n');

// register the "local" input data
mr.input = function (callback) {
    var error = null;
    return callback(error, data);
};

// register the mapper
mr.mapper = function (mapLine, callback) {
    // the mapper is called once per line of input data
    var error = null;
    var split = mapLine.split(' ');
    var key = split[0],
        val = split[1];

    return callback(error, [{k: key, v: val}]);
    // the callback can return multiple key-value pairs
};

// register the reducer
mr.reducer = function (key, values, callback) {
    // the reducer is called once per key
    var error = null;

    return callback(error, [{k: key, v: values.length}]);
    // the callback can return multiple key-value pairs
};

// run the Map-Reduce job
mr.run(data, function (results) {
    // this callback is not called on Hadoop
    /*
    results is an array of key-value pairs:
    [{
        k: 'key',
        v: 100 (the reduced value)
    }, ...]
    */
});
```
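As a follow-up to the script above, here is a minimal sketch (not part of the library, and assuming the `{k, v}` result format shown in the comment) of one way to consume the reduced results in local mode:

```javascript
// Continues the script above: hypothetical post-processing in local mode.
mr.run(data, function (results) {
    // Sort the reduced pairs by value, largest first, and print them
    // tab-separated, similar to typical Hadoop Streaming output.
    results
        .sort(function (a, b) { return b.v - a.v; })
        .forEach(function (pair) {
            console.log(pair.k + '\t' + pair.v);
        });
});
```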
### Run options
glue-mapreduce decides whether to run as local Map-Reduce, as a Hadoop Streaming mapper, or as a Hadoop Streaming reducer based on a command-line argument.

To run __local Map-Reduce__, `node somemapreduce.js local` or no arguments.

To run __Hadoop Streaming Mapper__, `hadoop jar hadoop-streaming.jar -mapper 'somemapreduce.js mapper' ...`

To run __Hadoop Streaming Reducer__, `hadoop jar hadoop-streaming.jar -reducer 'somemapreduce.js reducer' ...`

__Important__: The command must be quoted so that the mode argument is passed together with the script.

This behavior can also be controlled through the `mr.mode` variable, which accepts `'local'`, `'mapper'` or `'reducer'`; e.g. `mr.mode = 'local'` runs the local Map-Reduce aggregation.
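For instance, a minimal sketch of setting the mode explicitly (the `process.argv` handling below is purely illustrative; the library already parses its own command-line argument):

```javascript
var mr = new (require('glue-mapreduce'))();

// Force local aggregation regardless of command-line arguments.
mr.mode = 'local';

// Or, purely for illustration, derive the mode from the first CLI argument,
// falling back to local Map-Reduce when no known mode is given.
var arg = process.argv[2];
mr.mode = (arg === 'mapper' || arg === 'reducer') ? arg : 'local';
```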

### Testing
If you want to test your script, the following command emulates Hadoop mode locally (the `sort` step plays the role of Hadoop's shuffle-and-sort phase, grouping keys before the reducer):

`node myscript.js map < myinput.txt | sort | node myscript.js red`

### Contribute
#### Testing
`npm test`
#### Coverage
`npm run coverage`