Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/syzer/js-spark
Realtime calculation distributed system. AKA distributed lodash
https://github.com/syzer/js-spark
distributed distributed-computing multicore realtime spark
Last synced: about 1 month ago
JSON representation
Realtime calculation distributed system. AKA distributed lodash
- Host: GitHub
- URL: https://github.com/syzer/js-spark
- Owner: syzer
- Created: 2014-07-24T17:00:50.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2017-11-20T21:37:52.000Z (almost 7 years ago)
- Last Synced: 2024-10-12T00:04:45.151Z (about 1 month ago)
- Topics: distributed, distributed-computing, multicore, realtime, spark
- Language: JavaScript
- Homepage:
- Size: 337 KB
- Stars: 190
- Watchers: 23
- Forks: 14
- Open Issues: 11
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
What is JS-Spark
====
Distributed real time computation/job/work queue using JavaScript.
A JavaScript reimagining of the fabulous Apache Spark and Storm projects.If you know `underscore.js` or [`lodash.js`](https://lodash.com/) you may use JS-Spark
as a distributed version of them.If you know Distributed-RPC systems like [storm](https://storm.incubator.apache.org/documentation/Distributed-RPC.html)
you will feel at home.If you've ever worked with distributed work queues such as Celery,
you will find JS-Spark easy to use.![main page](https://raw.github.com/syzer/JS-Spark/master/public/docs/JS-Spark-main-page.png)
![computing que](https://raw.github.com/syzer/JS-Spark/master/public/docs/JS-Spark-computing-que-view.png)Why
===
There are no JS tools that can offload your processing to 1000+ CPUs.
Furthermore, existing tools in other languages, such as Seti@Home,
Gearman, require time, expensive setup of server, and later setting up/supervising clients machines.**We want to do better.**
On JS-Spark your clients need just to click on a **URL**, and the server side has one line installation (less than 5 min).Hadoop is quite slow and requires maintaining a cluster - **we can to do better**.
Imagine that there's no need to set up expensive cluster/cloud solutions.
Use web browsers! Easily scale to multiple clients. Clients do not need to install anything like Java or other plugins.Setup in a matter of minutes and you are good to go.
The possibilities are endless:
--------------------------
No need to setup expensive clusters.
The setup takes 5 min and you are good to go.
You can do it on one machine. Even on a Raspberry Pi.* Use as ML tool to process in real time huge streams of data... while all clients still browse their favorite websites
* Use for big data analytics. Connect to Hadoop HDFS and process even terabytes of data.
* Use to safely transfer huge amount of data to remote computers.
* Use as CDN... Today most websites runs slower when more clients use them.
But using JS-Spark you can totally reverse this trend. Build websites that run FASTER the more people use them* Synchronize data between multiple smartphones.. even in Africa
* No expensive cluster setup required!
* Free to use.
How (Getting started with npm)
=============================
To add a distributed job queue to any node app run:npm i --save js-spark
Look for **Usage with npm**.
Example: running multicore jobs in JS:
====================================
### Simple example with node multicore jobs
[example-js-spark-usage](https://github.com/syzer/example-js-spark-usage)```bash
git clone [email protected]:syzer/example-js-spark-usage.git && cd $_
npm install
```### Game of life example
[distributed-game-of-life](https://github.com/syzer/distributed-game-of-life.git)```bash
git clone https://github.com/syzer/distributed-game-of-life.git && cd $_
npm install
```### Example: NLP
This example shows how to use one of the Natural Language Processing tools called N-Gram
in a distributed manner using JS-Spark:[Distributed-N-Gram](https://github.com/syzer/distributedNgram)
If you'd like to know more about N-grams please read:
[http://en.wikipedia.org/wiki/N-gram](http://en.wikipedia.org/wiki/N-gram)
How (Getting started)
====================
Prerequisites: install `Node.js`, then:
install grunt and bower,```bash
sudo npm install -g bower
sudo npm install -g grunt
```Install `js-spark`
----------------
```bash
npm i --save js-spark
#or use:
git clone [email protected]:syzer/JS-Spark.git && cd $_
npm install
```
Then run:
node index &
node client
Or:
npm start
After that you may see how the clients do the heavy lifting.
Usage with npm
==============```JavaScript
var core = require('jsSpark')({workers:8});
var jsSpark = core.jsSpark;jsSpark([20, 30, 40, 50])
// this is executed on the client
.map(function addOne(num) {
return num + 1;
})
.reduce(function sumUp(sum, num) {
return sum + num;
})
.thru(function addString(num){
return "It was a number but I will convert it to " + num;
})
.run()
.then(function(data) {
// this is executed on back on the server
console.log(data);
})
```Usage (Examples)
===============
Client side heavy CPU computation (MapReduce)
--------------------------------------------```JavaScript
task = jsSpark([20, 30, 40, 50])
// this is executed on client side
.map(function addOne(num) {
return num + 1;
})
.reduce(function sumUp(sum, num) {
return sum + num;
})
.run();
```Distributed version of lodash/underscore
----------------------------------------```JavaScript
jsSpark(_.range(10))
// https://lodash.com/docs#sortBy
.add('sortBy', function _sortBy(el) {
return Math.sin(el);
})
.map(function multiplyBy2(el) {
return el * 2;
})
.filter(function remove5and10(el) {
return el % 5 !== 0;
})
// sum of [ 2, 4, 6, 8, 12, 14, 16, 18 ] => 80
.reduce(function sumUp(arr, el) {
return arr + el;
})
.run();
```Multiple retry and clients elections
------------------------------------
If you run calculations via unknown clients is better to recalculate
same tasks on different clients:```JavaScript
jsSpark(_.range(10))
.reduce(function sumUp(sum, num) {
return sum + num;
})
// how many times to repeat calculations
.run({times: 6})
.then(function whenClientsFinished(data) {
// may also get 2 most relevant answers
console.log('Most clients believe that:');
console.log('Total sum of numbers from 1 to 10 is:', data);
})
.catch(function whenClientsArgue(reason) {
console.log('Most clients could not agree, ', + reason.toString());
});
```Combined usage with server side processing
------------------------------------------```JavaScript
task3 = task
.then(function serverSideComputingOfData(data) {
var basesNumber = data + 21;
// All your 101 base are belong to us
console.log('All your ' + basesNumber + ' base are belong to us');
return basesNumber;
})
.catch(function (reason) {
console.log('Task could not compute ' + reason.toString());
});
```More references
===============
This project involves reimplementing some nice things from the world of big data, so there are of course some nice
resources you can use to dive into the topic:* [Map-Reduce revisited](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.104.5859&rep=rep1&type=pdf)
* [Awesome BigData - A curated list of awesome frameworks, resources and other things.](https://github.com/onurakpolat/awesome-bigdata)Running with UI
===============Normally you do not need to start UI server. But if you want to build an application on top on the js-spark UI server. Feel free to do so.
git clone [email protected]:syzer/JS-Spark.git && cd $_
npm install
grunt build
grunt serveTo spam more light-weight (headless) clients:
node clientRequired to run UI
==================
* mongoDB
default connection parameters:* mongodb://localhost/jssparkui-dev user: 'js-spark', pass: 'js-spark1'
install mongo, make sure mongod(mongo service) is running
run mongo shell with command:```js
mongo
use jssparkui-dev
db.createUser({
user: "js-spark",
pwd: "js-spark1",
roles: [
{ role: "readWrite", db: "jssparkui-dev" }
]
})
```
* old mongodb engines can use `db.addUser()` with same API
* to run without UI db code is not required!* on first run you need to seed the db: change option `seedDB: false` => `seedDB: true`
on `./private/srv/server/config/environment/development.js`Tests
=====
`npm test`TODO
====
- [X] service/file -> removed for other module
- [ ] di -> separate module
- [!] bower for js-spark client
- [ ] config-> merge different config files
- [!] server/auth -> split to js-spark-ui module
- [!] server/api/jobs -> split to js-spark-ui module
- [ ] split ui
- [X] more examples
- [X] example with cli usage (not daemon)
- [X] example with using thu
- [?] .add() is might be broken... maybe fix or remove