{"id":14982407,"url":"https://github.com/syzer/js-spark","last_synced_at":"2025-04-09T23:16:27.136Z","repository":{"id":18999651,"uuid":"22221844","full_name":"syzer/JS-Spark","owner":"syzer","description":"Realtime calculation distributed system. AKA distributed lodash","archived":false,"fork":false,"pushed_at":"2017-11-20T21:37:52.000Z","size":345,"stargazers_count":190,"open_issues_count":11,"forks_count":14,"subscribers_count":22,"default_branch":"master","last_synced_at":"2025-04-09T23:16:16.492Z","etag":null,"topics":["distributed","distributed-computing","multicore","realtime","spark"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/syzer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-07-24T17:00:50.000Z","updated_at":"2024-03-26T03:20:40.000Z","dependencies_parsed_at":"2022-09-26T16:20:52.405Z","dependency_job_id":null,"html_url":"https://github.com/syzer/JS-Spark","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/syzer%2FJS-Spark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/syzer%2FJS-Spark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/syzer%2FJS-Spark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/syzer%2FJS-Spark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/syzer","download_url":"https://codeload.github.com/syzer/JS-Spark/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com"
,"kind":"github","repositories_count":248125592,"owners_count":21051771,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distributed","distributed-computing","multicore","realtime","spark"],"created_at":"2024-09-24T14:05:21.717Z","updated_at":"2025-04-09T23:16:26.834Z","avatar_url":"https://github.com/syzer.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"What is JS-Spark\n====\nDistributed real-time computation/job/work queue using JavaScript.\nA JavaScript reimagining of the fabulous Apache Spark and Storm projects.\n\nIf you know `underscore.js` or [`lodash.js`](https://lodash.com/) you may use JS-Spark\nas a distributed version of them.\n\nIf you know Distributed-RPC systems like [storm](https://storm.incubator.apache.org/documentation/Distributed-RPC.html)\nyou will feel at home.\n\nIf you've ever worked with distributed work queues such as Celery, \nyou will find JS-Spark easy to use.\n\n![main page](https://raw.github.com/syzer/JS-Spark/master/public/docs/JS-Spark-main-page.png)\n![computing queue](https://raw.github.com/syzer/JS-Spark/master/public/docs/JS-Spark-computing-que-view.png)\n\n\n\nWhy\n===\nThere are no JS tools that can offload your processing to 1000+ CPUs.\nFurthermore, existing tools in other languages, such as Seti@Home and\nGearman, require time, an expensive server setup, and later setting up/supervising client machines. 
\n\n**We want to do better.**\nWith JS-Spark your clients only need to click on a **URL**, and the server side is a one-line installation (less than 5 min).\n\nHadoop is quite slow and requires maintaining a cluster - **we can do better**.\nImagine that there's no need to set up expensive cluster/cloud solutions. \nUse web browsers! Easily scale to multiple clients. Clients do not need to install anything like Java or other plugins.\n\nSet up in a matter of minutes and you are good to go.\n\nThe possibilities are endless:\n--------------------------\nNo need to set up expensive clusters. \nThe setup takes 5 min and you are good to go.\nYou can do it on one machine. Even on a Raspberry Pi.\n\n* Use as an ML tool to process huge streams of data in real time... while all clients still browse their favorite websites\n\n* Use for big data analytics. Connect to Hadoop HDFS and process even terabytes of data.\n\n* Use to safely transfer huge amounts of data to remote computers.\n\n* Use as a CDN... Today most websites run slower when more clients use them.\nBut using JS-Spark you can totally reverse this trend. Build websites that run FASTER the more people use them.\n\n* Synchronize data between multiple smartphones... 
even in Africa\n\n* No expensive cluster setup required!\n\n* Free to use.\n\nHow (Getting started with npm)\n=============================\nTo add a distributed job queue to any Node app, run:\n\n        npm i --save js-spark\n\nSee the **Usage with npm** section.\n\nExample: running multicore jobs in JS:\n====================================\n### Simple example with node multicore jobs\n[example-js-spark-usage](https://github.com/syzer/example-js-spark-usage)\n\n```bash\ngit clone git@github.com:syzer/example-js-spark-usage.git \u0026\u0026 cd $_\nnpm install\n```\n\n### Game of life example\n[distributed-game-of-life](https://github.com/syzer/distributed-game-of-life.git)\n\n```bash\ngit clone https://github.com/syzer/distributed-game-of-life.git \u0026\u0026 cd $_\nnpm install\n```\n\n\n### Example: NLP\nThis example shows how to use N-grams, a Natural Language Processing tool,\nin a distributed manner using JS-Spark:\n\n[Distributed-N-Gram](https://github.com/syzer/distributedNgram)\n\n\nIf you'd like to know more about N-grams please read: \n\n[http://en.wikipedia.org/wiki/N-gram](http://en.wikipedia.org/wiki/N-gram) \n\n\nHow (Getting started)\n====================\nPrerequisites: install `Node.js`, then install grunt and bower:\n\n```bash\nsudo npm install -g bower\nsudo npm install -g grunt\n```\n\nInstall `js-spark`\n----------------\n```bash\nnpm i --save js-spark\n#or use:\ngit clone git@github.com:syzer/JS-Spark.git \u0026\u0026 cd $_\nnpm install\n```\n        \nThen run:\n     \n        node index \u0026 \n        node client\n        \nOr:\n \n        npm start        \n        \nAfter that you can watch the clients do the heavy lifting.\n\n        \nUsage with npm\n==============\n\n```JavaScript\nvar core = require('jsSpark')({workers:8});\nvar jsSpark = core.jsSpark;\n\njsSpark([20, 30, 40, 50])\n    // this is executed on the client\n    .map(function addOne(num) {\n        return num + 1;\n    })\n    .reduce(function sumUp(sum, num) 
{\n        return sum + num;\n    })\n    .thru(function addString(num){\n        return \"It was a number but I will convert it to \" + num; \n    })\n    .run()\n    .then(function(data) {\n        // this is executed back on the server\n        console.log(data);\n    })\n```        \n\nUsage (Examples)\n===============\nClient side heavy CPU computation (MapReduce)\n--------------------------------------------\n\n```JavaScript\ntask = jsSpark([20, 30, 40, 50])\n    // this is executed on the client side\n    .map(function addOne(num) {\n        return num + 1;\n    })\n    .reduce(function sumUp(sum, num) {\n        return sum + num;\n    })\n    .run();\n```\n\n\nDistributed version of lodash/underscore \n----------------------------------------\n\n```JavaScript\njsSpark(_.range(10))\n     // https://lodash.com/docs#sortBy\n    .add('sortBy', function _sortBy(el) {\n        return Math.sin(el);\n    })\n    .map(function multiplyBy2(el) {\n        return el * 2;\n    })\n    .filter(function remove5and10(el) {\n        return el % 5 !== 0;\n    })\n    // sum of  [ 2, 4, 6, 8, 12, 14, 16, 18 ] =\u003e 80\n    .reduce(function sumUp(sum, el) {\n        return sum + el;\n    })\n    .run();\n```\n\n\nMultiple retries and client elections\n-------------------------------------\nIf you run calculations on unknown clients, it is better to recalculate \nthe same tasks on different clients:\n\n\n```JavaScript\njsSpark(_.range(10))\n    .reduce(function sumUp(sum, num) {\n        return sum + num;\n    })\n    // how many times to repeat calculations\n    .run({times: 6})\n    .then(function whenClientsFinished(data) {\n        // may also get 2 most relevant answers\n        console.log('Most clients believe that:');\n        console.log('Total sum of numbers from 0 to 9 is:', data);\n    })\n    .catch(function whenClientsArgue(reason) {\n        console.log('Most clients could not agree, ' + reason.toString());\n    });\n```\n\n\nCombined usage with server side 
processing\n------------------------------------------\n\n```JavaScript\ntask3 = task\n    .then(function serverSideComputingOfData(data) {\n        var basesNumber = data + 21;\n        // with the MapReduce task above (sum 144) this logs: All your 165 base are belong to us\n        console.log('All your ' + basesNumber + ' base are belong to us');\n        return basesNumber;\n    })\n    .catch(function (reason) {\n        console.log('Task could not compute ' + reason.toString());\n    });\n```\n\n\nMore references\n===============\nThis project reimplements some nice things from the world of big data, so there are of course some good\nresources you can use to dive into the topic:\n\n* [Map-Reduce revisited](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.104.5859\u0026rep=rep1\u0026type=pdf)\n* [Awesome BigData - A curated list of awesome frameworks, resources and other things.](https://github.com/onurakpolat/awesome-bigdata)\n\n\nRunning with UI\n===============\n\nNormally you do not need to start the UI server. But you may want to build an application on top of the js-spark UI server. 
Feel free to do so.\n\n        git clone git@github.com:syzer/JS-Spark.git \u0026\u0026 cd $_\n        npm install\n        grunt build\n        grunt serve\n\nTo spawn more lightweight (headless) clients:        \n        \n        node client\n\n\n\nRequired to run UI\n==================\n* MongoDB, with default connection parameters:\n\n* `mongodb://localhost/jssparkui-dev` user: 'js-spark', pass: 'js-spark1'\n\nInstall mongo, make sure mongod (the mongo service) is running,\nthen open the mongo shell and create the user:\n\n```js\nmongo\nuse jssparkui-dev\ndb.createUser({ \n  user: \"js-spark\",\n  pwd: \"js-spark1\",\n  roles: [\n    { role: \"readWrite\", db: \"jssparkui-dev\" }\n  ]\n})\n```\n* old mongodb engines can use `db.addUser()` with the same API\n* to run without the UI, the db code is not required!\n\n* on first run you need to seed the db: change option `seedDB: false` =\u003e `seedDB: true`\nin `./private/srv/server/config/environment/development.js`\n\nTests\n=====\n`npm test`\n\n\nTODO\n====\n- [X] service/file -\u003e removed for other module\n- [ ] di -\u003e separate module\n- [!] bower for js-spark client\n- [ ] config -\u003e merge different config files\n- [!] server/auth -\u003e split to js-spark-ui module\n- [!] server/api/jobs -\u003e split to js-spark-ui module\n- [ ] split ui\n- [X] more examples\n- [X] example with cli usage (not daemon)\n- [X] example using thru\n- [?] .add() might be broken... maybe fix or remove\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsyzer%2Fjs-spark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsyzer%2Fjs-spark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsyzer%2Fjs-spark/lists"}