{"id":16352333,"url":"https://github.com/bugthesystem/cerebro","last_synced_at":"2025-06-10T17:34:22.497Z","repository":{"id":149223061,"uuid":"53047318","full_name":"bugthesystem/cerebro","owner":"bugthesystem","description":"Finding The Median In Large Sets Of Numbers Split Across N Servers using zeromq and nodejs (experimental)","archived":false,"fork":false,"pushed_at":"2016-03-04T11:22:10.000Z","size":20,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-14T13:06:22.341Z","etag":null,"topics":["average","distributed","experimental","large-dataset","median","nodejs","zeromq"],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bugthesystem.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-03-03T12:06:32.000Z","updated_at":"2017-02-16T12:23:04.000Z","dependencies_parsed_at":null,"dependency_job_id":"e5dd0c5e-22a7-412f-a752-0f1e994ae120","html_url":"https://github.com/bugthesystem/cerebro","commit_stats":null,"previous_names":["bugthesystem/cerebro"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bugthesystem%2Fcerebro","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bugthesystem%2Fcerebro/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bugthesystem%2Fcerebro/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bugthesystem%2Fcerebro/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bugthesystem","download_url":"https://codeload.github.com/bugthesystem/cerebro/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239619573,"owners_count":19669447,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["average","distributed","experimental","large-dataset","median","nodejs","zeromq"],"created_at":"2024-10-11T01:25:47.942Z","updated_at":"2025-02-19T07:53:11.458Z","avatar_url":"https://github.com/bugthesystem.png","language":"JavaScript","readme":"### Finding The Median In Large Sets Of Numbers Split Across N Servers Using ZeroMQ and Node.js (experimental)\n[![Build Status](https://travis-ci.org/ziyasal/cerebro.svg?branch=master)](https://travis-ci.org/ziyasal/cerebro)  [![Coverage Status](https://coveralls.io/repos/github/ziyasal/cerebro/badge.svg?branch=master)](https://coveralls.io/github/ziyasal/cerebro?branch=master)\n\n- The master takes a dataset and distributes it equally across the workers.\n- When StatsCollector's `getMedian` is called, the master first sends a `SORT` message so that each worker sorts its share of the data.\n- Once every worker has confirmed the sort, the master sends a `GET_MEDIAN` message to collect each worker's median and stores the median of those medians. 
This value is an estimate of the overall median, not necessarily the exact one.\n- Starting from that estimate, a binary-search approach is applied to find the exact median.\n  - The median of medians gathered from the workers serves as the initial midpoint of the search.\n  - The master sends a `GET_LOWER_UPPER_COUNTS` message to get, from every worker, the number of values below and above the current estimate.\n  - Using those counts, the master updates the estimate so that the lower and upper counts become equal.\n  - This step repeats recursively until the estimate converges to the exact median.\n \n**Improvements**\n - Improve the design by decoupling it from ZeroMQ so other transports (e.g. MPI) can be plugged in.\n - Dynamically manage the number of workers and the distribution of data to them, and support continuous data processing (streaming).\n - Implement multi-core processing on worker nodes using the `cluster` module to improve performance.\n \n**Known issues**\n - Duplicate values are not handled yet; supporting them needs refactoring.\n - The overall design needs refactoring.\n \n ## Usage\n \n ### Install Dependencies\n \n **On Windows**\n ```sh\n npm install\n ```\n \n **On Linux**\n ```sh\n sudo npm install\n ```\n\n ### Commands\n \n **Start App**\n ```sh\n# Start as many workers as configured in the config file (e.g. 3)\nnode main.js --role='WORKER'\nnode main.js --role='WORKER'\nnode main.js --role='WORKER'\n\n# Start the master\nnode main.js --role='MASTER'\n ```\n \n **Test**\n ```sh\n npm test\n ```\n \n **Coverage**\n ```sh\n npm run test-cov\n ```\n \n **ESLint**\n ```sh\n npm run lint\n ```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbugthesystem%2Fcerebro","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbugthesystem%2Fcerebro","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbugthesystem%2Fcerebro/lists"}