{"id":26953256,"url":"https://github.com/dsst95/r-cluster","last_synced_at":"2025-04-03T01:29:24.531Z","repository":{"id":87649963,"uuid":"160686210","full_name":"dsst95/R-cluster","owner":"dsst95","description":"A cluster-computing platform for R scripts","archived":false,"fork":false,"pushed_at":"2019-02-01T14:08:08.000Z","size":2566,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-12-10T19:53:40.396Z","etag":null,"topics":["cluster-computing","parallel-computing","r"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dsst95.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-12-06T14:27:44.000Z","updated_at":"2019-02-01T14:08:09.000Z","dependencies_parsed_at":null,"dependency_job_id":"f12b84f8-abae-4505-9c60-20974540689f","html_url":"https://github.com/dsst95/R-cluster","commit_stats":null,"previous_names":["dsst95/r-cluster"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsst95%2FR-cluster","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsst95%2FR-cluster/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsst95%2FR-cluster/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dsst95%2FR-cluster/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dsst95","download_url":"https://codeload.github.co
m/dsst95/R-cluster/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246921386,"owners_count":20855250,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cluster-computing","parallel-computing","r"],"created_at":"2025-04-03T01:29:23.758Z","updated_at":"2025-04-03T01:29:24.491Z","avatar_url":"https://github.com/dsst95.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# R-cluster\n\nA cluster-computing platform for R scripts!\n\n## Overview\n\nThe cluster-computing platform for R scripts is based on the\n[doRedis package](https://github.com/bwlewis/doRedis), so it only parallelizes\n`foreach` loops. 
The following schema gives an overview of how it works:\n\n![schema](./schema.jpg)\n\n* When a new job is started by running the `master.R` script, the\n  `worker.init` function, if one is provided, is exported to Redis first.\n* The job script and the data that the `foreach` loop iterates over are then\n  exported to Redis as well.\n* An available worker on the cluster, started with the `worker.R` script,\n  picks up the new job, runs the `worker.init` function once before starting\n  the job, and then runs the job script.\n* After a dataset has been processed, the results are written to Redis.\n* The master collects the results and merges them with the combine function.\n\n## Requirements\n\n* Redis: 2.8.0\n* R: 3.5.2\n* R libraries:\n  * doRedis: 1.2.2\n  * parallel: 3.5.1\n  * optparse: 1.6.0\n  * redux: 1.1.0\n\n## Setup\n\nHow the requirements are installed depends on the underlying operating system.\nIt is recommended that you have at least two machines, one for the worker and\none for the master.\n\nRedis is used to share data between the master and the workers and to manage\nthe workers' jobs. For more details, see the documentation of\n[doRedis](https://github.com/bwlewis/doRedis/blob/master/vignettes/doRedis.pdf).\n\n### Redis\n\nDownload and install Redis on the master machine. For more details, see the\nofficial [website](https://redis.io/download).\n\n### R\n\nDownload and install R on the master and worker machines. For more details, see\nthe official [website](https://www.r-project.org/).\n\n### R libraries\n\nThe required libraries must be installed on the master machine as well as on the\nworker machines. It might be necessary to specify the\n[library path](https://www.r-bloggers.com/package-paths-in-r/) if the default\nlibrary path is not writable by the current user. 
The libraries needed to run the\nplatform can be installed with the following commands in the interactive R\nshell:\n\n```R\n# parallel ships with base R and does not need to be installed separately\ninstall.packages(\"optparse\")\ninstall.packages(\"redux\")\n```\n\nThe latest version of the `doRedis` package is only available on GitHub, so it\nmust be installed with the following commands:\n\n```R\ninstall.packages(\"devtools\")\nlibrary(devtools)\ninstall_github(\"bwlewis/doRedis\")\n```\n\nThe installation of the libraries required for a specific job should be\nimplemented in the [initialization script](#Initialization-script).\n\n## Getting Started\n\nAfter installing and configuring all the requirements, the platform can be used.\nDownload either the source code or the\n[latest release](https://github.com/dennis95stumm/R-cluster/releases) and place\nit in a folder of your choice.\n\nTo get a job done, at least one worker must be running on the cluster. A worker\ncan be started with the following command:\n\n```cmd\nRscript worker.R [options]\n```\n\nThe following options can be passed to the worker script:\n\n| Short | Long              | Description |\n| ----- | ----------------- | ----------- |\n| -m    | --master          | The hostname or IP address of the master where the Redis process runs. |\n| -p    | --master-port     | The port of the Redis process on the master. |\n| -w    | --master-password | The password of the Redis process on the master. |\n| -l    | --logfile         | The path to the worker's log file. |\n\nThis script exits after all jobs in a queue are done or if no queue is\navailable on the cluster, so it may be necessary to run this script as a\nservice. 
For Windows, the [clients](#clients) can be used.\n\n### Run a job\n\nTo run a new job on the cluster, execute the following command:\n\n```cmd\nRscript master.R [options]\n```\n\nThe following options can be passed to the master script:\n\n| Short | Long              | Description |\n| ----- | ----------------- | ----------- |\n| -m    | --master          | The hostname or IP address of the master where the Redis process runs. |\n| -p    | --master-port     | The port of the Redis process on the master. |\n| -w    | --master-password | The password of the Redis process on the master. |\n| -c    | --chunksize       | Size of the chunks of jobs that get submitted to the workers. |\n| -f    | --file            | Path to the file that contains the data for the job. |\n| -i    | --init            | Path to the init script (e.g. installation of libs) that should be executed on each worker. This file should contain a function named `worker.init` without any parameters. |\n| -o    | --outfile         | Path to the file where the results of the job should be saved. |\n| -q    | --queue           | The queue the workers should run on. |\n| -s    | --script          | Path to the job script. This script should contain a `run` function taking only one argument, through which the data for the job is passed. |\n| -t    | --time            | Indicates whether the run time of the script should be measured and printed to the console. **Warning**: The actual run time of the algorithm may be shorter, because the measurement includes the time spent waiting for free workers. |\n\n### Writing new jobs\n\nA job consists of a job script and, if necessary, an initialization script for\nthe workers. The job script gets executed on the master and the initialization\nscript gets executed once at the beginning of a job on each worker node.\n\n#### Job script\n\nThe job script must contain a `run` function, which gets called when the job\nstarts. 
It takes one parameter, which contains the path to the input file that\nshould be processed in the job. The processing algorithm can be written in this\nfunction or be split up into other functions or scripts. To load additional\nscripts relative to the script path, the following construct can be used:\n\n```R\npath \u003c- dirname(parent.frame(2)$ofile)\n# FILENAME is a placeholder for the name of the script to load\nsource(paste(path, FILENAME, sep=\"/\"))\n```\n\n#### Initialization script\n\nThe optional initialization script must contain a `worker.init` function,\nwhere the initialization of each worker node can be done. Make sure that the\nuser starting the worker nodes and the master has write access to the lib path\nof R. If necessary, adjust this path by changing the appropriate environment\nvariables.\n\n#### Notes\n\n* The job may run slowly if there are a lot of iterations; adjusting the\n  `chunksize` may speed up job execution. Depending on the `combine` function\n  used for the results, CPU consumption can also be high on the machine where\n  the master script was started, which can slow down the whole job.\n\n## Clients\n\nThere are [clients](https://github.com/ReyKoxha/R-CC) available for Windows,\nso that it is not necessary to run the scripts from the console.\n\n## License\n\nThis project is licensed under the GNU General Public License v3.0. See the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdsst95%2Fr-cluster","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdsst95%2Fr-cluster","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdsst95%2Fr-cluster/lists"}