https://github.com/arun-george-zachariah/parallel-r
Bayesian structure learning with parallel bnlearn on a distributed R cluster.
bnlearn distributed-computing parallel r snow
- Host: GitHub
- URL: https://github.com/arun-george-zachariah/parallel-r
- Owner: Arun-George-Zachariah
- Created: 2020-11-25T15:25:30.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2020-11-30T20:49:50.000Z (over 4 years ago)
- Last Synced: 2025-02-18T01:19:25.681Z (3 months ago)
- Topics: bnlearn, distributed-computing, parallel, r, snow
- Language: Shell
- Homepage:
- Size: 42 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
# Parallel-R
Through this project, we set up a distributed R cluster using the [parallel](https://www.rdocumentation.org/packages/parallel/versions/3.6.2) package. The parallel package supports parallel computation in two ways. On a single machine, it forks worker processes (functionality derived from the [multicore](https://cran.r-project.org/src/contrib/Archive/multicore/) package), making use of most of the machine's cores. Across machines, it communicates over sockets (functionality derived from the [snow](https://cran.r-project.org/web/packages/snow/index.html) package), so the computation can use the resources of every node in the cluster. We then study Bayesian network structure learning on a sample dataset using the [bnlearn](https://www.bnlearn.com/) package. The dataset is split into equal parts, one per node in the cluster; a network structure is learned from each split in parallel, and the resulting structures are aggregated into the final structure.
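The split-and-aggregate workflow described above can be sketched in shell. This is a minimal sketch, not the project's `exec.sh`, and it rests on hypothetical assumptions: the input CSV has one header row, the machine list holds one node address per line, and each learned structure has been exported as a text file with one `from,to` arc per line. `split_by_nodes` and `majority_arcs` are hypothetical names. The README does not say how the per-split structures are combined, so the plain majority vote on arcs used here is a swapped-in rule for illustration; bnlearn's own model-averaging tools are `custom.strength()` and `averaged.network()`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the project's exec.sh.
# Assumptions: the CSV has one header row; the machine list has one node
# address per line; each learned structure is a file of "from,to" arcs.
set -euo pipefail

# split_by_nodes CSV MACHINE_LIST OUTDIR
# Writes OUTDIR/split_1.csv ... OUTDIR/split_N.csv, where N is the number of
# non-blank lines in MACHINE_LIST. Data rows are dealt in contiguous blocks
# whose sizes differ by at most one; every split repeats the header row.
split_by_nodes() {
  local csv=$1 machines=$2 outdir=$3
  local nodes
  nodes=$(grep -c '[^[:space:]]' "$machines")   # non-blank lines = nodes
  mkdir -p "$outdir"
  awk -v n="$nodes" -v out="$outdir" '
    NR == FNR { total = FNR; next }             # pass 1: count lines
    FNR == 1  { header = $0; rows = total - 1; next }
    {
      i = FNR - 2                               # 0-based data row index
      part = int(i * n / rows) + 1              # contiguous, near-equal blocks
      file = out "/split_" part ".csv"
      if (!(file in started)) { print header > file; started[file] = 1 }
      print > file
    }' "$csv" "$csv"
}

# majority_arcs ARC_FILE...
# Prints, sorted, every arc that appears in more than half of the files.
# An arc repeated within one file still counts as a single vote.
majority_arcs() {
  awk -v k="$#" '
    NF && !seen[FILENAME, $0]++ { votes[$0]++ }
    END { for (a in votes) if (2 * votes[a] > k) print a }' "$@" | sort
}
```

For example, `split_by_nodes data/Sample_Data.csv conf/machine_list.txt /mydata/splits` writes one `split_N.csv` per node, and `majority_arcs arcs_*.txt` (hypothetical file names) prints the consensus arc list. Because the vote treats each directed arc independently, its output is not guaranteed to be acyclic; a full implementation would need to detect and break cycles.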
## Dataset
The sample data comes from [learning.test](https://www.bnlearn.com/documentation/man/learning-test.html), a small synthetic dataset whose underlying network has 6 nodes, 5 arcs and 41 parameters.
Fig. 1 - learning.test Network (Ref: https://www.bnlearn.com/documentation/networks/)
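The count of 41 parameters can be checked by hand. Assuming the reference structure that bnlearn documents for learning.test, `[A][C][F][B|A][D|A:C][E|B:F]` (arcs A→B, A→D, C→D, B→E and F→E), with F taking two levels and the other five variables three levels each, a node with $r$ levels whose parents have $q$ joint configurations contributes $(r-1)\,q$ free parameters:

$$
\begin{aligned}
\underbrace{2}_{A} + \underbrace{2}_{C} + \underbrace{1}_{F}
+ \underbrace{2 \cdot 3}_{B \mid A}
+ \underbrace{2 \cdot (3 \cdot 3)}_{D \mid A, C}
+ \underbrace{2 \cdot (3 \cdot 2)}_{E \mid B, F}
&= 2 + 2 + 1 + 6 + 18 + 12 \\
&= 41.
\end{aligned}
$$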
## Execution
* To set up a distributed R cluster
```
cd scripts && ./configure.sh --machines <machine_list> --user <user> --key <private_key>
```
| Parameter | Default | Description |
|-----------|---------|-------------|
| `--machines` | `../conf/machine_list.txt` | A file listing the public IP addresses of the nodes. |
| `--user` | `${USER}` | User name, if different from the current user name. |
| `--key` | `~/.ssh/id_rsa` | Path to the private key. |

Example:
```
cd scripts && ./configure.sh --machines ../conf/machine_list.txt --user arung --key ~/.ssh/id_rsa
```
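The `--user` and `--key` options suggest that `configure.sh` reaches each node over SSH; the README does not show how. As a minimal sketch under that assumption, the loop below runs one command on every listed node. `on_each_node`, the `MACHINES`/`NODE_USER`/`KEY` environment overrides and the `SSH` stub hook are hypothetical names introduced here; the defaults mirror the documented ones, and the machine file is assumed to hold one address per line.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the project's configure.sh: run one command on
# every node listed in a machine file, over SSH, as a given user and key.
set -euo pipefail

# Defaults mirror the documented --machines/--user/--key defaults;
# each can be overridden from the environment.
MACHINES=${MACHINES:-../conf/machine_list.txt}
NODE_USER=${NODE_USER:-${USER:-$(id -un)}}
KEY=${KEY:-~/.ssh/id_rsa}
SSH=${SSH:-ssh}   # hypothetical hook: point at a stub for a dry run

# on_each_node CMD [ARGS...]
# Reads one address per line from $MACHINES (blank lines and #-comments are
# skipped) and runs CMD on each node in turn. Stops at the first failure and
# returns that command's exit status.
on_each_node() {
  local host
  while IFS= read -r host || [ -n "$host" ]; do
    host=${host%%#*}                # drop a trailing comment
    host=${host//[[:space:]]/}      # drop whitespace
    if [ -z "$host" ]; then continue; fi
    # -n keeps ssh from reading the machine list on stdin;
    # BatchMode=yes fails instead of prompting for a password.
    "$SSH" -n -i "$KEY" -o BatchMode=yes "$NODE_USER@$host" "$@" || return
  done < "$MACHINES"
}
```

For example, `on_each_node R --version` reports the R version on each node. Non-interactive, key-based login matters here because the parallel package's socket clusters also start their remote workers over SSH.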
* To learn the Bayesian network
```
./exec.sh --machines <machine_list> --user <user> --key <private_key> --inp <csv_file> --data <data_dir>
```
| Parameter | Default | Description |
|-----------|---------|-------------|
| `--machines` | `../conf/machine_list.txt` | A file listing the public IP addresses of the nodes. |
| `--user` | `${USER}` | User name, if different from the current user name. |
| `--key` | `~/.ssh/id_rsa` | Path to the private key. |
| `--inp` | | Path to the input data, as a CSV file. |
| `--data` | `/mydata` | Path to the directory used to install R packages and to save the data splits and other metadata. |

Example:
```
./exec.sh --machines conf/machine_list.txt --user arung --key ~/.ssh/id_rsa --inp data/Sample_Data.csv --data /mydata
```

## References
* [Learning Bayesian Networks 10 Years Later - Marco Scutari](https://www.bnlearn.com/about/slides/slides-aist17.pdf)
* [Parallel Package](https://www.rdocumentation.org/packages/parallel/versions/3.6.2)
* [Sample Data](https://www.bnlearn.com/documentation/man/learning-test.html)
* [Example Networks](https://www.bnlearn.com/documentation/networks/)