https://github.com/ljdursi/beyond-single-core-r
Short tour of parallel and foreach packages, and how to think about scaling data analyses
- Host: GitHub
- URL: https://github.com/ljdursi/beyond-single-core-r
- Owner: ljdursi
- License: other
- Created: 2017-02-08T20:43:28.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2020-08-23T01:34:12.000Z (almost 6 years ago)
- Last Synced: 2024-07-21T07:32:41.628Z (almost 2 years ago)
- Topics: parallel, parallel-computing, r, scalability
- Language: R
- Homepage: https://ljdursi.github.io/beyond-single-core-R
- Size: 80.6 MB
- Stars: 75
- Watchers: 5
- Forks: 14
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
Beyond Single Core: Parallel Analysis in R
===================
R is a great environment for interactive analysis on your desktop, but when your data needs outgrow your
personal computer, it's not clear what to do next.
This is material for a short overview of scalable data analysis in R. The slides can be viewed at https://ljdursi.github.io/beyond-single-core-R .
It covers:
* How to think about parallelism and scalability in data analysis
* The standard parallel package, which incorporates what were formerly the snow and multicore packages,
using [airline data](http://stat-computing.org/dataexpo/2009/the-data.html) as an example
* The [foreach](http://cran.r-project.org/web/packages/foreach/index.html) package, using
airline data and simple stock data
* A summary of best practices.
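
As a rough illustration of the two styles the parallel package brings together, here is a minimal sketch. It is not taken from the talk materials: the `delays` data frame is a small made-up stand-in for the airline data, and the carrier codes, column names, and worker count of 2 are all assumptions chosen for the example.

```r
library(parallel)  # ships with base R, no installation needed

# Hypothetical toy data standing in for the airline data set:
# arrival delays (in minutes) for three made-up carrier codes.
set.seed(42)
delays <- data.frame(
  carrier   = sample(c("AA", "DL", "UA"), 3000, replace = TRUE),
  arr_delay = rnorm(3000, mean = 8, sd = 30)
)

# Split the work into independent chunks, one per carrier.
chunks <- split(delays$arr_delay, delays$carrier)

# Serial baseline for comparison.
serial_means <- sapply(chunks, mean)

# snow-style: start worker processes, hand out the chunks,
# and always shut the workers down afterwards.
cl <- makeCluster(2)
snow_means <- tryCatch(
  parSapply(cl, chunks, mean),
  finally = stopCluster(cl)
)

# multicore-style: fork the current process. Forking is not available
# on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
fork_means <- unlist(mclapply(chunks, mean, mc.cores = n_cores))
```

All three results should agree. On a problem this small the parallel versions will likely be slower than the serial one, because starting workers and moving data costs more than the computation itself; the pattern only pays off when each chunk carries substantial work.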
Included in the materials, though not in the talk, are some more advanced methods:
* The [bigmemory](http://cran.r-project.org/web/packages/bigmemory/index.html) package for out-of-core computation on large data matrices, with a simple physical sciences example;
* The [Rdsm](http://cran.r-project.org/web/packages/Rdsm/index.html) package for shared memory; and
* a brief introduction to the powerful [pbdR](http://r-pbd.org) packages for extremely large-scale computation.