Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/cluebbers/using_r_for_hpda
Exploring R for high-performance data analytics, including memory management, GPU computing, parallel processing, benchmarks, case studies, and comparisons with Python.
- Host: GitHub
- URL: https://github.com/cluebbers/using_r_for_hpda
- Owner: cluebbers
- License: mit
- Created: 2024-05-15T10:17:43.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-05-17T12:48:40.000Z (7 months ago)
- Last Synced: 2024-05-17T13:52:49.149Z (7 months ago)
- Topics: benchmarking, case-studies, data-science, gpu-computing, high-performance-data-analytics, memory-management, parallel-processing, python-comparison, r
- Homepage:
- Size: 938 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Using R for High-Performance Data Analytics
## Overview
This repository contains the seminar report and associated materials for the course "Newest Trends in High-Performance Data Analytics" at Georg-August-Universität Göttingen.
The report investigates the use of R in high-performance data analytics (HPDA), focusing on memory management, GPU computing, parallel processing, and benchmarking.
## Repository Structure
```
├── README.md
├── 2024-03-25_R_HPDA_Luebbers.pdf # Detailed insights into leveraging R for high-performance data analytics
└── NTHPDA.Rmd # R notebook containing example code and benchmarks
```
## Report Highlights
- **Memory Management**: Techniques to optimize R's memory usage for handling large datasets.
- **GPU Computing**: Utilizing GPU for accelerated computations with R packages.
- **Parallel Processing**: Methods to perform parallel computations to speed up data processing tasks.
- **Benchmarking**: Evaluating the performance of various R functions and comparing them with Python.
- **Leveraging C++**: Enhancing R's performance by integrating C++ code.
- **Computational Biology**: Using R for high-performance data analysis in genomics and bioinformatics.
- **Comparative Analysis**: Evaluating R's performance against Python for various data processing tasks.
## Code
To run the example code in `NTHPDA.Rmd`, you need R installed on your system along with the required packages. Follow these steps:
1. **Install the required packages**
```R
install.packages(c("forcats", "readr", "dplyr", "tidyr", "ggplot2", "tibble", "devtools"))
```
2. **Download the data**
The data can be found here: https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/DVS/natality/Nat2018us.zip
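The download can also be scripted from R. The following is a minimal sketch, not part of the original materials: the helper name `fetch_zip`, the `data` target directory, and the timeout value are assumptions made for illustration, and the location where `NTHPDA.Rmd` expects the extracted file may differ.
```R
# Hypothetical helper: download a zip archive (if not already present)
# and extract it into dest_dir. Returns the extracted file paths.
fetch_zip <- function(url, dest_dir = "data", timeout_sec = 3600) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  zip_path <- file.path(dest_dir, basename(url))

  if (!file.exists(zip_path)) {
    # R's default download timeout is 60 seconds; raise it for this call only.
    old <- options(timeout = max(timeout_sec, getOption("timeout")))
    on.exit(options(old), add = TRUE)
    download.file(url, destfile = zip_path, mode = "wb")
  }

  # unzip() returns the paths of the extracted files.
  unzip(zip_path, exdir = dest_dir)
}

natality_url <- "https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/DVS/natality/Nat2018us.zip"
# Uncomment to run; the download may take a while.
# files <- fetch_zip(natality_url)
```
Because the helper skips the download when the archive already exists in `dest_dir`, calling it again only repeats the extraction.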
3. **Knit the R Markdown file**
Open the `NTHPDA.Rmd` file in RStudio and click the "Knit" button to generate the HTML report.
Alternatively, you can use the following command in your R console:
```R
rmarkdown::render("NTHPDA.Rmd")
```
## Future Work
Making gpuR work :)
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
## Contact
For any questions or feedback, please contact Christopher L. Lübbers at [email protected].