Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jharner/rspark-tutorial
Tutorial for learning rspark
- Host: GitHub
- URL: https://github.com/jharner/rspark-tutorial
- Owner: jharner
- Created: 2020-05-29T05:49:17.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2021-01-14T06:23:51.000Z (about 4 years ago)
- Last Synced: 2024-08-04T22:15:24.540Z (6 months ago)
- Language: R
- Size: 19.4 MB
- Stars: 6
- Watchers: 3
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-sparklyr - rspark-tutorial: Tutorial for learning rspark
README
### rspark-tutorial
`rspark-tutorial` provides content illustrating the use of `rspark`, a Docker-based computing environment. `rspark` runs R, RStudio, PostgreSQL, Hadoop, Hive, and Spark in containers on your local computer (or optionally on AWS). The content covers a range of topics, including many aspects of the `tidyverse` and machine learning with Spark.
This tutorial is meant to be used with `rspark-docker`, which contains the Docker images for the `rspark` components. The steps for installing and launching the `rspark-docker` containers are given here:
[https://github.com/jharner/rspark-docker](https://github.com/jharner/rspark-docker)
#### Downloading the `rspark-tutorial` content
To get access to the tutorial content within `rspark-docker`, first do the following on your local computer:
1. Make sure `git` is installed and up to date. This step should have been done when you cloned `rspark-docker`, but you can check by running the following from the command line:

   `git version`

   Compare the reported version with the latest Git release. See the directions in the `rspark-docker` README for installing or updating Git, if necessary.
2. In a terminal, run the following command to clone `rspark-tutorial`:

   `git clone https://github.com/jharner/rspark-tutorial.git`

   This creates a directory (folder) named `rspark-tutorial`, containing the local `git` repo, in your current working directory. A new terminal usually starts in your home directory; if you prefer to install `rspark-tutorial` somewhere else, `cd` there first.
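Steps 1 and 2 can be combined into a small POSIX shell sketch: `check_git` reports whether the installed Git meets a minimum version, and `clone_tutorial` clones the repo unless a clone already exists. The helper names and the `MIN_GIT_VERSION` default of 2.20.0 are hypothetical choices for illustration, not requirements stated by `rspark-docker`.

```shell
#!/bin/sh
# Sketch of steps 1-2: check the installed git, then clone rspark-tutorial.
# MIN_GIT_VERSION is a hypothetical placeholder; set it to the release you
# consider current (or export MIN_GIT_VERSION before sourcing this file).
MIN_GIT_VERSION="${MIN_GIT_VERSION:-2.20.0}"
REPO_URL="https://github.com/jharner/rspark-tutorial.git"

# Exit status 0 when dotted version $1 >= $2 (first three fields compared numerically).
version_ge() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
  [ "$lowest" = "$2" ]
}

# Step 1: confirm git exists and is at least MIN_GIT_VERSION.
check_git() {
  if ! command -v git >/dev/null 2>&1; then
    echo "git not found; see the rspark-docker README to install it" >&2
    return 1
  fi
  installed=$(git version | awk '{print $3}')   # "git version 2.39.2" -> 2.39.2
  if version_ge "$installed" "$MIN_GIT_VERSION"; then
    echo "git $installed OK"
  else
    echo "git $installed is older than $MIN_GIT_VERSION; consider updating" >&2
    return 1
  fi
}

# Step 2: clone into parent directory $1 (default: current directory),
# skipping the clone when a local repo is already there.
clone_tutorial() {
  parent="${1:-.}"
  if [ -d "$parent/rspark-tutorial/.git" ]; then
    echo "rspark-tutorial already cloned in $parent"
    return 0
  fi
  git clone "$REPO_URL" "$parent/rspark-tutorial"
}
```

For example, after sourcing the file, `check_git && clone_tutorial "$HOME"` clones into your home directory once the version check passes. Making the clone step skip an existing repo means the script can be rerun safely.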
#### Uploading the `rspark-tutorial` content to `rspark-docker`
Assuming that `rspark-docker` has been started and that you have logged into the RStudio container:
1. Zip the `rspark-tutorial` folder on your local computer.
2. Click the `Files` tab in RStudio, then click `Upload`. Navigate to the zipped `rspark-tutorial` file and upload it.
3. Run or knit the `.Rmd` files in the various modules/sections to generate R Notebook or R Markdown output files.
Once you have cloned `rspark-tutorial`, you can pull updates by running the following commands, starting from the directory that contains your clone:

`cd rspark-tutorial`

`git pull origin master`

Pulling periodically keeps your tutorial up to date.
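The same update can be run from any directory with `git -C`, which tells Git to act as if it had been started inside the given directory. In this sketch, `update_tutorial` is a hypothetical helper, and its default path assumes you cloned into your home directory.

```shell
#!/bin/sh
# Sketch: pull the latest tutorial commits into an existing local clone.
# The default path assumes rspark-tutorial was cloned into your home directory.
update_tutorial() {
  repo="${1:-$HOME/rspark-tutorial}"
  if [ ! -d "$repo/.git" ]; then
    echo "no git repo at $repo; clone rspark-tutorial first" >&2
    return 1
  fi
  # git -C runs git as if it had been started inside $repo, so no cd is needed.
  git -C "$repo" pull origin master
}
```

For example, `update_tutorial ~/rspark-tutorial` pulls the `master` branch from `origin` into that clone.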
If you want an excellent Git GUI, download and install [Sourcetree](https://www.sourcetreeapp.com).