{"id":13791269,"url":"https://github.com/jharner/rspark-tutorial","last_synced_at":"2025-05-12T10:31:34.721Z","repository":{"id":85451367,"uuid":"267775940","full_name":"jharner/rspark-tutorial","owner":"jharner","description":"Tutorial for learning rspark","archived":false,"fork":false,"pushed_at":"2021-01-14T06:23:51.000Z","size":20344,"stargazers_count":6,"open_issues_count":0,"forks_count":6,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-15T20:53:10.791Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jharner.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-05-29T05:49:17.000Z","updated_at":"2022-07-10T01:20:52.000Z","dependencies_parsed_at":null,"dependency_job_id":"65b3e148-c443-4fb2-b0f8-d3a9af229567","html_url":"https://github.com/jharner/rspark-tutorial","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jharner%2Frspark-tutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jharner%2Frspark-tutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jharner%2Frspark-tutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jharner%2Frspark-tutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jharner","download_url":"https://codeload.github.com/jharner/rspark-tutorial/tar.gz/refs/heads/master","host":{"name":"GitHub",
"url":"https://github.com","kind":"github","repositories_count":253719945,"owners_count":21952928,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T22:00:58.158Z","updated_at":"2025-05-12T10:31:33.443Z","avatar_url":"https://github.com/jharner.png","language":"R","readme":"### rspark-tutorial\n\n`rspark-tutorial` provides content illustrating the use of `rspark`, a Docker-based computing environment. `rspark` runs R, RStudio, PostgreSQL, Hadoop, Hive, and Spark in containers on your local computer (or optionally on AWS). The content covers a range of topics including many aspects of the `tidyverse`, machine learning using Spark, etc.\n\nThis tutorial is meant to run in conjunction with `rspark-docker`, which contains images of the `rspark` components. The steps for installing and launching the `rspark-docker` containers is given here:  \n\n[https://github.com/jharner/rspark-docker](https://github.com/jharner/rspark-docker)  \n\n#### Downloading the `rspark-tutorial content\n\nTo get access to the tutorial content within `rsaprk-docker`, do the following:\n\n1. Make sure `git` is installed and up to date. This step should have been done when you cloned `rspark-docker`, but from the command line you can run:\n\ngit version\n\nCompare the returned version with the current version. See the directions in the `rspark-docker` README for installing or updating, if necessary.\n\n2. 
Issue the following command in a terminal to clone `rspark-tutorial`:  \n\n`git clone https://github.com/jharner/rspark-tutorial.git`\n\nThis will create a directory (folder) named `rspark-tutorial` in your current working directory, which is your home directory by default in a new terminal session. The new directory contains the local `git` repo of `rspark-tutorial`. If you prefer installing `rspark-tutorial` in another directory, `cd` there first.\n\n#### Uploading the `rspark-tutorial` content to `rspark-docker`\n\nAssuming that `rspark-docker` has been started and that you have logged into the RStudio container:\n\n1. Zip the `rspark-tutorial` folder on your local computer.\n\n2. Click on the `Files` tab in the RStudio container and then click the `Upload` menu item. Navigate to the zipped version of `rspark-tutorial` and upload it.  \n\n3. Execute the `.Rmd` files in the various modules/sections to generate R notebooks or R markdown output files.\n\nOnce you have cloned `rspark-tutorial`, you can pull updates by issuing the following commands from the directory that contains your `rspark-tutorial` clone:\n\n`cd rspark-tutorial`  \n`git pull origin master`  \n\nPulling regularly keeps your local copy of the tutorial up to date.\n\nIf you want an excellent Git GUI, then download and install: [Sourcetree](https://www.sourcetreeapp.com).\n\n\n\n\n\n","funding_links":[],"categories":["Sparklyr Courses and Tutorials"],"sub_categories":["Presto"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjharner%2Frspark-tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjharner%2Frspark-tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjharner%2Frspark-tutorial/lists"}