{"id":22636729,"url":"https://github.com/daroczig/ceu-r-intro","last_synced_at":"2025-04-07T10:24:45.054Z","repository":{"id":139896770,"uuid":"50393945","full_name":"daroczig/CEU-R-intro","owner":"daroczig","description":"Data Analysis 1a: Foundation of Data management in R @ CEU","archived":false,"fork":false,"pushed_at":"2025-01-15T20:38:20.000Z","size":1651,"stargazers_count":42,"open_issues_count":0,"forks_count":61,"subscribers_count":16,"default_branch":"2024-winter","last_synced_at":"2025-04-01T17:15:33.265Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/daroczig.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-01-26T01:30:19.000Z","updated_at":"2025-01-15T20:38:22.000Z","dependencies_parsed_at":null,"dependency_job_id":"98cb6731-050f-4759-88ae-c20cdf8f67dd","html_url":"https://github.com/daroczig/CEU-R-intro","commit_stats":null,"previous_names":["daroczig/ceu-r-intro"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daroczig%2FCEU-R-intro","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daroczig%2FCEU-R-intro/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daroczig%2FCEU-R-intro/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/daroczig%2FCEU-R-intro/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/daroczig","download_url":"https://codeload.github.com/daroczig/CEU-R-intro/tar.gz/refs
/heads/2024-winter","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247633134,"owners_count":20970270,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-09T03:30:00.139Z","updated_at":"2025-04-07T10:24:45.038Z","avatar_url":"https://github.com/daroczig.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"This is the R script repository of the \"[Coding 3: Introduction to R](https://ceu.studyguide.timeedit.net/modules/ECBS5155?type=CORE)\" course of the 2024/2025 Winter term, part of the [MSc in Business Analytics](https://courses.ceu.edu/programs/ms/master-science-business-analytics) at CEU. 
In previous years, most of these materials were part of the \"Data Analysis 1a: Exploration\" course, which you can find in the [2015/2016 Winter](https://github.com/daroczig/CEU-R-lab/tree/2016), [2016/2017 Fall](https://github.com/daroczig/CEU-R-lab/tree/2017), [2017/2018 Fall](https://github.com/daroczig/CEU-R-lab/tree/2018), [2018/2019 Fall](https://github.com/daroczig/CEU-R-lab/tree/2018-fall), and [2023/2024 Winter](https://github.com/daroczig/CEU-R-lab/tree/2023-winter) branches.\n\n## Table of Contents\n\n* [Schedule](#schedule)\n* [Syllabus](#syllabus)\n* [Technical Prerequisites](#technical-prerequisites)\n* [Class Schedule](#class-schedule)\n* [Homework](#homework)\n* [Contact](#contact)\n\n## Schedule\n\n2 x 300 mins on Jan 8 and 15:\n\n* 13:30 - 15:10 session 1\n* 15:10 - 15:40 break\n* 15:40 - 17:20 session 2\n* 17:20 - 17:40 break\n* 17:40 - 19:20 session 3\n\n## Location\n\nIn-person at the Vienna campus (QS B-421).\n\n## Syllabus\n\nPlease find the syllabus in the `syllabus` folder of this repository.\n\n## Technical Prerequisites\n\nPlease bring your own laptop* and make sure to complete the steps below **before** attending the first class:\n\n0. Join the Teams channel dedicated to the class at `ba-r-intro-2024` with the `o3c4ngs` team code\n1. Install `R` from https://cran.r-project.org\n2. Install `RStudio Desktop` (Open Source License) from https://posit.co/download/rstudio-desktop/\n3. Enter the following commands in the R console (bottom left panel of RStudio) and make sure you see a plot in the bottom right panel and no errors in the R console:\n\n```r\ninstall.packages('ggplot2')\nlibrary(ggplot2)\nggplot(diamonds) +\n  aes(x = price, fill = cut) +\n  geom_density(alpha = 0.5) + facet_wrap(~ color) +\n  xlab('') + ylab('') +\n  theme_bw() + theme('legend.position' = 'top') +\n  guides(fill = guide_legend(nrow = 1))\n```\n\nOptional steps that I highly recommend completing before attending the class if you plan to use `git`:\n\n4. 
Register an account at https://github.com\n5. Bookmark, watch or star this repository so that you can easily find it later\n6. Install `git` from https://git-scm.com/\n7. Verify in RStudio that you can see the path of the `git` executable binary in the Tools/Global Options menu's \"Git/Svn\" tab -- if not, you might have to restart RStudio (if you installed git after starting RStudio), or git might have been installed without being added to the PATH on Windows. Either way, browse for the \"git executable\" manually (look for the `git` executable file in some `bin` folder).\n8. Create an RSA key (optionally with a passphrase for increased security -- that you have to enter every time you push and pull to and from GitHub). Copy the public key and add it to your SSH keys on your GitHub profile.\n9. Create a new project choosing \"version control\", then \"git\" and paste the SSH version of the repo URL copied from GitHub in the pop-up -- now RStudio should be able to download the repo. If it asks you to accept GitHub's fingerprint, say \"Yes\".\n10. If RStudio/git complains that you have to set your identity, click on the \"Git\" tab in the top-right panel, then click on the Gear icon and then \"Shell\" -- here you can set your username and e-mail address in the command line, so that RStudio/git integration can work. Use the following commands:\n\n    ```sh\n    $ git config --global user.name \"Your Name\"\n    $ git config --global user.email \"Your e-mail address\"\n    ```\n    Close this window, then commit and push your changes, and you are all set.\n\nFind more resources in Jenny Bryan's \"[Happy Git and GitHub for the useR](http://happygitwithr.com/)\" tutorial if in doubt or [contact me](#contact).\n\n(*) If you are not able to use your own laptop, there's a shared RStudio Server set up in AWS, with all the required R packages already installed for you. 
Check the class Slack channel for access instructions.\n\nFor the curious mind, this is how the shared RStudio Server was set up in AWS: \u003cdetails\u003e\u003csummary\u003eClick to expand ...\u003c/summary\u003e\n\n💪 Installing software similar to [`add_cranapt_noble.sh`](https://github.com/eddelbuettel/r2u/blob/master/inst/scripts/add_cranapt_noble.sh):\n\n```\n# most recent R builds\nwget -q -O- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | sudo tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc\necho \"deb [arch=amd64] https://cloud.r-project.org/bin/linux/ubuntu noble-cran40/\" | sudo tee -a /etc/apt/sources.list.d/cran_r.list\nsudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 67C2D66C4B1D4339 51716619E084DAB9\nsudo apt update \u0026\u0026 sudo apt upgrade\nsudo apt install r-base\n# apt builds of all CRAN packages\nwget -q -O- https://eddelbuettel.github.io/r2u/assets/dirk_eddelbuettel_key.asc | sudo tee -a /etc/apt/trusted.gpg.d/cranapt_key.asc\necho \"deb [arch=amd64] https://r2u.stat.illinois.edu/ubuntu noble main\" | sudo tee -a /etc/apt/sources.list.d/cranapt.list\nsudo apt update\n# install some R packages\nsudo apt install -y \\\n  r-cran-ggplot2 r-cran-ggally r-cran-readxl \\\n  r-cran-data.table \\\n  r-cran-glue r-cran-logger \\\n  r-cran-pairsd3 \\\n  r-cran-maps r-cran-nycflights13 \\\n  r-cran-png r-cran-rpart r-cran-partykit r-cran-randomforest r-cran-pander\n# install RStudio IDE\nsudo apt install -y gdebi-core\nwget https://download2.rstudio.org/server/jammy/amd64/rstudio-server-2024.12.0-467-amd64.deb\nsudo gdebi rstudio-server-*.deb\n# never do this in prod\necho \"www-port=80\" | sudo tee -a /etc/rstudio/rserver.conf\nsudo rstudio-server restart\n```\n\n💪 Creating users\n\n```r\nsecret \u003c- 'something super secret' # e.g. 
digest::digest(runif(1), algo=\"sha1\")\nusers \u003c- c('list', 'of', 'users')\n\nlibrary(logger)\nlibrary(glue)\nfor (user in users) {\n\n  ## drop the e-mail domain and replace all invalid characters\n  user \u003c- sub('@.*', '', user)\n  user \u003c- gsub('-', '_', user)\n  user \u003c- gsub('.', '_', user, fixed = TRUE)\n  user \u003c- tolower(user)\n\n  log_info('Creating {user}')\n  system(glue(\"sudo adduser --disabled-password --quiet --gecos '' {user}\"))\n\n  log_info('Setting password for {user}')\n  system(glue(\"echo '{user}:{secret}' | sudo chpasswd\")) # note the single quotes + placement of sudo\n\n  log_info('Adding {user} to sudo group')\n  system(glue('sudo adduser {user} sudo'))\n\n}\n```\n\u003c/details\u003e\n\n## Class Schedule\n\n### Week 1 (300 min): Introduction to R\n\n* General overview of the R ecosystem: [slides](https://bit.ly/CEU-R-intro-2024)\n* Basic math operations: [1.R](1.R#L1)\n* Fun math ops with 2025: [1.R](1.R#L12)\n* Numbers, strings, vectors, constants, variables: [1.R](1.R#L44)\n* Functions: [1.R](1.R#L86)\n* Basic plots: [1.R](1.R#L105)\n* Basic stats: [1.R](1.R#L176)\n* Intro to data frames: [1.R](1.R#L207)\n* Introduction to data visualization with `ggplot2`: [1.R](1.R#L284)\n* Introduction to `data.table`: [1.R](1.R#L363)\n\nSuggested reading: [Hadley Wickham: Style guide. In *Advanced R*.](http://adv-r.had.co.nz/Style.html)\n\n### Week 2 (300 min): Modeling and reporting\n\nHomework solutions: [R script](homework.R) and [R markdown](homework.Rmd)\n\n* Warm-up exercise and security reminder: [2.R](2.R#L1)\n* `data.table`: [2.R](2.R#L35)\n* Introduction to modeling: [2.R](2.R#L269)\n\nExtra example R Markdown: [pca.Rmd](pca.Rmd)\n\n## Homework\n\nLoad the `flights` dataset from the `nycflights13` package as a `data.table` object:\n\n```r\nlibrary(data.table)\nlibrary(nycflights13)\ndt \u003c- data.table(flights)\n```\n\nThen answer the questions below with the appropriate data transformations and/or visualizations:\n\n1. 
How many flights originated from JFK?\n2. Count the number of flights per month.\n3. Visualize the number of flights per destination.\n4. Count the number of flights with an arrival delay of more than 100 mins.\n5. Visualize the maximum arrival delay per destination.\n6. Aggregate the min and max arrival delay per origin.\n7. Visualize the distribution of the arrival delay per origin.\n8. Visualize the distribution of the arrival delay per destination.\n9. List the top 5 destinations that are the furthest from NYC.\n10. How many flights were scheduled to depart before 11 am?\n\nSubmission format: create an R script with the above questions added\nas comments (using the `#` symbol at the beginning of the line),\nfollowed by R commands to answer those questions. The R script should\nbe able to run without any errors. Upload to Moodle.\n\nDeadline: Jan 15, 2025\n\n## Final project\n\nUse any publicly accessible dataset (preferably from the TidyTuesday projects at https://github.com/rfordatascience/tidytuesday, but if you don't feel creative, feel free to default to using the `diamonds` dataset from the `ggplot2` package) and do data transformations that seem useful, optionally merge external datasets, generate data visualizations that make sense and are insightful, plus provide comments on those in plain English.\n\nSubmission: prepare an R markdown document that includes plain English text description of the dataset, problems/questions you analyzed, actual R code chunks (printing both the code and its output) loading the data, doing the analysis, comments and summary/conclusion of the results, and knit the Rmd to HTML, then upload both the Rmd and the HTML to Moodle before Jan 31, 2025 midnight (CET).\n\nRequired items:\n\n* filtering rows using `data.table`,\n* aggregating data using `data.table`,\n* at least 7 plots using at least 3 different `ggplot2` geoms (e.g. 
a scatterplot, boxplot, barchart etc.)\n\nThe above items with proper homework solutions from the first week will result in a \"B\" grade.\n\nFor an \"A\", please also work on the below extra items:\n\n* merge datasets,\n* apply a theme,\n* define the axis and plot titles,\n* use a color palette from \u003ccolorbrewer2.org\u003e,\n* use multiple geom layers on the same plot,\n* publish your results on RPubs.com/Medium (look at the \"Publish\" option in the \"File\" menu).\n\n## Contact\n\nFile a [GitHub ticket](https://github.com/daroczig/CEU-R-intro/issues).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaroczig%2Fceu-r-intro","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdaroczig%2Fceu-r-intro","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdaroczig%2Fceu-r-intro/lists"}