https://github.com/coatless-rpkg/explorecourses

Download, Extract, and Transform Course Data from Stanford University's ExploreCourses in R

rstats stanford stanford-university web-api

Host: GitHub
URL: https://github.com/coatless-rpkg/explorecourses
Owner: coatless-rpkg
License: agpl-3.0
Created: 2024-10-15T22:13:34.000Z (8 months ago)
Default Branch: main
Last Pushed: 2024-11-18T11:53:43.000Z (7 months ago)
Last Synced: 2024-12-01T02:07:17.175Z (6 months ago)
Topics: rstats, stanford, stanford-university, web-api
Language: R
Homepage: https://r-pkg.thecoatlessprofessor.com/explorecourses/
Size: 8.03 MB
Stars: 2
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md

README

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# explorecourses   

[![R-CMD-check](https://github.com/coatless-rpkg/explorecourses/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/coatless-rpkg/explorecourses/actions/workflows/R-CMD-check.yaml)

> [!IMPORTANT]
>
> This package is part of a homework exercise for STATS 290 regarding data mining
> and web APIs.

The goal of `explorecourses` is to automatically retrieve course information from
Stanford University's [ExploreCourses](https://explorecourses.stanford.edu/) API.

## Installation

You can install the development version of explorecourses from [GitHub](https://github.com/) with:

``` r
# install.packages("remotes")
remotes::install_github("coatless-rpkg/explorecourses")
```

## Usage

First, load the package:

```{r}
#| eval: false
library(explorecourses)
```

The package contains three main functions:

1. `fetch_all_courses()`: Fetches courses from the ExploreCourses API for a set of departments (default: all departments).

2. `fetch_department_courses()`: Fetches the courses for a specific department.

3. `fetch_departments()`: Fetches the list of departments from the ExploreCourses API.

By default, we'll retrieve all courses across all departments for the current
academic year using:

```{r}
#| eval: false
all_courses <- fetch_all_courses()
```

We can also restrict the request to a set of departments and a given academic year. For example, to retrieve all courses for the "STATS" and "MATH" departments for the academic year 2023-2024, we can use:

```{r}
#| eval: false
stats_and_math_courses <- fetch_all_courses(c("STATS", "MATH"), year = "20232024")
```

`fetch_all_courses()` is well suited to retrieving course information across multiple departments for a given academic year because it can process the departments in parallel.
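The `year` argument in the example above packs the academic year 2023-2024 into the string `"20232024"`. Assuming that pattern (start year followed by end year) holds in general, which the single example suggests but does not confirm, a small helper can build the string. `academic_year_id()` is a hypothetical name used for illustration and is not part of `explorecourses`.

```r
# Hypothetical helper (not part of explorecourses): build the eight-digit
# academic-year string from the year in which the academic year starts.
# Assumes the format is <start year><end year>, as in "20232024".
academic_year_id <- function(start_year) {
  start_year <- as.integer(start_year)
  paste0(start_year, start_year + 1L)
}

academic_year_id(2023)
#> [1] "20232024"
```

The result could then be passed along as `year = academic_year_id(2023)`.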

For a single department, we can use the `fetch_department_courses()` function to
retrieve the courses for that department in any academic year. This function's
overhead is lower as it does not support parallel processing. For example, to
retrieve all courses for the "STATS" department, we can use:

```{r}
#| eval: false
department_courses <- fetch_department_courses("STATS")
```

To determine possible department shortcodes, we can use:

```{r}
#| eval: false
departments <- fetch_departments()
```

This returns a data frame with each department's short name, long name, and
the school it is associated with.

### Cache 

To cache the data, we can use the `cache_dir` parameter in the `fetch_all_courses()`,
`fetch_department_courses()`, and `fetch_departments()` functions. This
will cause the XML data downloaded from the API to be stored in the specified
directory and reused on subsequent calls.

We can list the current cache contents using the `list_cache()` function:

```{r}
#| eval: false
list_cache() # List current cache
```

```r
# Cache contents:
# 
# Found 256 cached files
# Directory: explorecourses_cache
# 
# AA ACCT AFRICAAM ALP AMELANG
# AMHRLANG AMSTUD ANES ANTHRO APPPHYS
# ARABLANG ARCHLGY ARMELANG ARTHIST ARTSINST
# ...
```
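The download-once, reuse-later behavior that `cache_dir` enables can be illustrated with a minimal base R sketch. This is not the package's implementation: `cached_fetch()`, `fake_fetch()`, the `.xml` file naming, and the demo directory are all assumptions made for illustration, and a stand-in function replaces the real network request.

```r
# Hypothetical helper (not the package's code): return the cached text for
# `key` if a file exists in `cache_dir`; otherwise call `fetch_fn(key)`,
# store the result on disk, and return it.
cached_fetch <- function(key, cache_dir, fetch_fn) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  cache_file <- file.path(cache_dir, paste0(key, ".xml"))
  if (file.exists(cache_file)) {
    return(paste(readLines(cache_file, warn = FALSE), collapse = "\n"))
  }
  content <- fetch_fn(key)
  writeLines(content, cache_file)
  content
}

# Stand-in for a network request that counts how often it is called.
n_requests <- 0
fake_fetch <- function(key) {
  n_requests <<- n_requests + 1
  sprintf("<courses department=\"%s\"/>", key)
}

# Start from an empty demo cache inside the session's temporary directory.
cache_dir <- file.path(tempdir(), "explorecourses_cache_demo")
unlink(cache_dir, recursive = TRUE)

first  <- cached_fetch("STATS", cache_dir, fake_fetch)  # triggers a "request"
second <- cached_fetch("STATS", cache_dir, fake_fetch)  # served from disk
```

After both calls, `n_requests` is 1 and `first` and `second` hold the same text, which is the effect a cache directory is meant to have on repeated calls.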

### Parallel Processing

We can speed up the process of fetching and transforming course data
by using parallel processing. For the `fetch_all_courses()` function, we've
set up parallel processing using the `furrr` package, which provides `purrr`'s
functional interface to the `future` parallel processing library. As a result,
we will be able to download and process all courses for every department in
parallel. Moreover, we've set up progress reporting using the `progressr`
package to track the progress of the parallel processing.

```{r}
#| eval: false
library(explorecourses)
library(future)
library(progressr)

# Set up parallel processing
plan(multisession)

# Set up progress reporting
handlers(handler_progress())

# Show progress bar for fetching all courses
with_progress({
  # Fetch all courses for the departments in parallel
  all_courses <- fetch_all_courses()
})

# Reset to sequential processing
plan(sequential)
```

Note that once we've finished, we should deactivate the `multisession` plan by
resetting it to `sequential`, which also shuts down the background R sessions.
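One way to make sure the plan is always restored, even if the fetch fails partway through, is to wrap the work in a function and restore the previous plan with `on.exit()`. The sketch below relies on `plan()` returning the previously active plan when a new one is set. `with_multisession()` is a hypothetical helper, not part of `explorecourses`, and the toy future stands in for the real fetch.

```r
library(future)

# Hypothetical helper (not part of explorecourses): run `fn` under a
# multisession plan and restore whatever plan was active before, even if
# `fn` throws an error.
with_multisession <- function(fn, workers = 2) {
  old_plan <- plan(multisession, workers = workers)
  on.exit(plan(old_plan), add = TRUE)
  fn()
}

# Toy workload evaluated in a background R session.
result <- with_multisession(function() {
  value(future(1 + 1))
})
```

Under the same assumptions, the real call would look like `with_multisession(function() fetch_all_courses())`.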

## License

AGPL (>= 3)
