https://github.com/terror/mcgill.courses

A course search and review platform for McGill University
https://github.com/terror/mcgill.courses

Last synced: 6 months ago
JSON representation

A course search and review platform for McGill University

Host: GitHub
URL: https://github.com/terror/mcgill.courses
Owner: terror
License: cc0-1.0
Created: 2023-03-08T05:18:41.000Z (almost 3 years ago)
Default Branch: master
Last Pushed: 2025-02-02T00:23:03.000Z (11 months ago)
Last Synced: 2025-03-14T22:03:18.271Z (10 months ago)
Language: TypeScript
Homepage: https://mcgill.courses
Size: 34.3 MB
Stars: 41
Watchers: 4
Forks: 8
Open Issues: 62
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING
- License: LICENSE

Awesome Lists containing this project

README

## mcgill.courses

A course search and review platform for McGill university.

![](https://github.com/user-attachments/assets/8807c010-6d85-4b3a-9199-4cc101417519)

## Development

You'll need [docker](https://www.docker.com/),
[cargo](https://doc.rust-lang.org/cargo/) and [pnpm](https://pnpm.io/) installed
on your machine to spawn the various components the project needs to run
locally.

First, join the discord server to get access to the development environment
variables:

[https://discord.gg/fSVSqfPHSV](https://discord.gg/fSVSqfPHSV)

In `.env` within the root directory you'll have to set

```
MS_CLIENT_ID=
MS_CLIENT_SECRET=
MS_REDIRECT_URI=http://localhost:8000/api/auth/authorized
```

...and then in `client/.env` you'll have to set the server url

```
VITE_API_URL=http://localhost:8000
```

Second, mount a local [mongodb](https://www.mongodb.com/) instance with docker
and initiate the replica set:

```bash
docker compose up --no-recreate -d
sleep 5
docker exec mongodb mongosh --quiet --eval 'rs.initiate()' > /dev/null 2>&1 || true
```

Spawn the server with a data source (in this case the `/seed` directory) and
initialize the database (note that seeding may take some time on slower
machines):

```bash
cargo run -- --source=seed serve --initialize --db-name=mcgill-courses
```

Finally, spawn the react frontend:

```bash
pnpm install
pnpm run dev
```

_n.b._ If you have [just](https://github.com/casey/just) installed, we provide a
`dev` recipe for doing all of the above in addition to running a watch on the
server:

```bash
just dev
```

See the
[justfile](https://github.com/terror/mcgill.courses/blob/master/justfile) for
more recipes.

### Gathering seed data

The server command-line interface provides a load subcommand for scraping all
courses from various McGill course information websites and building a JSON data
source, for example:

```
RUST_LOG=info cargo run -- --source=seed \
load \
--batch-size=200 \
--scrape-vsb \
--user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"
```

The current defaults include scraping all current 10,000+ courses offered by
McGill, current schedule information from the official
[visual schedule builder](https://vsb.mcgill.ca), and courses offered in
previous terms going back as far the 2009-2010 term.

For full usage information, see the output below:

```present just run load --help
Usage: server load [OPTIONS] --user-agent

Options:
--user-agent A user agent
--course-delay Time delay between course requests in milliseconds [default: 0]
--retries Number of retries [default: 10]
--batch-size Number of pages to scrape per concurrent batch [default: 20]
--mcgill-terms The mcgill terms to scrape [default: 2009-2010 2010-2011 2011-2012 2012-2013 2013-2014 2014-2015 2015-2016 2016-2017 2017-2018 2018-2019 2019-2020 2020-2021 2021-2022 2022-2023 2023-2024 2024-2025]
--vsb-terms The schedule builder terms to scrape [default: 202405 202409 202501]
--scrape-vsb Scrape visual schedule builder information
-h, --help Print help
```

Alternatively, if you have [just](https://github.com/casey/just) installed, you
can run:

```
just load
```

## Tools

We have a few tools that we use throughout the project, below documents some of
them. You can find them all under the
[`/tools`](https://github.com/terror/mcgill.courses/tree/master/tools) directory
from the project root.

For python-based tools, we highly recommend you install
[uv](https://docs.astral.sh/uv/) on your system. On macOS or linux, you can do
it as follows:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Follow the
[documentation](https://docs.astral.sh/uv/getting-started/installation/) for
other systems.

### `changelog-generator`

Our changelog page
([https://mcgill.courses/changelog](https://mcgill.courses/changelog)) is
automated by this tool.

We feed PR titles and descriptions to a large language model (in this case
hard-coded to GPT-3.5) to generate a user-friendly summary using
[this prompt](https://github.com/terror/mcgill.courses/blob/master/tools/changelog-generator/prompt.txt).

The tool assumes you have an [OpenAI](https://openai.com/) API key set in the
environment, and you can use it from the project root like:

```bash
cargo run --manifest-path tools/changelog-generator/Cargo.toml \
-- \
--output client/src/assets/changelog.json
```

This will run the changelog generator on all
[up-to-date merged PRs](https://github.com/terror/mcgill.courses/pulls?q=is:pr+is:closed)
from our GitHub repository, populating
[`changelog.json`](https://github.com/terror/mcgill.courses/blob/master/client/src/assets/changelog.json)
with the results.

There are a few other options the tool supports:

```present cargo run --manifest-path tools/changelog-generator/Cargo.toml -- --help
Usage: changelog-generator [OPTIONS]

Options:
--output [default: ../../client/src/assets/changelog.json]
--regenerate [...]
--regenerate-all
--repo [default: mcgill.courses]
--user [default: terror]
-h, --help Print help
```

For instance, you can regenerate single entries by specifying their pull request
number.

### `course-average-fetcher`

This tool is used to populate a
[JSON file](https://github.com/terror/mcgill.courses/blob/master/client/src/assets/courseAveragesData.json)
with course average information we display on
[course pages](https://mcgill.courses/course/econ208).

We read and parse a
[crowdsourced google sheet](https://docs.google.com/spreadsheets/d/1NGUBQuF8FI6ebna86S1RHpc27srxpMbaSyjipIkr-gk/edit?gid=233834959#gid=233834959)
with historical course averages provided generously by the
[McGill enhanced](https://demetrios-koziris.github.io/McGillEnhanced/) team.

### `requirement-parser`

We parse prerequisites and corequisites using a fine-tuned large language model
with custom examples, all the code lives in
[`/tools/requirement-parser`](https://github.com/terror/mcgill.courses/tree/master/tools/req-parser).

If you need to run the requirement parser on a file, simply:

```bash
cd tools/requirement-parser
uv sync
uv run main.py
```

_n.b._ This will require an [OpenAI](https://openai.com/) API key and the name
of the fine-tuned model to be set in the environment.

For more information about how this works, check out our
[research project](https://github.com/SamZhang02/llmbda).

### `search-index-aggregator`

This tool selectively includes only the
[JSON fields](https://github.com/terror/mcgill.courses/blob/master/client/src/assets/searchData.json)
(from database
[seed files](https://github.com/terror/mcgill.courses/tree/master/seed))
required by the search component, significantly reducing payload size and
improving resource efficiency.

## Deployment

We continuously deploy our site with [Render](https://render.com/) using a
[docker image](https://github.com/terror/mcgill.courses/blob/master/Dockerfile),
and have a [MongoDB](https://en.wikipedia.org/wiki/MongoDB?useskin=vector)
instance hosted on [Atlas](https://www.mongodb.com/atlas/database).

We also use
[S3](https://aws.amazon.com/pm/serv-s3/?trk=936e5692-d2c9-4e52-a837-088366a7ac3f&sc_channel=ps)
to host a bucket for referring to a hash when deciding whether or not to seed
courses in our production environment, and Microsoft's
[identity platform](https://learn.microsoft.com/en-us/entra/identity-platform/v2-oauth2-auth-code-flow)
for handling our OAuth 2.0
[authentication flow](https://github.com/terror/mcgill.courses/blob/master/src/auth.rs).

## Prior Art

There are a few notable projects worth mentioning that are similar in nature to
[mcgill.courses](https://mcgill.courses), and have either led to inspiration or
new ideas with regard to its functionality and design, namely:

- [uwflow.com](https://uwflow.com/) - A course search and review platform for
the University of Waterloo
- [cloudberry.fyi](https://www.cloudberry.fyi/) - A post-modern schedule builder
for McGill students
- [mcgill.wtf](https://github.com/terror/mcgill.wtf) - A fast full-text search
engine for McGill courses

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/terror/mcgill.courses

Awesome Lists containing this project

README