https://github.com/samzhang02/mcgill-course-scraper

Scraper written in Python that scrapes all courses from McGill University and their relevant information

Host: GitHub
URL: https://github.com/samzhang02/mcgill-course-scraper
Owner: SamZhang02
License: mit
Created: 2023-01-25T21:40:23.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-02-08T11:39:37.000Z (over 3 years ago)
Last Synced: 2025-03-26T23:37:08.952Z (about 1 year ago)
Language: Python
Size: 27.3 KB
Stars: 1
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# McGill-Course-Scraper
A Python scraper that collects every course offered at McGill University, along with each course's relevant information.

Only valid for the 2022-2023 school year for now.

---

"This project is **not** affiliated with, endorsed by, or vetted by McGill University. It is an open-source tool that uses publicly available information from the university and is intended for research and educational purposes only. Please refer to McGill University's terms of use for details on your rights to use the information downloaded. Remember: the information provided is intended for personal use only."

---

## News
Version 0.2:
- Added multithreading to speed up the scraping of individual pages.

## Requirements
```
pip install -r requirements.txt
```

## Usage
macOS
```
python3 src/main.py --num-threads= [default: 10]
```
Windows
```
py src/main.py --num-threads= [default: 10]
```
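
For readers curious how a `--num-threads` option with a default of 10 could be wired up, here is a minimal sketch using the standard library's `argparse`. It is an illustration only and not necessarily how `src/main.py` does it; the function name `parse_args` and the help text are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options (hypothetical helper, not taken from the repo)."""
    parser = argparse.ArgumentParser(description="Scrape McGill course pages.")
    parser.add_argument(
        "--num-threads",
        type=int,
        default=10,  # matches the default shown in the usage above
        help="number of worker threads used to fetch course pages",
    )
    # argv=None makes argparse read sys.argv[1:]; a list can be passed for testing.
    return parser.parse_args(argv)


# Example: the same form as on the command line, passed explicitly.
args = parse_args(["--num-threads=4"])
print(args.num_threads)
```

`argparse` exposes the option as `args.num_threads`, replacing the hyphen with an underscore.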

The program starts by collecting the URLs of all courses on McGill University's official website and storing them in a `.txt` file in `/output`. This should take a few minutes.

The program then requests each URL in the file and parses the individual pages, using 10 threads by default. This should take under 5 minutes, but feel free to lower the number of threads with `--num-threads` to slow the requests down out of politeness. Progress is printed to the terminal as the program runs.
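
The multithreaded step described above can be sketched with a thread pool from the standard library. This is a rough illustration under stated assumptions, not the project's actual code: `scrape_pages` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real request-and-parse function so that the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_pages(urls, fetch_page, num_threads=10):
    """Apply fetch_page to every URL using a pool of worker threads.

    Results come back in the same order as urls.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(fetch_page, urls))


def fake_fetch(url):
    # Stand-in for an HTTP request plus HTML parsing (assumption for the demo).
    return {"url": url}


urls = ["https://example.com/a", "https://example.com/b"]
results = scrape_pages(urls, fake_fetch, num_threads=2)
print(results)
```

Lowering `num_threads` reduces how many requests are in flight at once, which is the knob the paragraph above suggests turning.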

McGill's website does appear to rate limit, so don't set the number of threads too high.
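
If requests do get refused, one common way to cope is to retry with an increasing delay. The sketch below is a generic exponential-backoff pattern and is not part of this project; `RateLimited`, `fetch_with_backoff`, and all parameter names are hypothetical, and how a refusal is detected (for example, an HTTP 429 response) is left to the caller's `fetch_page`.

```python
import time


class RateLimited(Exception):
    """Hypothetical signal that the server refused a request."""


def fetch_with_backoff(fetch_page, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch_page(url), doubling the wait after each rate-limited attempt."""
    for attempt in range(max_attempts):
        try:
            return fetch_page(url)
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
```

The `sleep` parameter exists only so the waiting can be swapped out in tests; in normal use the default `time.sleep` applies.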

The output will be stored in `/output/courses.json`. See `/docs/structure.json` for a miniature example of what the file will look like.

## Contributing
Fork the repo and open a PR against `main` with an appropriate title and description.
