# McGill-Course-Scraper
A Python scraper that collects every course offered by McGill University, along with each course's relevant information.

Currently only valid for the 2022-2023 school year.

---

"This project is **not** affiliated with, endorsed by, or vetted by McGill University. It is an open-source tool that uses publicly available information from the university and is intended for research and educational purposes only. Please refer to McGill University's terms of use for details on your rights to use the downloaded information. Remember: the information provided is intended for personal use only."

---

## News
Version 0.2:
- Added multithreading to speed up the scraping of individual course pages.

## Requirements
```
pip install -r requirements.txt
```

## Usage
macOS
```
python3 src/main.py --num-threads=<int> [default: 10]
```
Windows
```
py src/main.py --num-threads=<int> [default: 10]
```

The program starts by scraping the URLs of all courses from McGill University's official website and storing them in a `.txt` file in `/output`. This step should take a few minutes.

The program then requests each URL in that file and parses the course pages, using 10 threads by default. This step should take under 5 minutes. Out of politeness, feel free to lower the number of threads (with `--num-threads`, or by changing the default in `src/main.py`) to slow the requests down. The program prints its progress in the terminal as it runs.

McGill's website does appear to rate-limit requests, so don't set the number of threads too high.

The output is stored in `/output/courses.json`. See `/docs/structure.json` for a miniature example of the file's structure.

## Contributing
Fork the repo and open a PR against the `main` branch with an appropriate title and description.
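## Sketch: parsing `--num-threads`

The Usage section documents a single `--num-threads=<int>` option with a default of 10. The snippet below is a minimal sketch of how such an option could be parsed with the standard-library `argparse` module. It is an assumption for illustration only: `src/main.py` may parse its arguments differently, and `positive_int` and `parse_args` are hypothetical names.

```python
# Hypothetical sketch; not the project's actual argument parsing.
import argparse
from typing import List, Optional


def positive_int(text: str) -> int:
    """argparse `type=` callable that only accepts integers >= 1."""
    value = int(text)  # a ValueError here makes argparse report "invalid positive_int value"
    if value < 1:
        raise argparse.ArgumentTypeError(f"expected a positive integer, got {value}")
    return value


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command line; argv=None makes argparse read sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Scrape McGill course pages (sketch).")
    parser.add_argument(
        "--num-threads",
        type=positive_int,
        default=10,  # same default as documented in the Usage section
        help="worker threads used to fetch course pages (default: 10)",
    )
    return parser.parse_args(argv)
```

Both `--num-threads=4` and `--num-threads 4` are accepted, and the value is available as `args.num_threads`. Rejecting zero or negative values up front avoids starting a thread pool that cannot do any work.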
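## Sketch: threaded page scraping

The second phase described under Usage (read the URL list written in the first phase, then fetch and parse each page with a pool of threads) can be sketched with `concurrent.futures.ThreadPoolExecutor`. This is a minimal sketch under stated assumptions, not the project's implementation: `read_course_urls`, `scrape_all`, `save_courses`, and the `fetch_and_parse` callable are hypothetical, the URL file is assumed to hold one URL per line, and the JSON layout written by `save_courses` may differ from `/docs/structure.json`.

```python
# Hypothetical sketch; not the project's actual scraping code.
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, List


def read_course_urls(path: Path) -> List[str]:
    """Load the phase-one URL list, assuming one URL per line; blank lines are skipped."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def scrape_all(
    urls: List[str],
    fetch_and_parse: Callable[[str], dict],
    num_threads: int = 10,
) -> Dict[str, dict]:
    """Run `fetch_and_parse` on every URL using a pool of `num_threads` threads.

    `fetch_and_parse` is an assumed callable that downloads one course page
    and returns its parsed fields as a dict.
    """
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map yields results in input order, so each result pairs with its URL;
        # an exception raised in a worker is re-raised here when its result is reached.
        for done, (url, course) in enumerate(zip(urls, pool.map(fetch_and_parse, urls)), start=1):
            results[url] = course
            print(f"[{done}/{len(urls)}] {url}")  # simple progress line
    return results


def save_courses(courses: Dict[str, dict], path: Path) -> None:
    """Write the parsed courses as a JSON list (an assumed layout)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(list(courses.values()), indent=2, ensure_ascii=False), encoding="utf-8")
```

Passing the fetch function in as a parameter keeps the threading logic separate from the HTML parsing, so the pool can be exercised with a stub such as `lambda url: {"url": url}` before any real requests are sent.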
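## Sketch: a polite request rate

The Usage section notes that McGill's website appears to rate-limit requests and suggests keeping the thread count low. The thread count bounds how many requests are in flight at once, but not how many start per second. One way to bound the rate as well is a shared limiter that spaces request starts by a minimum interval. The sketch below is an assumption, not something the project is known to do: `MinIntervalLimiter` and `polite` are hypothetical names, and no interval value is given in the README, so any value you pick is a guess to tune.

```python
# Hypothetical sketch; not part of mcgill-course-scraper.
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class MinIntervalLimiter:
    """Thread-safe limiter: successive calls start at least `min_interval` seconds apart."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock  # injectable so the limiter can be tested without real waiting
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = float("-inf")  # earliest time the next request may start

    def wait(self) -> None:
        """Reserve the next free time slot, then sleep (outside the lock) until it arrives."""
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.min_interval
        if slot > now:
            self._sleep(slot - now)


def polite(fetch: Callable[[str], T], limiter: MinIntervalLimiter) -> Callable[[str], T]:
    """Wrap a per-URL fetch function so every call first waits on the shared limiter."""
    def wrapped(url: str) -> T:
        limiter.wait()
        return fetch(url)
    return wrapped
```

Because `wait()` holds the lock only while reserving a slot, many worker threads can share one limiter without serializing their downloads: each thread sleeps until its own reserved start time, then fetches in parallel with the others.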