{"id":24601389,"url":"https://github.com/djchie/webreg_scrapy","last_synced_at":"2025-03-18T08:23:52.443Z","repository":{"id":78062879,"uuid":"42092760","full_name":"djchie/webreg_scrapy","owner":"djchie","description":"A WebReg scraper via Scrapy","archived":false,"fork":false,"pushed_at":"2015-11-22T01:36:54.000Z","size":95,"stargazers_count":2,"open_issues_count":5,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-24T14:48:49.057Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/djchie.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-09-08T05:55:38.000Z","updated_at":"2019-02-11T05:05:48.000Z","dependencies_parsed_at":"2023-03-12T03:29:19.295Z","dependency_job_id":null,"html_url":"https://github.com/djchie/webreg_scrapy","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/djchie%2Fwebreg_scrapy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/djchie%2Fwebreg_scrapy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/djchie%2Fwebreg_scrapy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/djchie%2Fwebreg_scrapy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/djchie","download_url":"https://codeload.github.com/djchie/webreg_scrapy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244181866,
"owners_count":20411730,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-24T14:48:57.377Z","updated_at":"2025-03-18T08:23:52.421Z","avatar_url":"https://github.com/djchie.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Webreg Scrapy\n\n\u003e A web scraper that retrieves UCI course information from the [UCI University Registrar](https://www.reg.uci.edu/perl/WebSoc). I built this tool for the [UCI Course API](https://github.com/djchie/uci-course-api).\n\n## Table of Contents\n\n1. [Usage](#usage)\n    1. [Process](#process)\n1. [Requirements](#requirements)\n1. [Development](#development)\n    1. [Installing Dependencies](#installing-dependencies)\n    1. [Running the Scraper](#running-the-scraper)\n    1. [Handling UCI Data Changes](#handling-uci-data-changes)\n    1. [Roadmap](#roadmap)\n1. [Contributing](#contributing)\n\n## Usage\n\n\u003e Use this scraper to fetch course information and import it into a PostgreSQL database.\n\n### Process\n\n1. The scraper is hosted on Heroku.\n1. It runs the department spider to fetch the current list of departments.\n1. It runs a course spider for each department in that list.\n1. It uploads all of the scraped data to the AWS RDS PostgreSQL database.\n\n## Requirements\n\n- PostgreSQL\n\n## Development\n\n### Installing Dependencies\n\nFrom within the root directory:\n\n```sh\npip install -r requirements.txt\n```\n\n### Running the Scraper\n\nStart the PostgreSQL server with the required relations (tables) set up, then run:\n\n```sh\n# Crawl courses into the database\nscrapy crawl course_scrapy\n# Crawl courses into the database and also export them to courses.json\nscrapy crawl course_scrapy -o courses.json\n```\n\n### Handling UCI Data Changes\n\nIf the structure of UCI's course data changes:\n\n1. Update `items.py`.\n1. Update the parsing logic in `course_spider.py`.\n1. Update `models.py` to reflect the database schema.\n1. Update `pipelines.py` to handle inserting the new data.\n\n### Roadmap\n\nThe project roadmap is tracked in the [issue tracker](https://github.com/djchie/webreg_scrapy/issues).\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for contribution guidelines.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdjchie%2Fwebreg_scrapy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdjchie%2Fwebreg_scrapy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdjchie%2Fwebreg_scrapy/lists"}