{"id":15497060,"url":"https://github.com/zaneh/ocw-crawler","last_synced_at":"2026-05-28T20:30:58.195Z","repository":{"id":183379462,"uuid":"670003218","full_name":"ZaneH/ocw-crawler","owner":"ZaneH","description":"Crawl MIT OpenCourseWare courses with Kimurai. Not affiliated.","archived":false,"fork":false,"pushed_at":"2023-11-28T04:54:17.000Z","size":67,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-15T22:26:40.749Z","etag":null,"topics":["crawler","kimurai","mit","ocw","opencourseware","spider"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ZaneH.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-24T04:48:37.000Z","updated_at":"2023-11-28T04:33:48.000Z","dependencies_parsed_at":"2023-07-24T08:26:00.258Z","dependency_job_id":"d270e4e9-90a3-41c6-b6cd-9cfa730cca9f","html_url":"https://github.com/ZaneH/ocw-crawler","commit_stats":null,"previous_names":["zaneh/ocw-crawler"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZaneH%2Focw-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZaneH%2Focw-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZaneH%2Focw-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZaneH%2Focw-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/owners/ZaneH","download_url":"https://codeload.github.com/ZaneH/ocw-crawler/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241997528,"owners_count":20055118,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","kimurai","mit","ocw","opencourseware","spider"],"created_at":"2024-10-02T08:30:23.308Z","updated_at":"2026-05-28T20:30:58.172Z","avatar_url":"https://github.com/ZaneH.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MIT OpenCourseWare Crawler\n\n## Crawl Output\n\n**Last updated**: November 27, 2023\n\n- OCW Video Lectures: [results.csv](https://github.com/ZaneH/ocw-crawler/blob/main/results.csv)\n\n## Description\n\nThis is a simple crawler to save the available courses on [MIT OpenCourseWare](https://ocw.mit.edu/). This crawler will export the courses with video lectures as a CSV file.\n\nYou can crawl for courses other than video lectures by changing the `@start_urls` in `crawler.rb`.\n\n## Docker Run (Recommended)\n\nThis is the simplest way to run the crawler. It will run the crawler and save the results in `results.csv` using a Docker volume. \n\n```bash\n$ docker build -t ocw-crawl:1.0 .\n$ docker run --volume $(pwd)/results.csv:/app/results.csv \\\n             --rm \\\n             --name ocw-crawl \\\n             ocw-crawl:1.0\n```\n\n---\n\n## Manually Run\n\nTo run the crawler without Docker, you'll need to install an older version of Ruby that's compatible with `kimurai`. You'll also need `geckodriver` and Firefox. 
Read more about setting up `kimurai` [here](https://github.com/vifreefly/kimuraframework#installation) if you run into trouble.\n\n### Setup\n\nInstall Ruby 2.5.0 and run `bundle install`.\n\n```bash\n$ asdf install ruby 2.5.0\n$ asdf global ruby 2.5.0\n$ gem install bundler\n$ bundle install # install dependencies\n```\n\n### Run\n\n```bash\n$ ruby crawler.rb\n...\n```\n\n## Possible Improvements\n\n- Use [OCW Sitemaps](https://ocw.mit.edu/sitemap.xml) to crawl all courses\n- Get more information about each course from the sitemap\n    - Course materials often follow these patterns:\n        - Syllabus: `/pages/syllabus/`\n        - Course download: `/download/`\n        - Resources: `/resources/*/`\n            - PDFs, slides, lecture notes, etc.\n        - Course pages: `/pages/*/`\n            - Readings: `/pages/readings/`\n- Turn the data into an app or API","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzaneh%2Focw-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzaneh%2Focw-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzaneh%2Focw-crawler/lists"}