{"id":34628102,"url":"https://github.com/pranftw/openreview_scraper","last_synced_at":"2025-12-24T16:14:41.561Z","repository":{"id":131859622,"uuid":"599420392","full_name":"pranftw/openreview_scraper","owner":"pranftw","description":"Scrape papers from OpenReview using OpenReview API","archived":false,"fork":false,"pushed_at":"2025-03-03T09:57:48.000Z","size":31,"stargazers_count":29,"open_issues_count":1,"forks_count":7,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-03T10:35:39.854Z","etag":null,"topics":["automation","literature-review","openreview","python","research-paper","research-tool","scraper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pranftw.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-09T05:03:38.000Z","updated_at":"2025-03-03T09:57:52.000Z","dependencies_parsed_at":"2025-03-03T10:40:10.402Z","dependency_job_id":null,"html_url":"https://github.com/pranftw/openreview_scraper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/pranftw/openreview_scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pranftw%2Fopenreview_scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pranftw%2Fopenreview_scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pranftw%2Fopenreview_scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pranftw%2Fo
penreview_scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pranftw","download_url":"https://codeload.github.com/pranftw/openreview_scraper/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pranftw%2Fopenreview_scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28004520,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-24T02:00:07.193Z","response_time":83,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","literature-review","openreview","python","research-paper","research-tool","scraper"],"created_at":"2025-12-24T16:14:40.551Z","updated_at":"2025-12-24T16:14:41.551Z","avatar_url":"https://github.com/pranftw.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OpenReview Scraper\nScrape papers from top conferences such as ICML, ICLR and NeurIPS using the OpenReview API by searching for specific keywords in the title, abstract or keywords of each submission, and save the matches to a CSV file.\u003cbr\u003e\nAutomation brings the time taken to gather papers down from several hours to a few minutes.\n\n## Installation\n```bash\ngit clone https://github.com/pranftw/openreview_scraper.git # clone repo\npython -m venv venv # create virtual environment\nsource venv/bin/activate # activate 
virtual environment\npip install -r requirements.txt # install requirements\ncp config.py.example config.py # copy the example config, then enter your OpenReview credentials in config.py\n```\n\n## Example\n```python\nfrom scraper import Scraper\nfrom extract import Extractor\nfrom filters import title_filter, keywords_filter, abstract_filter\nfrom selector import Selector\nfrom utils import save_papers, load_papers\n\n\nyears = [\n    '2024'\n]\nconferences = [\n    'ICLR'\n]\nkeywords = [\n    'generalization'\n]\n\n# turn the forum id and the relative pdf path into full OpenReview URLs\ndef modify_paper(paper):\n  paper.forum = f\"https://openreview.net/forum?id={paper.forum}\"\n  paper.content['pdf'] = f\"https://openreview.net{paper.content['pdf']}\"\n  return paper\n\n# which fields to extract\nextractor = Extractor(fields=['forum'], subfields={'content':['title', 'keywords', 'abstract', 'pdf', 'match']})\n\n# if you want to select papers manually among the scraped papers\n# selector = Selector()\n\n# select all scraped papers\nselector = None\n\nscraper = Scraper(conferences=conferences, years=years, keywords=keywords, extractor=extractor, fpath='example.csv', fns=[modify_paper], selector=selector)\n\n# add the filters to apply to each paper\nscraper.add_filter(title_filter)\nscraper.add_filter(keywords_filter)\nscraper.add_filter(abstract_filter)\n\nscraper()\n\n# if you want to save scraped papers as OpenReview objects using pickle\nsave_papers(scraper.papers, fpath='papers.pkl')\nsaved_papers = load_papers(fpath='papers.pkl')\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpranftw%2Fopenreview_scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpranftw%2Fopenreview_scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpranftw%2Fopenreview_scraper/lists"}