https://github.com/fedebotu/iclr2023-openreviewdata
Crawl & Visualize ICLR 2023 Data from OpenReview
crawler dataset iclr iclr2023 openreview peer-review review scraper
- Host: GitHub
- URL: https://github.com/fedebotu/iclr2023-openreviewdata
- Owner: fedebotu
- Created: 2022-11-05T02:05:37.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-02-10T09:20:46.000Z (over 2 years ago)
- Last Synced: 2025-06-20T16:54:32.641Z (4 months ago)
- Topics: crawler, dataset, iclr, iclr2023, openreview, peer-review, review, scraper
- Language: Jupyter Notebook
- Homepage: https://fedebotu.github.io/ICLR2023-OpenReviewData/submissions.html
- Size: 13.9 MB
- Stars: 84
- Watchers: 2
- Forks: 11
- Open Issues: 2
Metadata Files:
- Readme: README.md
# Crawl and Visualize ICLR 2023 OpenReview Data
→ Open full submission list [here](https://fedebotu.github.io/ICLR2023-OpenReviewData/submissions.html)
→ Download datasets [here](https://drive.google.com/drive/folders/1wCZrwNpjBHq0mXni3xLNrlEMrGUDK-Cl?usp=sharing)
## Description
This repository contains code to crawl and visualize data from [ICLR 2023 OpenReview](https://openreview.net/group?id=ICLR.cc/2023/Conference). Crawling uses parallel `requests` calls made directly to OpenReview's API, which is roughly `10-100x` faster than driving a browser with `selenium`. The code also saves datasets for further analysis, including all reviews and rebuttals as well as PDF metadata and extracted text.
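The parallel-request idea can be sketched as follows. This is a minimal illustration and not the code in the notebooks: the endpoint, the invitation ID, the page size, and the `notes` key in the response are assumptions about OpenReview's API, and `fetch_page` and `fetch_all` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed endpoint and invitation ID; verify both against the notebooks.
API_URL = "https://api.openreview.net/notes"
INVITATION = "ICLR.cc/2023/Conference/-/Blind_Submission"
PAGE_SIZE = 1000  # assumed maximum page size


def fetch_page(offset, limit=PAGE_SIZE):
    """Fetch one page of submission notes as a list of dicts."""
    # Third-party dependency, imported here so fetch_all can also be
    # exercised with a stub fetcher when requests is not installed.
    import requests

    params = {"invitation": INVITATION, "offset": offset, "limit": limit}
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json()["notes"]  # assumed response layout


def fetch_all(total, fetch=fetch_page, page_size=PAGE_SIZE, workers=8):
    """Fetch `total` notes by requesting every page concurrently."""
    offsets = range(0, total, page_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in offset order, even if requests finish
        # out of order.
        pages = pool.map(lambda offset: fetch(offset, page_size), offsets)
        return [note for page in pages for note in page]
```

Under these assumptions, `fetch_all(4874)` would issue five page requests at once instead of loading pages one by one in a browser. Threads suit this workload because each request spends most of its time waiting on the network.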
## Usage
Install the dependencies:
```shell
pip install -r requirements.txt
```
Then run the notebooks in the `notebooks/` folder, in order:
1. `0a. Parse data.ipynb`: crawl all paper metadata (such as title, abstract, and authors), reviews, and rebuttals from the OpenReview website.
2. `0b. Crawl PDF.ipynb`: parse the PDF files of the papers to extract the main text.
3. `1. Plots.ipynb`: visualize the data using word clouds, bar charts, and other plots.
4. `2. Save Website.ipynb`: save the website as a static HTML file.
## Statistics
- Total submitted papers: `4874`
- Average rating: `4.94`
### Rating Distribution
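An average rating and a rating distribution could be derived from the crawled reviews roughly as below. This is a sketch under stated assumptions, not the notebook's code: it assumes each rating is stored as a string that starts with the numeric score (for example `6: marginally above the acceptance threshold`), and it averages per-paper means, which may not be how the figure reported in this README was computed. The function names and the input layout are hypothetical.

```python
from collections import Counter
from statistics import mean


def parse_rating(text):
    """Return the leading integer of a rating string such as '6: ...'."""
    return int(text.split(":", 1)[0])


def rating_summary(reviews_by_paper):
    """Return per-paper mean ratings, their average, and a score histogram.

    `reviews_by_paper` maps a paper ID to its list of rating strings
    (hypothetical layout).
    """
    paper_means = {
        paper_id: mean(parse_rating(r) for r in ratings)
        for paper_id, ratings in reviews_by_paper.items()
        if ratings  # skip papers that have no reviews yet
    }
    overall = mean(paper_means.values())
    # The histogram counts individual review scores, not per-paper means.
    histogram = Counter(
        parse_rating(r) for ratings in reviews_by_paper.values() for r in ratings
    )
    return paper_means, overall, histogram
```

Averaging per paper first keeps papers with many reviewers from dominating the overall figure. Averaging over all reviews directly is an equally plausible choice.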
### Top 50 Keywords
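A top-keywords ranking of this kind can be computed with a simple counter. The sketch below assumes each paper is a dict with a `keywords` list, which is a hypothetical layout, and the normalization (lowercasing and trimming) is an assumption, not necessarily what the plotting notebook does.

```python
from collections import Counter


def top_keywords(papers, n=50):
    """Return the `n` most frequent keywords as (keyword, count) pairs."""
    counts = Counter()
    for paper in papers:
        # Normalize case and whitespace, and count each keyword once per paper.
        unique = {
            kw.strip().lower() for kw in paper.get("keywords", []) if kw.strip()
        }
        counts.update(unique)
    return counts.most_common(n)
```

Counting each keyword at most once per paper makes the result a count of papers, so a submission that lists the same keyword twice is not counted double.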
### Keywords vs Ratings
### Wordcloud
### Review Lengths
### Review Lengths by Rating
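Grouping review lengths by rating amounts to a small aggregation. The sketch below is illustrative only: it measures length in whitespace-separated words, which is an assumption (the plots may use characters instead), and the `rating` and `text` fields are hypothetical names.

```python
from collections import defaultdict
from statistics import mean


def mean_length_by_rating(reviews):
    """Map each rating to the mean word count of the reviews that gave it."""
    lengths = defaultdict(list)
    for review in reviews:
        # Length is the number of whitespace-separated words (an assumption).
        lengths[review["rating"]].append(len(review["text"].split()))
    # Sort by rating so the result is ready for a bar chart.
    return {rating: mean(counts) for rating, counts in sorted(lengths.items())}
```

The same function works for confidence scores if the grouping key is swapped from `rating` to the confidence field.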
### Review Lengths by Confidence
### Paper Length (pages) vs Rating
## Feedback
Feel free to open an issue or a pull request if you have any feedback or suggestions!
## Acknowledgements
This repository is inspired by the following:
- Initial idea: https://github.com/evanzd/ICLR2021-OpenReviewData
- Previous year's repo: https://github.com/fedebotu/ICLR2022-OpenReviewData
- For web formatting and API requests: https://github.com/weigq/neurips2021_stats and https://github.com/weigq/iclr2022_stats