https://github.com/fedebotu/iclr2023-openreviewdata
Crawl & Visualize ICLR 2023 Data from OpenReview
crawler dataset iclr iclr2023 openreview peer-review review scraper
- Host: GitHub
- URL: https://github.com/fedebotu/iclr2023-openreviewdata
- Owner: fedebotu
- Created: 2022-11-05T02:05:37.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-02-10T09:20:46.000Z (over 1 year ago)
- Last Synced: 2023-03-04T22:57:50.091Z (over 1 year ago)
- Topics: crawler, dataset, iclr, iclr2023, openreview, peer-review, review, scraper
- Language: Jupyter Notebook
- Homepage: https://fedebotu.github.io/ICLR2023-OpenReviewData/submissions.html
- Size: 13.9 MB
- Stars: 29
- Watchers: 2
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
# Crawl and Visualize ICLR 2023 OpenReview Data
[![Website](https://badgen.net/badge/Open/webpage/purple?icon=chrome)](https://fedebotu.github.io/ICLR2023-OpenReviewData/submissions.html)[![Drive](https://badgen.net/badge/Download/dataset/blue?icon=chrome)](https://drive.google.com/drive/folders/1wCZrwNpjBHq0mXni3xLNrlEMrGUDK-Cl?usp=sharing)
→ Open full submission list [here](https://fedebotu.github.io/ICLR2023-OpenReviewData/submissions.html)
→ Download datasets [here](https://drive.google.com/drive/folders/1wCZrwNpjBHq0mXni3xLNrlEMrGUDK-Cl?usp=sharing)

## Description
This repository contains code to crawl and visualize data from the [ICLR 2023 OpenReview](https://openreview.net/group?id=ICLR.cc/2023/Conference) venue. Crawling uses parallel `requests` calls made directly to OpenReview's API, which is roughly `10-100x` faster than driving a browser with `selenium`. The code also saves datasets for further analysis, including all reviews and rebuttals as well as the metadata and text of the PDF files.
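As a rough illustration of the parallel-request approach described above, the sketch below pages through an API with a thread pool. It is not the repository's actual code: the endpoint URL, the invitation id, the `notes` response key, the page size, and the function names `fetch_page` and `crawl` are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed values for illustration; check them against the OpenReview API docs.
API_URL = "https://api.openreview.net/notes"
INVITATION = "ICLR.cc/2023/Conference/-/Blind_Submission"
PAGE_SIZE = 1000


def fetch_page(offset, limit=PAGE_SIZE):
    """Fetch one page of submission notes over the network."""
    import requests  # imported lazily so the paging logic can run offline

    params = {"invitation": INVITATION, "offset": offset, "limit": limit}
    resp = requests.get(API_URL, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["notes"]  # assumed response shape: {"notes": [...]}


def crawl(total, fetch=fetch_page, page_size=PAGE_SIZE, workers=8):
    """Fetch all pages concurrently and return the notes in offset order."""
    offsets = range(0, total, page_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input offsets.
        pages = pool.map(lambda off: fetch(off, page_size), offsets)
        return [note for page in pages for note in page]
```

Calling `crawl(4874)` would issue five page requests in parallel under these assumptions. Passing a custom `fetch` callable keeps the paging logic testable without network access.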
## Usage
Install the dependencies:
```shell
pip install -r requirements.txt
```
Then run the notebooks in the `notebooks/` folder, in order:
1. `0a. Parse data.ipynb`: crawl the data from the OpenReview website: all paper metadata (title, abstract, authors, and so on), reviews, and rebuttals.
2. `0b. Crawl PDF.ipynb`: parse the PDF files of the papers to extract the main text.
3. `1. Plots.ipynb`: visualize the data using word clouds, bar charts, and other plots.
4. `2. Save Website.ipynb`: save the website as a static HTML file.

## Statistics
- Total submitted papers: `4874`
- Average rating: `4.94`

The README also includes the following plots (images not reproduced here):

- Rating Distribution
- Top 50 Keywords
- Keywords vs Ratings
- Wordcloud
- Review Lengths
- Review Lengths by Rating
- Review Lengths by Confidence
- Paper Length (pages) vs Rating
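The summary numbers and the keyword ranking above could be recomputed from the crawled data roughly as follows. This is a minimal sketch under stated assumptions: the record layout (a `ratings` list and a `keywords` list per paper) and the function name `summarize` are hypothetical, and averaging per-paper mean ratings is one possible definition of "average rating", not necessarily the one the notebooks use.

```python
from collections import Counter
from statistics import mean


def summarize(papers, top_k=50):
    """Summarize papers given as dicts with hypothetical keys
    'ratings' (list of ints) and 'keywords' (list of str)."""
    # Per-paper mean rating; papers without any rating are skipped.
    rated = [mean(p["ratings"]) for p in papers if p["ratings"]]
    # Normalize case and whitespace so keyword variants are counted together.
    keyword_counts = Counter(
        kw.strip().lower() for p in papers for kw in p["keywords"]
    )
    return {
        "total": len(papers),
        "average_rating": round(mean(rated), 2) if rated else None,
        "top_keywords": keyword_counts.most_common(top_k),
    }
```

The `top_keywords` list can be passed directly to a bar-chart or word-cloud routine, since it already holds `(keyword, count)` pairs sorted by frequency.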
## Feedback
Feel free to open an issue or a pull request if you have any feedback or suggestions!

## Acknowledgements
This repository is inspired by the following:
- Initial idea: https://github.com/evanzd/ICLR2021-OpenReviewData
- Previous year's repo: https://github.com/fedebotu/ICLR2022-OpenReviewData
- For web formatting and API requests: https://github.com/weigq/neurips2021_stats and https://github.com/weigq/iclr2022_stats