https://github.com/fedebotu/iclr2023-openreviewdata

Crawl & Visualize ICLR 2023 Data from OpenReview

Host: GitHub
URL: https://github.com/fedebotu/iclr2023-openreviewdata
Owner: fedebotu
Created: 2022-11-05T02:05:37.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2023-02-10T09:20:46.000Z (over 2 years ago)
Last Synced: 2025-06-20T16:54:32.641Z (4 months ago)
Topics: crawler, dataset, iclr, iclr2023, openreview, peer-review, review, scraper
Language: Jupyter Notebook
Homepage: https://fedebotu.github.io/ICLR2023-OpenReviewData/submissions.html
Size: 13.9 MB
Stars: 84
Watchers: 2
Forks: 11
Open Issues: 2
Metadata Files:
- Readme: README.md

README

# Crawl and Visualize ICLR 2023 OpenReview Data

[![Website](https://badgen.net/badge/Open/webpage/purple?icon=chrome)](https://fedebotu.github.io/ICLR2023-OpenReviewData/submissions.html)[![Drive](https://badgen.net/badge/Download/dataset/blue?icon=chrome)](https://drive.google.com/drive/folders/1wCZrwNpjBHq0mXni3xLNrlEMrGUDK-Cl?usp=sharing)

→ Open full submission list [here](https://fedebotu.github.io/ICLR2023-OpenReviewData/submissions.html)
→ Download datasets [here](https://drive.google.com/drive/folders/1wCZrwNpjBHq0mXni3xLNrlEMrGUDK-Cl?usp=sharing)

## Description

This repository contains code to crawl and visualize data from [ICLR 2023 OpenReview](https://openreview.net/group?id=ICLR.cc/2023/Conference). Crawling uses parallel `requests` calls made directly to OpenReview's API, which is on the order of `10-100x` faster than driving a browser with `selenium`. The code also saves datasets for further analysis, including all reviews and rebuttals as well as PDF metadata and extracted text.
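
As a rough illustration of the parallel-request approach, here is a minimal sketch and not the notebooks' actual code. The endpoint URL, the invitation id, the page size, and all function names (`page_offsets`, `fetch_page`, `fetch_all`) are assumptions made for this example; check them against the OpenReview API before relying on them.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed values for illustration only; they are not taken from this repository.
API_URL = "https://api.openreview.net/notes"
INVITATION = "ICLR.cc/2023/Conference/-/Blind_Submission"
PAGE_SIZE = 1000


def page_offsets(total, page_size=PAGE_SIZE):
    """Return the starting offset of every page needed to cover `total` items."""
    return list(range(0, total, page_size))


def fetch_page(offset, page_size=PAGE_SIZE):
    """Download one page of submissions (performs a real HTTP request)."""
    import requests  # third-party; imported lazily so the helpers work offline

    params = {"invitation": INVITATION, "offset": offset, "limit": page_size}
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json().get("notes", [])


def fetch_all(total, fetch=fetch_page, page_size=PAGE_SIZE, max_workers=8):
    """Fetch all pages concurrently and flatten them in offset order."""
    offsets = page_offsets(total, page_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results stay sorted by offset.
        pages = list(pool.map(lambda off: fetch(off, page_size), offsets))
    return [note for page in pages for note in page]
```

Calling `fetch_all(4874)` would issue five page requests in parallel under these assumptions. Passing a custom `fetch` callable makes it possible to try the pagination logic without touching the network.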

## Usage
Install the dependencies:
```shell
pip install -r requirements.txt
```
Then run the notebooks under the `notebooks/` folder in order:
1. `0a. Parse data.ipynb`: crawl the data from the OpenReview website, including all paper metadata (title, abstract, authors, and so on), reviews, and rebuttals.
2. `0b. Crawl PDF.ipynb`: parse the PDF files of the papers to extract the main text.
3. `1. Plots.ipynb`: visualize the data using word clouds, bar charts, and other plots.
4. `2. Save Website.ipynb`: save the website as a static HTML file.
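
To give a feel for the kind of aggregation behind a keyword bar chart, here is a small hedged sketch. It assumes each paper is a dictionary with a `keywords` list of strings; that layout and the `top_keywords` helper are hypothetical and may differ from what the notebooks use.

```python
from collections import Counter


def top_keywords(papers, n=50):
    """Count normalized keywords across papers and return the n most common."""
    counts = Counter(
        kw.strip().lower()  # merge variants such as "GNN" and "gnn "
        for paper in papers
        for kw in paper.get("keywords", [])
        if kw.strip()  # skip empty entries
    )
    return counts.most_common(n)


# Toy input, invented for this example.
papers = [
    {"keywords": ["Reinforcement Learning", "GNN"]},
    {"keywords": ["reinforcement learning "]},
    {},  # a paper with no keywords field
]
print(top_keywords(papers, n=2))
```

The resulting `(keyword, count)` pairs can be fed to any plotting library to draw the bar chart or word cloud.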

## Statistics
- Total submitted papers: `4874`
- Average rating: `4.94`
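
One plausible way to compute an average rating from crawled reviews is sketched below. It assumes ratings are stored as strings with a leading integer (for example `"6: marginally above the acceptance threshold"`) and that the average is a mean of per-paper means. Both are assumptions, so this sketch is not guaranteed to reproduce the figure above.

```python
from statistics import mean


def parse_rating(raw):
    """Extract the leading integer from a string such as '6: marginally above ...'."""
    return int(str(raw).split(":", 1)[0].strip())


def average_rating(papers):
    """Mean of per-paper mean ratings; papers without reviews are skipped."""
    per_paper = [
        mean(parse_rating(r) for r in paper["ratings"])
        for paper in papers
        if paper["ratings"]
    ]
    return round(mean(per_paper), 2)


# Toy input, invented for this example.
papers = [
    {"ratings": ["6: marginally above the acceptance threshold", "8: accept"]},
    {"ratings": ["3: reject"]},
    {"ratings": []},  # not yet reviewed
]
print(average_rating(papers))
```

Averaging over all reviews directly, rather than over per-paper means, would weight papers with more reviews more heavily and can give a slightly different number.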

### Rating Distribution

### Top 50 Keywords

### Keywords vs Ratings

### Wordcloud

### Review Lengths

### Review Lengths by Rating

### Review Lengths by Confidence

### Paper Length (pages) vs Rating

## Feedback
Feel free to open an issue or a pull request if you have any feedback or suggestions!

## Acknowledgements
This repository is inspired by the following:
- Initial idea: https://github.com/evanzd/ICLR2021-OpenReviewData
- Previous year's repo: https://github.com/fedebotu/ICLR2022-OpenReviewData
- For web formatting and API requests: https://github.com/weigq/neurips2021_stats and https://github.com/weigq/iclr2022_stats
