https://github.com/fedebotu/iclr2023-openreviewdata

Crawl & Visualize ICLR 2023 Data from OpenReview
crawler dataset iclr iclr2023 openreview peer-review review scraper

# Crawl and Visualize ICLR 2023 OpenReview Data

[![Website](https://badgen.net/badge/Open/webpage/purple?icon=chrome)](https://fedebotu.github.io/ICLR2023-OpenReviewData/submissions.html)[![Drive](https://badgen.net/badge/Download/dataset/blue?icon=chrome)](https://drive.google.com/drive/folders/1wCZrwNpjBHq0mXni3xLNrlEMrGUDK-Cl?usp=sharing)



- Open the full submission list [here](https://fedebotu.github.io/ICLR2023-OpenReviewData/submissions.html)
- Download the datasets [here](https://drive.google.com/drive/folders/1wCZrwNpjBHq0mXni3xLNrlEMrGUDK-Cl?usp=sharing)

## Description

This repository contains code to crawl and visualize data from the [ICLR 2023 OpenReview](https://openreview.net/group?id=ICLR.cc/2023/Conference) site. Crawling uses parallel `requests` calls made directly to OpenReview's API, which is roughly `10-100x` faster than driving a browser with `selenium`. The code also saves datasets for further analysis, including all reviews and rebuttals as well as PDF metadata and text.
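As a rough illustration of the parallel crawling idea, here is a minimal sketch, not the code from the notebooks. To stay dependency-free it uses the standard library's `urllib` in place of `requests`. The endpoint URL, the invitation ID, the page size, and the helper names `fetch_page` and `fetch_all` are all assumptions made for this example and should be checked against the OpenReview API documentation.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint and invitation ID (not taken from this repository).
API_URL = "https://api.openreview.net/notes"
INVITATION = "ICLR.cc/2023/Conference/-/Blind_Submission"


def fetch_page(offset, limit=1000):
    """Fetch one page of submission notes starting at `offset`."""
    query = urlencode({"invitation": INVITATION, "offset": offset, "limit": limit})
    with urlopen(f"{API_URL}?{query}", timeout=30) as response:
        return json.load(response).get("notes", [])


def fetch_all(total, limit=1000, workers=8, fetch=fetch_page):
    """Fetch `total` notes in pages of `limit`, issuing the requests in parallel."""
    offsets = range(0, total, limit)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of `offsets`, so pages come back in order.
        pages = pool.map(lambda offset: fetch(offset, limit), offsets)
        return [note for page in pages for note in page]
```

Calling `fetch_all(total=5000)` would request five pages concurrently. The `fetch` parameter exists so the paging logic can be exercised with a stub instead of the live API.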

## Usage
Install the dependencies:
```shell
pip install -r requirements.txt
```
Then run the notebooks under the `notebooks/` folder in order:
1. `0a. Parse data.ipynb`: crawl the data from the OpenReview website: all paper metadata (such as title, abstract, authors, etc.), reviews, and rebuttals.
2. `0b. Crawl PDF.ipynb`: parse the PDF files of the papers to extract the main text.
3. `1. Plots.ipynb`: visualize the data using word clouds, bar charts, and other plots.
4. `2. Save Website.ipynb`: save the website as a static HTML file.

## Statistics
- Total submitted papers: `4874`
- Average rating: `4.94`
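For readers who want to recompute figures like these from the saved datasets, the following sketch shows one way to do it. The record layout (a `ratings` list per paper) and the function name `rating_summary` are hypothetical, and the sample data is made up. The README does not say whether the average is taken per review or per paper, so this example simply averages over all reviews.

```python
from collections import Counter
from statistics import mean

# Hypothetical records: one dict per paper with its reviewers' integer ratings.
papers = [
    {"title": "Paper A", "ratings": [3, 5, 6]},
    {"title": "Paper B", "ratings": [8, 8, 6]},
    {"title": "Paper C", "ratings": []},  # a paper with no reviews
]


def rating_summary(papers):
    """Return (mean rating over all reviews, histogram of ratings)."""
    ratings = [r for paper in papers for r in paper["ratings"]]
    average = mean(ratings) if ratings else None
    return average, Counter(ratings)


average, histogram = rating_summary(papers)
print(f"Average rating: {average:.2f}")
print(sorted(histogram.items()))
```

The `Counter` returned here can feed a bar chart of the rating distribution directly.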

### Rating Distribution



### Top 50 Keywords



### Keywords vs Ratings



### Wordcloud



### Review Lengths



### Review Lengths by Rating



### Review Lengths by Confidence



### Paper Length (pages) vs Rating



## Feedback
Feel free to open an issue or a pull request if you have any feedback or suggestions!

## Acknowledgements
This repository is inspired by the following:
- Initial idea: https://github.com/evanzd/ICLR2021-OpenReviewData
- Previous year's repo: https://github.com/fedebotu/ICLR2022-OpenReviewData
- For web formatting and API requests: https://github.com/weigq/neurips2021_stats and https://github.com/weigq/iclr2022_stats