Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/MengtingWan/goodreads
code samples for the goodreads datasets
- Host: GitHub
- URL: https://github.com/MengtingWan/goodreads
- Owner: MengtingWan
- License: apache-2.0
- Created: 2019-05-23T06:17:08.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2023-05-29T22:12:34.000Z (over 1 year ago)
- Last Synced: 2024-08-03T18:14:57.847Z (6 months ago)
- Topics: book-reviews, computational-social-science, dataset, machine-learning, natural-language-processing, recommendation-system, recommender-system, research, spoilers
- Language: Jupyter Notebook
- Homepage: https://mengtingwan.github.io/data/goodreads.html
- Size: 74.2 KB
- Stars: 236
- Watchers: 3
- Forks: 58
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-libgen - Goodreads Metadata - Goodreads meta-data and book reviews on 2,360,655 books (eBook metadata)
README
# [Goodreads Datasets](https://mengtingwan.github.io/data/goodreads.html)
#### NOTE: Our datasets have been moved! Please see our new [webpage](https://mengtingwan.github.io/data/goodreads.html) about how to download these datasets.
The datasets were collected in late 2017 from [Goodreads](https://goodreads.com). Details of the datasets are described on the [dataset website](https://mengtingwan.github.io/data/goodreads.html).
**We collected these datasets for academic use only! Please do not redistribute them or use them for commercial purposes.**
## Citations
If you are using our dataset, please cite the following papers:
- Mengting Wan, Julian McAuley, "[Item Recommendation on Monotonic Behavior Chains](https://github.com/MengtingWan/mengtingwan.github.io/raw/master/paper/recsys18_mwan.pdf)", in RecSys'18. [[bibtex](https://dblp.uni-trier.de/rec/bibtex/conf/recsys/WanM18)]
- Mengting Wan, Rishabh Misra, Ndapa Nakashole, Julian McAuley, "[Fine-Grained Spoiler Detection from Large-Scale Review Corpora](https://github.com/MengtingWan/mengtingwan.github.io/raw/master/paper/acl19_mwan.pdf)", in ACL'19. [[bibtex](https://dblp.uni-trier.de/rec/bibtex/conf/acl/WanMNM19)]
## Notebooks/Code Samples
We've created several notebooks (in Python 3.7) to illustrate how to download/read these datasets and to provide some basic explorations of the data.
- [download.ipynb](/download.ipynb): For those who prefer to download the datasets without a GUI. This notebook shows how to download files in bash/Python.
- [samples.ipynb](/samples.ipynb): This notebook will show how to read '.json.gz' files line-by-line and display sample records of each file.
- [statistics.ipynb](/statistics.ipynb): This notebook will calculate some basic statistics of the datasets (except the largest complete interaction file 'goodreads_interactions.csv'). Running this notebook may take a while.
- [distributions.ipynb](/distributions.ipynb): This notebook will operate on the complete interaction file 'goodreads_interactions.csv' and provide some explorations of the distributions of these interactions. **Note: Run this notebook only if you have a LARGE amount of memory (32 GB or more recommended)!!**
- [reviews.ipynb](/reviews.ipynb): This notebook will calculate some statistics of the review datasets.
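
As a rough illustration of the line-by-line reading that samples.ipynb describes, the sketch below streams a '.json.gz' file one record at a time. It is not taken from the notebooks: the helper names `iter_json_gz` and `head` are hypothetical, and it assumes each non-empty line of the file holds one JSON object.

```python
import gzip
import json
from itertools import islice


def iter_json_gz(path):
    """Yield one parsed record per non-empty line of a gzip-compressed file.

    Assumption: every non-empty line is a complete JSON object.
    """
    # "rt" opens the compressed stream in text mode, so lines arrive as str.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def head(path, n=5):
    """Return the first n records without reading the rest of the file."""
    return list(islice(iter_json_gz(path), n))
```

Calling `head("some_file.json.gz", 3)` (the path is a placeholder) would return up to three sample records as dictionaries. Because the reader is a generator, memory use stays roughly constant regardless of file size.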