https://github.com/Alex-Fabbri/Multi-News
Large-scale multi-document summarization dataset and code
- Host: GitHub
- URL: https://github.com/Alex-Fabbri/Multi-News
- Owner: Alex-Fabbri
- License: other
- Created: 2019-06-04T02:01:48.000Z
- Default Branch: master
- Last Pushed: 2023-05-08T15:20:16.000Z
- Last Synced: 2025-03-31T07:07:55.145Z
- Topics: multi-document-summarization, multi-news, summarization
- Language: Python
- Size: 55.3 MB
- Stars: 283
- Watchers: 2
- Forks: 52
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
- awesome-Multi-Document-Summarization - Alex-Fabbri/Multi-News
README
# Multi-News
Data and code for the ACL 2019 paper [Multi-News: a Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model](https://128.84.21.199/pdf/1906.01749.pdf).
# Data
- [Preprocessed, but not truncated, data](https://drive.google.com/open?id=1qZ3zJBv0zrUy4HVWxnx33IsrHGimXLPy)
- [Preprocessed, truncated data](https://drive.google.com/open?id=1qqSnxiaNVEctgiz2g-Wd3a9kwWuwMA07)
- [Raw data](https://drive.google.com/open?id=1uDarzpu2HFc-vjXNJCRv2NIHzakpSGOw): the only changes are that `\n` was replaced with `NEWLINE_CHAR` and `|||||` was appended to the end of each story.
- [Raw data, bad retrievals removed](https://drive.google.com/open?id=1jwBzXBVv8sfnFrlzPnSUBHEEAbpIUnFq): removes the documents affected by the retrieval error reported in [this issue](https://github.com/Alex-Fabbri/Multi-News/issues/11), and removes the `|||||` at the end of each example.
- [Raw data (zipped)](https://drive.google.com/open?id=1vRY2wM6rlOZrf9exGTm5pXj5ExlVwJ0C)
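
Because newlines in the raw releases are replaced with `NEWLINE_CHAR`, each example presumably fits on a single line of a source file. The following is a minimal sketch, not code from this repository, of how one such line might be split back into its individual stories. It assumes one example per line, stories terminated by `|||||`, and newlines encoded as `NEWLINE_CHAR`, as described above. The function name `split_source_line` is hypothetical.

```python
from typing import List

# Markers described in the README for the raw data release.
STORY_SEPARATOR = "|||||"
NEWLINE_MARKER = "NEWLINE_CHAR"


def split_source_line(line: str) -> List[str]:
    """Split one raw source example into its individual news stories.

    Hypothetical helper. It assumes each line of a raw source file holds one
    example, that each story ends with "|||||", and that original newlines
    were replaced with "NEWLINE_CHAR". Empty pieces are dropped, so a missing
    trailing "|||||" is also tolerated.
    """
    stories = []
    for chunk in line.rstrip("\n").split(STORY_SEPARATOR):
        # Restore line breaks and trim the spaces around each marker.
        parts = chunk.split(NEWLINE_MARKER)
        story = "\n".join(part.strip() for part in parts).strip()
        if story:  # skip the empty piece left after a trailing separator
            stories.append(story)
    return stories


if __name__ == "__main__":
    raw = "First story. NEWLINE_CHAR More text. |||||Second story. |||||"
    for i, story in enumerate(split_source_line(raw)):
        print(i, repr(story))
```

Because empty pieces are discarded, the same function should also work on the "bad retrievals removed" release, which omits the trailing `|||||`.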

[**TensorFlow Datasets**](https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/summarization/multi_news.py)

# Models and Summaries
- [Trained models](https://drive.google.com/open?id=1h2xuCZXy4gev1KmsRjmBoDcSJYa5bJ4Q)
- [Model output](https://drive.google.com/open?id=1yfJGKjzCi4LJyKs9u48DmIdlVdxnngbb)