https://github.com/Alex-Fabbri/Multi-News

Large-scale multi-document summarization dataset and code

multi-document-summarization multi-news summarization


# Multi-News

Data and code for the ACL 2019 paper [Multi-News: a Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model](https://128.84.21.199/pdf/1906.01749.pdf).

# Data
- [Preprocessed, but not truncated, data](https://drive.google.com/open?id=1qZ3zJBv0zrUy4HVWxnx33IsrHGimXLPy)
- [Preprocessed, truncated, data](https://drive.google.com/open?id=1qqSnxiaNVEctgiz2g-Wd3a9kwWuwMA07)
- [Raw data](https://drive.google.com/open?id=1uDarzpu2HFc-vjXNJCRv2NIHzakpSGOw): the only changes are that `\n` is replaced with "NEWLINE_CHAR" and "|||||" is appended to the end of each story.
- [Raw data, bad retrievals removed](https://drive.google.com/open?id=1jwBzXBVv8sfnFrlzPnSUBHEEAbpIUnFq): removes the documents that were retrieved with errors, as reported in [this issue](https://github.com/Alex-Fabbri/Multi-News/issues/11), and removes the "|||||" at the end of each example.
- [Raw data, zipped](https://drive.google.com/open?id=1vRY2wM6rlOZrf9exGTm5pXj5ExlVwJ0C)
- [**TensorFlow Datasets**](https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/summarization/multi_news.py)
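
The raw data marks newlines with "NEWLINE_CHAR" and ends each story with "|||||". The sketch below shows one way to undo that encoding. It assumes each line of a raw source file holds one example with its stories concatenated, which is not confirmed here. The function name `split_stories` and the constant names are hypothetical and are not part of this repository's code.

```python
from typing import List

# Markers described for the raw data above.
NEWLINE_TOKEN = "NEWLINE_CHAR"
STORY_DELIMITER = "|||||"


def split_stories(raw_line: str) -> List[str]:
    """Split one raw example into its stories and restore newlines.

    Assumption: `raw_line` is a single example whose stories are each
    terminated by STORY_DELIMITER.
    """
    stories = []
    for chunk in raw_line.split(STORY_DELIMITER):
        # Turn the placeholder back into real newlines, then trim whitespace.
        text = chunk.replace(NEWLINE_TOKEN, "\n").strip()
        # The trailing delimiter leaves an empty final chunk; skip it.
        if text:
            stories.append(text)
    return stories


if __name__ == "__main__":
    example = "First story.NEWLINE_CHARSecond paragraph.|||||Another story.|||||"
    for story in split_stories(example):
        print(repr(story))
```

Skipping empty chunks means the same function should also handle the "bad retrievals removed" files, where the trailing "|||||" is absent.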

# Models and Summaries
- [Trained models](https://drive.google.com/open?id=1h2xuCZXy4gev1KmsRjmBoDcSJYa5bJ4Q)
- [Model output](https://drive.google.com/open?id=1yfJGKjzCi4LJyKs9u48DmIdlVdxnngbb)