https://github.com/kumar-shridhar/longtonotes

LongtoNotes: OntoNotes with Longer Coreference Chains

# LongtoNotes
### [LongtoNotes: OntoNotes with Longer Coreference Chains](https://arxiv.org/abs/2210.03650)

**NOTE: Please fill out [this Google form](https://docs.google.com/forms/d/e/1FAIpQLScoWkBOgJ1HH_phtvTJ4_hGvQw6f0W6K7kw74sUKCDTG8P2iA/viewform) to get access to the LongtoNotes dataset.**

[OntoNotes](https://catalog.ldc.upenn.edu/LDC2013T19) has served as the most important benchmark for coreference resolution. However, for ease of annotation, several long documents in OntoNotes were split into smaller parts.
In this work, we build a corpus of coreference-annotated documents that are significantly longer than those currently available.
We do so by providing an accurate, manually curated merging of annotations from documents that were split into multiple parts in the original OntoNotes annotation process.
The resulting corpus, which we call **LongtoNotes**, contains English documents in multiple genres and of varying lengths; the longest are up to 8x the length of documents in OntoNotes and 2x those in [LitBank](https://github.com/dbamman/litbank).
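
To make the merging idea concrete, here is a minimal Python sketch. It is not the authors' conversion script, which has not been released. It assumes each part is a token list plus clusters of inclusive `(start, end)` token spans local to that part, and it uses a hypothetical `links` mapping as a stand-in for the manual annotation that says which cluster in a later part continues a chain from an earlier part. The function name `merge_parts` and the data layout are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]   # inclusive (start, end) token offsets
Cluster = List[Span]
Key = Tuple[int, int]    # (part index, cluster index within that part)


def merge_parts(
    parts: List[Tuple[List[str], List[Cluster]]],
    links: Dict[Key, Key],
) -> Tuple[List[str], List[Cluster]]:
    """Concatenate document parts and merge linked coreference clusters.

    parts: one (tokens, clusters) pair per part, spans local to the part.
    links: hypothetical stand-in for the manual annotation; maps a cluster
           to an earlier cluster it corefers with.
    """
    tokens: List[str] = []
    merged: List[Cluster] = []
    chain_of: Dict[Key, int] = {}  # cluster key -> index into `merged`

    for p, (part_tokens, clusters) in enumerate(parts):
        offset = len(tokens)  # shift local spans into document coordinates
        tokens.extend(part_tokens)
        for c, cluster in enumerate(clusters):
            shifted = [(s + offset, e + offset) for s, e in cluster]
            key = (p, c)
            if key in links:
                # Continue the chain started by an earlier cluster.
                target = chain_of[links[key]]
                merged[target].extend(shifted)
            else:
                # No link: this cluster starts a new chain.
                target = len(merged)
                merged.append(shifted)
            chain_of[key] = target
    return tokens, merged


# Two toy parts whose single clusters refer to the same entity.
part0 = (["Alice", "left", "."], [[(0, 0)]])
part1 = (["She", "returned", "."], [[(0, 0)]])
tokens, chains = merge_parts([part0, part1], {(1, 0): (0, 0)})
print(chains)  # [[(0, 0), (3, 3)]]
```

In this toy example the two one-mention clusters become a single chain spanning both parts, which is how merging can yield longer coreference chains than the split documents contain.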

![Genre wise comparison between OntoNotes and LongtoNotes dataset](Images/genre_comparison.png)

---

## Comparison to other coref datasets

![Comparison between LongtoNotes and other coref datasets](Images/coref_comparison.png)

---

## Model performance on OntoNotes vs LongtoNotes

![Model performance of various models on OntoNotes vs LongtoNotes](Images/model_performance_comparison.png)

---

## Creative Commons License

LongtoNotes is licensed under a Creative Commons Attribution 4.0 International License. However, access to OntoNotes is needed in order to create LongtoNotes from it.

**For access to LongtoNotes now, please reach out to us** ([email protected]). We will soon release the automatic script to convert OntoNotes into LongtoNotes.

---

```bibtex
@article{shridhar2022longtonotes,
title={{LongtoNotes}: {OntoNotes} with Longer Coreference Chains},
author={Shridhar, Kumar and Monath, Nicholas and Thirukovalluru, Raghuveer and Stolfo, Alessandro and Zaheer, Manzil and McCallum, Andrew and Sachan, Mrinmaya},
journal={arXiv preprint arXiv:2210.03650},
year={2022}
}
```