https://github.com/kumar-shridhar/longtonotes
LongtoNotes: OntoNotes with Longer Coreference Chains
- Host: GitHub
- URL: https://github.com/kumar-shridhar/longtonotes
- Owner: kumar-shridhar
- Created: 2022-01-29T18:15:11.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-10-12T21:00:24.000Z (over 2 years ago)
- Last Synced: 2025-01-28T17:17:13.981Z (4 months ago)
- Size: 241 KB
- Stars: 8
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# LongtoNotes
### [LongtoNotes: OntoNotes with Longer Coreference Chains](https://arxiv.org/abs/2210.03650)

**NOTE: Please fill out [this Google form](https://docs.google.com/forms/d/e/1FAIpQLScoWkBOgJ1HH_phtvTJ4_hGvQw6f0W6K7kw74sUKCDTG8P2iA/viewform) to get access to the LongtoNotes dataset.**

[OntoNotes](https://catalog.ldc.upenn.edu/LDC2013T19) has served as the most important benchmark for coreference resolution. However, for ease of annotation, several long documents in OntoNotes were split into smaller parts.
In this work, we build a corpus of coreference-annotated documents of significantly longer length than what is currently available.
We do so by providing an accurate, manually curated merging of annotations from documents that were split into multiple parts in the original OntoNotes annotation process.
The resulting corpus, which we call **LongtoNotes**, contains English documents in multiple genres and of varying lengths; the longest are up to 8x the length of documents in OntoNotes and 2x the length of those in [LitBank](https://github.com/dbamman/litbank).
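
To make the idea of merging split documents concrete, here is a minimal Python sketch. It is not the authors' conversion script (which this README says is not yet released) and it does not read the OntoNotes file format. The function `merge_parts`, its arguments, and the `cross_part_links` mapping are all hypothetical names chosen for illustration; the mapping stands in for the manual decisions about which chains in different parts refer to the same entity.

```python
from typing import Dict, List, Tuple

# A mention is a (start_token, end_token) pair, inclusive, local to its part.
Span = Tuple[int, int]


def merge_parts(
    part_lengths: List[int],
    part_clusters: List[Dict[int, List[Span]]],
    cross_part_links: Dict[Tuple[int, int], str],
) -> Dict[str, List[Span]]:
    """Merge per-part coreference clusters into document-level chains.

    part_lengths[i]   : number of tokens in part i
    part_clusters[i]  : local cluster id -> mentions, for part i
    cross_part_links  : (part index, local cluster id) -> shared entity label;
                        a hypothetical stand-in for manual merge decisions
    """
    merged: Dict[str, List[Span]] = {}
    offset = 0  # token offset of the current part within the full document
    for part_idx, (length, clusters) in enumerate(zip(part_lengths, part_clusters)):
        for local_id, mentions in clusters.items():
            # Clusters without a manual link stay separate under a unique key.
            key = cross_part_links.get(
                (part_idx, local_id), f"part{part_idx}_c{local_id}"
            )
            shifted = [(start + offset, end + offset) for start, end in mentions]
            merged.setdefault(key, []).extend(shifted)
        offset += length
    return merged


# Toy document that was split into two parts of 10 and 8 tokens.
chains = merge_parts(
    part_lengths=[10, 8],
    part_clusters=[
        {0: [(0, 1), (5, 5)], 1: [(3, 3)]},
        {0: [(2, 2)], 1: [(4, 5)]},
    ],
    # Assumed manual decision: cluster 0 of part 0 and cluster 1 of part 1
    # refer to the same entity.
    cross_part_links={(0, 0): "entity_A", (1, 1): "entity_A"},
)
print(chains)
```

In this toy example the linked clusters become one longer chain whose mentions use document-level token offsets, while unlinked clusters remain separate chains.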
---
## Comparison to other coref datasets

---
## Model performance on OntoNotes vs LongtoNotes

---
LongtoNotes is licensed under a Creative Commons Attribution 4.0 International License. However, access to OntoNotes is needed in order to create LongtoNotes from it. **For access to LongtoNotes now, please reach out to us** ([email protected]). We will soon release the automatic script to convert OntoNotes into LongtoNotes.
---
```bibtex
@article{shridhar2022longtonotes,
title={LongtoNotes: OntoNotes with Longer Coreference Chains},
author={Shridhar, Kumar and Monath, Nicholas and Thirukovalluru, Raghuveer and Stolfo, Alessandro and Zaheer, Manzil and McCallum, Andrew and Sachan, Mrinmaya},
journal={arXiv preprint arXiv:2210.03650},
year={2022}
}
```