Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/thunlp/DocRED
Dataset and codes for ACL 2019 DocRED: A Large-Scale Document-Level Relation Extraction Dataset.
- Host: GitHub
- URL: https://github.com/thunlp/DocRED
- Owner: thunlp
- License: MIT
- Created: 2019-06-03T10:43:17.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-12-01T03:18:37.000Z (almost 4 years ago)
- Last Synced: 2024-06-24T05:45:28.841Z (5 months ago)
- Language: Python
- Homepage:
- Size: 55.7 KB
- Stars: 608
- Watchers: 18
- Forks: 111
- Open Issues: 11
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - thunlp/DocRED
README
# DocRED
Dataset and code for baselines for [DocRED: A Large-Scale Document-Level Relation Extraction Dataset](https://arxiv.org/abs/1906.06127v3).

Multiple entities in a document generally exhibit complex inter-sentence relations, which existing relation extraction (RE) methods cannot handle well because they typically focus on extracting intra-sentence relations for single entity pairs. To accelerate research on document-level RE, we introduce DocRED, a new dataset constructed from Wikipedia and Wikidata with three features:
+ DocRED annotates both named entities and relations, and is the largest human-annotated dataset for document-level RE from plain text.
+ DocRED requires reading multiple sentences in a document to extract entities and infer their relations by synthesizing all the information in the document.
+ Along with the human-annotated data, we also offer large-scale distantly supervised data, which enables DocRED to be adopted for both supervised and weakly supervised scenarios.

## Codalab
If you are interested in our dataset, you are welcome to join the Codalab competition at [DocRED](https://competitions.codalab.org/competitions/20717).

## Cite
If you use the dataset or the code, please cite this paper:
```
@inproceedings{yao2019DocRED,
title={{DocRED}: A Large-Scale Document-Level Relation Extraction Dataset},
author={Yao, Yuan and Ye, Deming and Li, Peng and Han, Xu and Lin, Yankai and Liu, Zhenghao and Liu, Zhiyuan and Huang, Lixin and Zhou, Jie and Sun, Maosong},
booktitle={Proceedings of ACL 2019},
year={2019}
}
```