https://github.com/oussamaahmia/TED-dataset
- Host: GitHub
- URL: https://github.com/oussamaahmia/TED-dataset
- Owner: oussamaahmia
- License: mit
- Created: 2017-09-28T22:45:23.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2024-02-20T14:02:06.000Z (10 months ago)
- Last Synced: 2024-07-15T13:54:47.509Z (5 months ago)
- Size: 10.6 MB
- Stars: 6
- Watchers: 6
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# TED-dataset
The two sub-datasets, fd-TED and par-TED, will be updated on a regular basis to keep track of the new calls for tender published by the EU member states.

- The [par-TED](https://drive.google.com/drive/folders/1U2W-dKc7jJBtpt1iuLqDZgNQeM8wA7ds) corpus is a multilingual (24 languages) aligned corpus in the form of a set of unique parallel sentences translated into at least 23 languages.
- The [fd-TED](https://drive.google.com/drive/folders/1G-21p8vxvbXtb6hoQPjbvMnokThyk8HI) corpus is built from the full content of the documents extracted from the [TED − Tenders Electronic Daily platform](https://ted.europa.eu). This dataset can be used as a benchmark for supervised classification or for training machine learning models applied to business intelligence applications.
We also propose a filtered version of fd-TED, created by ignoring administrative information.

[comment]: <> (***NB: The currently published dataset contains only filtered documents. The raw version will be available soon.***)
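
The par-TED description above (unique parallel sentences available in at least 23 languages) can be illustrated with a small filter. This is only a sketch under stated assumptions: the in-memory layout (one dict per aligned sentence, keyed by language code), the function name `filter_aligned`, and the `min_languages` parameter are hypothetical and are not taken from the dataset files, whose actual format may differ.

```python
from typing import Dict, Iterable, List

# Hypothetical layout: one dict per aligned sentence, mapping a language
# code to its translation. The real par-TED files may be organised differently.
AlignedSentence = Dict[str, str]


def filter_aligned(
    records: Iterable[AlignedSentence], min_languages: int = 23
) -> List[AlignedSentence]:
    """Keep unique records with non-empty text in at least `min_languages` languages."""
    seen = set()
    kept: List[AlignedSentence] = []
    for record in records:
        # Drop empty or whitespace-only translations before counting languages.
        present = {
            lang: text.strip()
            for lang, text in record.items()
            if text and text.strip()
        }
        if len(present) < min_languages:
            continue
        # The sorted (language, text) pairs serve as a deduplication key.
        key = tuple(sorted(present.items()))
        if key in seen:
            continue
        seen.add(key)
        kept.append(present)
    return kept


if __name__ == "__main__":
    # Toy records invented for illustration; they are not from the corpus.
    toy = [
        {"en": "Supply of office furniture", "fr": "Fourniture de mobilier de bureau", "de": ""},
        {"en": "Supply of office furniture", "fr": "Fourniture de mobilier de bureau"},
        {"en": "Road maintenance works"},
    ]
    print(filter_aligned(toy, min_languages=2))
```

The threshold defaults to 23 to mirror one reading of the description; lowering it, as in the toy call, makes the behaviour easy to check on small inputs.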
For further information, please refer to this [article](http://www.lrec-conf.org/proceedings/lrec2018/pdf/832.pdf).
**Citation:**
\
``@inproceedings{ahmia-etal-2018-two,
title = "Two Multilingual Corpora Extracted from the Tenders Electronic Daily for Machine Learning and Machine Translation Applications.",
author = "Ahmia, Oussama and
B{\'e}chet, Nicolas and
Marteau, Pierre-Fran{\c{c}}ois",
booktitle = "Proceedings of the Eleventh International Conference on Language Resources and Evaluation ({LREC} 2018)",
month = may,
year = "2018",
address = "Miyazaki, Japan",
publisher = "European Language Resources Association (ELRA)",
url = "https://www.aclweb.org/anthology/L18-1583",
}``