https://github.com/mikeizbicki/wiktionary_bli
- Host: GitHub
- URL: https://github.com/mikeizbicki/wiktionary_bli
- Owner: mikeizbicki
- Created: 2022-09-11T07:01:14.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2022-10-16T06:23:10.000Z (about 2 years ago)
- Last Synced: 2024-11-08T22:36:30.934Z (2 months ago)
- Language: TeX
- Size: 70.7 MB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Wiktionary_bli
This is a collection of training and testing datasets for the Bilingual Lexicon Induction problem.
The dataset and motivation are described in the paper [Aligning Word Vectors on Low-Resource Languages with Wiktionary](paper/paper.pdf).

The `/final` folder contains the datasets for each language.
For example, the Korean datasets are located at:

| file name | purpose |
| --- | --- |
| `/final/ko-en.all` | the full collection of word/definition pairs extracted from Wiktionary |
| `/final/ko-en.train` | the training set |
| `/final/ko-en.test` | the full test set |
| `/final/ko-en.testsmall` | the small test set |
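
The naming pattern in the table can be turned into paths programmatically. The following is a minimal sketch, not part of the repository: the helper `dataset_paths` and the `SPLITS` tuple are hypothetical names, and it assumes that other languages follow the same `<lang>-en.<split>` pattern shown for Korean and that the script runs from the repository root. It only builds and checks paths, since the file format is not described here.

```python
from pathlib import Path

# Split suffixes taken from the Korean example in the table above.
SPLITS = ("all", "train", "test", "testsmall")


def dataset_paths(lang, root="final"):
    """Map each split name to its expected path for a language code.

    Assumption: every language uses the `<lang>-en.<split>` naming
    shown for Korean (`ko`).
    """
    return {split: Path(root) / f"{lang}-en.{split}" for split in SPLITS}


if __name__ == "__main__":
    # Report which of the expected Korean files are present locally.
    for split, path in dataset_paths("ko").items():
        status = "found" if path.exists() else "missing"
        print(f"{split}: {path} ({status})")
```

Checking for existence before opening a file is a cheap way to confirm that a given language code is actually covered by the collection.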