COS960: A Chinese Word Similarity Dataset of 960 Word Pairs
- Host: GitHub
- URL: https://github.com/thunlp/COS960
- Owner: thunlp
- License: mit
- Created: 2019-06-01T07:15:23.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-06-06T11:20:21.000Z (over 5 years ago)
- Last Synced: 2024-08-02T05:05:49.495Z (4 months ago)
- Language: Python
- Size: 20.5 KB
- Stars: 36
- Watchers: 9
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-hackchinese - COS960
README
# COS960
COS960 is a Chinese word similarity dataset of 960 word pairs. Each pair is annotated by 15 native speakers with a similarity score that reflects **true similarity**. The 960 pairs are divided into three groups by part-of-speech tag: 480 noun pairs, 240 verb pairs, and 240 adjective pairs.

### Usage
To evaluate your word embeddings on COS960, run:
```
python correlation_calcu.py {VECTOR_FILE}
```

### Dataset
The data in the files is formatted as:
```
[Word1] [Word2] [Average] [Annotator1] ... [Annotator15]
小心谨慎 谨慎小心 4.0 4 ... 4
```

### Cite
If you use the dataset, please cite this:
```
@article{huang2019COS960,
  author  = {Junjie Huang and Fanchao Qi and Chenghao Yang and Zhiyuan Liu and Maosong Sun},
  title   = {{COS960: A Chinese Word Similarity Dataset of 960 Word Pairs}},
  journal = {arXiv preprint arXiv:1906.00247},
  year    = {2019},
}
```
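
### Parsing sketch
The line format in the Dataset section (two words, the average score, then 15 annotator scores) can be read with a few lines of Python. The sketch below is illustrative and is not the code in `correlation_calcu.py`. It assumes the fields are separated by whitespace and that the words themselves contain no spaces. The helper names `parse_line` and `spearman` are hypothetical. The sample line reuses the word pair from the Dataset example, with all 15 scores set to 4 purely for illustration. Spearman rank correlation is a common way to compare model similarities with human scores on word similarity benchmarks, and its use here is an assumption about how an evaluation might look.

```python
from statistics import mean


def parse_line(line):
    """Split one data line into (word1, word2, average, annotator_scores).

    Assumes whitespace-separated fields: two words, the average score,
    then one score per annotator.
    """
    fields = line.split()
    word1, word2 = fields[0], fields[1]
    average = float(fields[2])
    scores = [float(s) for s in fields[3:]]
    return word1, word2, average, scores


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Illustrative line: the 15 annotator scores are made up for this sketch.
sample = "小心谨慎 谨慎小心 4.0 " + " ".join(["4"] * 15)
word1, word2, average, scores = parse_line(sample)
print(word1, word2, average, len(scores))
```

To score a model, one would collect the `average` value for every pair, compute the model's similarity for the same pairs (for example, cosine similarity of the two word vectors), and pass both lists to `spearman`.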