https://github.com/thunlp/COS960

COS960: A Chinese Word Similarity Dataset of 960 Word Pairs

Host: GitHub
URL: https://github.com/thunlp/COS960
Owner: thunlp
License: mit
Created: 2019-06-01T07:15:23.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2019-06-06T11:20:21.000Z (about 6 years ago)
Last Synced: 2025-04-22T11:10:31.402Z (2 months ago)
Language: Python
Size: 20.5 KB
Stars: 35
Watchers: 8
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-hackchinese - COS960

README

# COS960

COS960 is a Chinese word similarity dataset of 960 word pairs. Each word pair is annotated by 15 native speakers with a similarity score that reflects **true similarity**. The 960 word pairs are divided into 3 groups by part-of-speech tag: 480 pairs of nouns, 240 pairs of verbs, and 240 pairs of adjectives.

### Usage

To evaluate your word embeddings on COS960, run the following command, where `{VECTOR_FILE}` is the path to your word vector file:

```
python correlation_calcu.py {VECTOR_FILE}
```
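For readers who want to see what this kind of evaluation usually involves, here is a minimal sketch. It is not the contents of `correlation_calcu.py`, and the README does not say which correlation measure the script reports. The sketch assumes the common approach of scoring each pair by cosine similarity and comparing against the human averages with Spearman's rank correlation. The names `cosine`, `spearman`, and `evaluate`, and the toy vectors and scores, are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(pairs, vectors):
    """Correlate human scores with cosine similarities.

    pairs: list of (word1, word2, human_score)
    vectors: dict mapping word -> list of floats
    Pairs with an out-of-vocabulary word are skipped (an assumption).
    """
    human, model = [], []
    for w1, w2, score in pairs:
        if w1 in vectors and w2 in vectors:
            human.append(score)
            model.append(cosine(vectors[w1], vectors[w2]))
    return spearman(human, model)


# Toy data, purely illustrative (not taken from COS960).
toy_vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.5, 0.5]}
toy_pairs = [("a", "b", 4.5), ("a", "d", 3.0), ("a", "c", 1.0)]
print(evaluate(toy_pairs, toy_vectors))
```

A rank-based measure is a common choice here because human ratings and cosine similarities live on different scales; only the ordering of the pairs is compared.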

### Dataset

Each line in the data files is formatted as

```
[Word1] [Word2] [Average] [Annotator1] ... [Annotator15]
小心谨慎  谨慎小心     4.0         4      ...       4
```
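A record in this layout can be read with a few lines of Python. The sketch below assumes the fields are separated by whitespace (the exact delimiter is not stated here) and that neither word contains a space. The function name `parse_line` and the sample record are hypothetical; the sample fills in all 15 annotator scores with illustrative values.

```python
from statistics import mean


def parse_line(line):
    """Split one whitespace-separated record into its fields.

    Returns (word1, word2, average, annotator_scores).
    """
    fields = line.split()
    word1, word2 = fields[0], fields[1]
    average = float(fields[2])
    scores = [float(s) for s in fields[3:]]
    return word1, word2, average, scores


# Hypothetical record: 15 annotator scores, illustrative values only.
sample = "小心谨慎 谨慎小心 4.0 " + " ".join(["4"] * 15)
word1, word2, average, scores = parse_line(sample)

# Sanity check: the stored average should match the mean of the scores.
print(word1, word2, average, len(scores), mean(scores))
```

Recomputing the mean from the individual scores is a cheap way to catch a wrong delimiter or a truncated line when loading the files.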

### Cite

If you use the dataset, please cite the following paper:

```
@article{huang2019COS960,
  Author = {Junjie Huang and Fanchao Qi and Chenghao Yang and Zhiyuan Liu and Maosong Sun},
  Title = {{COS960: A Chinese Word Similarity Dataset of 960 Word Pairs}},
  journal = {arXiv preprint arXiv:1906.00247},
  Year = {2019},
}
```
