Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/shuxinyin/SimCSE-Pytorch
An implementation of SimCSE + ESimCSE on Chinese datasets
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/shuxinyin/SimCSE-Pytorch
- Owner: shuxinyin
- License: mit
- Created: 2021-11-10T14:49:10.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2022-05-21T14:52:14.000Z (over 2 years ago)
- Last Synced: 2024-06-24T05:36:59.015Z (5 months ago)
- Language: Python
- Size: 574 KB
- Stars: 185
- Watchers: 2
- Forks: 31
- Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - shuxinyin/SimCSE-Pytorch
README
## SimCSE Implementation
Unsupervised + supervised SimCSE for Chinese, PyTorch version
> SimCSE: https://arxiv.org/pdf/2104.08821.pdf
> ESimCSE: https://arxiv.org/pdf/2109.04380.pdf

### 1. Dataset: SNS-B (uploaded)
> directory: data/SNS-B/

### 2. Environment
> torch==1.8.2
> transformers==4.12.3
>
> GPU: 3060Ti 8G
> Because of the GPU's limited memory, the batch_size is set very small.
> If video memory allows, try increasing the batch_size to get better results.

### 3. How to run
> SimCSE: python train.py
> ESimCSE: python ESimCSE_train.py

### 4. Result (unsupervised)
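The **Spearman correlation coefficient** for each model is shown below:

| Model | un_supervised |
|-----------|---------------|
| Bert_base | 0.538 |
| SimCSE | 0.692 |
| ESimCSE | 0.707 |

Note: the unsupervised SimCSE in the original paper is English-based and was trained on 1 million sentences sampled from Wikipedia. The evaluation in this project uses the Chinese STS-B dataset (uploaded), and the results are compared against [those reported on Su Jianlin's Scientific Spaces blog](https://spaces.ac.cn/archives/8348).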
The SimCSE result here is consistent with that reference.
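
For readers unfamiliar with the metric, here is a minimal sketch of how a Spearman correlation between model scores and gold similarity labels can be computed. It is an illustration only, not the evaluation code of this repository: the function names (`rank`, `pearson`, `spearman`) and the toy scores are assumptions made for this example, and it uses only the Python standard library. In practice the predicted score for a sentence pair is usually the cosine similarity of the two sentence embeddings.

```python
import math


def rank(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(pred, gold):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(rank(pred), rank(gold))


# Toy data (hypothetical): predicted similarities vs. gold labels.
predicted = [0.9, 0.1, 0.5, 0.7]
gold = [5, 0, 2, 4]
print(spearman(predicted, gold))
```

Because only the ordering of the scores matters, the metric does not depend on the scale of the predicted similarities, which is why it suits comparing embedding models against human-rated scores.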
![img.png](./data/pic/img.png)
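
The environment notes mention that a larger batch_size may give better results. One likely reason is that the unsupervised SimCSE objective treats the other sentences in a batch as negatives, so a larger batch supplies more negatives per sentence. The sketch below illustrates that objective in plain Python under stated assumptions: `simcse_unsup_loss` is a hypothetical name, the embeddings are plain lists rather than tensors, and the temperature of 0.05 follows the SimCSE paper rather than any setting confirmed for this repository. It is not the training code in `train.py`.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def simcse_unsup_loss(view_a, view_b, temperature=0.05):
    """In-batch contrastive (InfoNCE) loss.

    view_a[i] and view_b[i] are assumed to be two encodings of the same
    sentence i (in SimCSE, two forward passes with different dropout masks).
    Every view_b[j] with j != i acts as a negative for view_a[i].
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over the whole batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index i as the correct class.
        total += log_denom - logits[i]
    return total / n


# Toy batch (hypothetical): matching views give a loss near zero,
# mismatched views give a large loss.
a = [[1.0, 0.0], [0.0, 1.0]]
print(simcse_unsup_loss(a, [[1.0, 0.0], [0.0, 1.0]]))
print(simcse_unsup_loss(a, [[0.0, 1.0], [1.0, 0.0]]))
```

The inner loop runs over the whole batch, so the number of negatives each sentence is contrasted with grows with batch_size.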
**The above is for reference only. Writing code takes effort, so please give it a like if you find it useful.**