https://github.com/cluebenchmark/electra

Chinese pre-trained ELECTRA model: pretraining a Chinese model based on adversarial learning



Host: GitHub
URL: https://github.com/cluebenchmark/electra
Owner: CLUEbenchmark
Created: 2020-01-15T14:48:25.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2020-03-22T01:01:43.000Z (over 5 years ago)
Last Synced: 2025-01-03T07:48:15.529Z (9 months ago)
Topics: adversarial-networks, albert, bert, electra, gan, language-model, pretrained-models
Homepage: https://openreview.net/forum?id=r1xMH1BtvB
Size: 955 KB
Stars: 140
Watchers: 9
Forks: 11
Open Issues: 4
Metadata Files:
- Readme: README.md

README

# ELECTRA

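Chinese pre-trained ELECTRA model: pretraining a Chinese model based on adversarial learning

Code reposted from Google's official implementation: https://github.com/google-research/electra

For detailed usage instructions, refer to the official repository linked above.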

# ELECTRA Chinese tiny model download

## Google Drive

electra-tiny google-drive

## Baidu Drive

electra-tiny baidu-pan

Extraction code: rs99

## Model notes

1. Same configuration as TinyBERT.

2. The generator is 1/4 the size of the discriminator.
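To make the two notes above concrete, here is a minimal Python sketch of deriving a generator config from the discriminator config. It assumes the TinyBERT shape is the 4-layer, 312-hidden variant (12 heads, intermediate size 1200) and that "1/4" means a hidden-size fraction of 0.25. The derivation rule (intermediate size = 4 x hidden, one attention head per 64 hidden units, at least one) is our assumption modeled on the official ELECTRA code; `TransformerConfig` and `generator_config` are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TransformerConfig:
    hidden_size: int
    num_hidden_layers: int
    num_attention_heads: int
    intermediate_size: int


# Assumed TinyBERT 4-layer / 312-dim shape; the README only says
# "same configuration as TinyBERT".
DISCRIMINATOR = TransformerConfig(
    hidden_size=312,
    num_hidden_layers=4,
    num_attention_heads=12,
    intermediate_size=1200,
)


def generator_config(disc: TransformerConfig,
                     hidden_frac: float = 0.25,
                     layer_frac: float = 1.0) -> TransformerConfig:
    """Shrink a discriminator config into a (hypothetical) generator config.

    Assumption: only the width is scaled by hidden_frac; the number of
    layers is scaled by layer_frac (1.0 keeps the same depth).
    """
    hidden = int(round(disc.hidden_size * hidden_frac))
    return replace(
        disc,
        hidden_size=hidden,
        num_hidden_layers=int(round(disc.num_hidden_layers * layer_frac)),
        intermediate_size=4 * hidden,             # assumed 4x FFN ratio
        num_attention_heads=max(1, hidden // 64),  # assumed 64 dims per head
    )


if __name__ == "__main__":
    print(generator_config(DISCRIMINATOR))
```

Under these assumptions the generator ends up with a hidden size of 78 and a single attention head, which shows how small a 1/4-width generator becomes at the tiny scale.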

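# How to use the official code

## Steps

1. Edit the data paths and the TPU/GPU settings in configure_pretraining.py.

2. Set model_size: you can define your own model sizes in code/util/training_utils.py.

3. Input data format: the raw input_ids, input_mask, and segment_ids. Masking is done online during training with uniform mask sampling, so there is no need to generate masked input ids offline.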
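Step 3 can be illustrated with a simplified NumPy sketch of online uniform masking: positions are drawn uniformly from the real (non-padding, non-[CLS]/[SEP]) tokens each time a batch is seen, so the stored data only needs input_ids and input_mask (segment_ids pass through unchanged). The special-token ids are assumptions (they match the common BERT Chinese vocab, but check your own vocab.txt), `uniform_mask` is a hypothetical helper, and the official code's exact replacement policy and per-sequence prediction cap are not reproduced: every selected token simply becomes [MASK].

```python
import numpy as np

# Assumed special-token ids; verify against your vocab.txt.
PAD_ID, CLS_ID, SEP_ID, MASK_ID = 0, 101, 102, 103


def uniform_mask(input_ids, input_mask, mask_prob=0.15, rng=None):
    """Mask about mask_prob of the real tokens, chosen uniformly at random.

    Returns (masked_ids, positions), where positions[b] holds the sorted
    masked indices of example b. Simplified sketch, not the official code.
    """
    rng = np.random.default_rng() if rng is None else rng
    input_ids = np.asarray(input_ids)
    input_mask = np.asarray(input_mask)
    masked_ids = input_ids.copy()
    positions = []
    for b in range(input_ids.shape[0]):
        # Candidates: real tokens that are not [CLS] or [SEP].
        candidates = np.flatnonzero(
            (input_mask[b] == 1)
            & (input_ids[b] != CLS_ID)
            & (input_ids[b] != SEP_ID)
        )
        if len(candidates):
            n = max(1, int(round(len(candidates) * mask_prob)))
        else:
            n = 0
        chosen = np.sort(rng.choice(candidates, size=n, replace=False))
        masked_ids[b, chosen] = MASK_ID
        positions.append(chosen)
    return masked_ids, positions


if __name__ == "__main__":
    # Arbitrary token ids for illustration, padded to length 7.
    ids = np.array([[101, 2769, 4263, 872, 102, 0, 0]])
    mask = np.array([[1, 1, 1, 1, 1, 0, 0]])
    masked, pos = uniform_mask(ids, mask, rng=np.random.default_rng(0))
    print(masked, pos)
```

Because the mask is resampled every time, the same sentence gets different masked positions across epochs, which is why no offline masked copy of the corpus is needed.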

# Performance

## Pretraining metrics (generator + discriminator)

electra-tiny:

| metric | value |
| --- | --- |
| disc_accuracy | 0.95093095 |
| disc_auc | 0.9762006 |
| disc_loss | 0.14071295 |
| disc_precision | 0.8018275 |
| disc_recall | 0.6088053 |
| loss | 9.516352 |
| masked_lm_accuracy | 0.46732807 |
| masked_lm_loss | 2.8209455 |
| sampled_masked_lm_accuracy | 0.3504382 |

The model was trained on the 10 GB CLUE Chinese corpus for 1M steps.
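The disc_* metrics score the discriminator's replaced-token detection. A minimal NumPy sketch of accuracy, precision, and recall for that task, assuming labels mark tokens the generator replaced, predictions threshold the sigmoid of the logits at 0.5, and padding is excluded via input_mask. `disc_metrics` is a hypothetical helper, not the official metric code, and disc_auc is omitted.

```python
import numpy as np


def disc_metrics(logits, labels, input_mask, threshold=0.5):
    """Replaced-token-detection metrics over real (non-padding) tokens.

    labels[b, j] = 1 if token j of example b was replaced by the generator.
    """
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    preds = (probs > threshold).astype(int)
    labels = np.asarray(labels)
    keep = np.asarray(input_mask).astype(bool)  # assumed: ignore padding
    p, y = preds[keep], labels[keep]
    tp = int(np.sum((p == 1) & (y == 1)))
    return {
        "disc_accuracy": float(np.mean(p == y)),
        "disc_precision": tp / max(1, int(np.sum(p == 1))),
        "disc_recall": tp / max(1, int(np.sum(y == 1))),
    }


if __name__ == "__main__":
    print(disc_metrics([[3.0, -3.0, 2.0, -2.0, 5.0]],
                       [[1, 0, 0, 0, 1]],
                       [[1, 1, 1, 1, 0]]))
```

Since most tokens are left unreplaced, accuracy is dominated by the easy "original" class, which is how a high disc_accuracy can coexist with a noticeably lower disc_recall in the table above.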

## Downstream fine-tuning on the CLUE benchmark

Note: these results use only the pretrained electra-tiny with layer-wise learning rate decay, without any distillation or data augmentation. The learning rate is set to 1e-4 for every task, and each task is trained for 10 epochs. (According to the official results, the scores may have large variance.)
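Layer-wise learning rate decay gives layers closer to the input smaller learning rates than the task head. A minimal Python sketch, where 1e-4 comes from the note above but the decay factor 0.8 and the 4-layer depth are assumptions. The depth scheme (embeddings at depth 0, encoder layer i at depth i + 1, task head at depth n_layers + 2, lr = base_lr * decay ** (n_layers + 2 - depth)) is our reading of the official ELECTRA fine-tuning code and should be checked against it; `layerwise_lrs` is a hypothetical name.

```python
def layerwise_lrs(base_lr: float, decay: float, n_layers: int) -> dict:
    """Per-parameter-group learning rates, decaying toward the input.

    Assumed depth scheme: embeddings = 0, encoder layer i = i + 1,
    task-specific head = n_layers + 2 (it gets the full base_lr).
    """
    depths = {"embeddings": 0, "task_specific": n_layers + 2}
    for i in range(n_layers):
        depths[f"encoder/layer_{i}"] = i + 1
    return {name: base_lr * decay ** (n_layers + 2 - d)
            for name, d in depths.items()}


if __name__ == "__main__":
    # base_lr from the README; decay=0.8 and n_layers=4 are assumptions.
    for name, lr in layerwise_lrs(1e-4, 0.8, 4).items():
        print(f"{name}: {lr:.3e}")
```

With these settings the embeddings train at roughly a quarter of the head's learning rate (0.8 ** 6 is about 0.26), so the pretrained lower layers change less during fine-tuning.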

| Model | AFQMC | TNEWS | IFLYTEK | CMNLI | WSC | CSL |
| --- | --- | --- | --- | --- | --- | --- |
| Metric | Acc | Acc | Acc | Acc | Acc | Acc |
| ELECTRA-tiny | 70.319 | 54.280 | 53.538 | 73.745 | 64.336 | 78.700 |
| Roberta-tiny | 69.904 | 54.150 | 56.808 | 74.037 | 64.336 | 74.133 |

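Notes:

1. ELECTRA may lose some performance on multi-class classification tasks.

2. The size ratio between the generator and the discriminator is rather hacky to set; it depends on factors such as the masking method.

Sign up for the NLPCC high-performance small-model evaluation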
