https://github.com/nghuyong/ERNIE-Pytorch
ERNIE Pytorch Version
https://github.com/nghuyong/ERNIE-Pytorch
Last synced: 20 days ago
JSON representation
ERNIE Pytorch Version
- Host: GitHub
- URL: https://github.com/nghuyong/ERNIE-Pytorch
- Owner: nghuyong
- License: mit
- Created: 2019-05-13T06:25:20.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2023-07-26T22:31:18.000Z (over 1 year ago)
- Last Synced: 2025-03-28T00:06:35.723Z (25 days ago)
- Language: Python
- Homepage:
- Size: 58.6 KB
- Stars: 926
- Watchers: 13
- Forks: 120
- Open Issues: 7
-
Metadata Files:
- Readme: readme.md
- License: LICENSE
Awesome Lists containing this project
- awesome-bert - nghuyong/ERNIE-Pytorch
README
ERNIE-Pytorch
This project is to convert ERNIE from paddlepaddle to huggingface's format (in Pytorch).
**News: ERNIE has been merged
into [huggingface/transformers@v4.22.0](https://github.com/huggingface/transformers/releases/tag/v4.22.0) !!**## Get Started
```
pip install --upgrade transformers
```Take `ernie-1.0-base-zh` as an example:
```Python
from transformers import BertTokenizer, ErnieModeltokenizer = BertTokenizer.from_pretrained("nghuyong/ernie-1.0-base-zh")
model = ErnieModel.from_pretrained("nghuyong/ernie-1.0-base-zh")
```### Supported Models
| Model Name | Language | Description |
|:-------------------:|:--------:|:-------------------------------:|
| ernie-1.0-base-zh | Chinese | Layer:12, Heads:12, Hidden:768 |
| ernie-2.0-base-en | English | Layer:12, Heads:12, Hidden:768 |
| ernie-2.0-large-en | English | Layer:24, Heads:16, Hidden:1024 |
| ernie-3.0-xbase-zh | Chinese | Layer:20, Heads:16, Hidden:1024 |
| ernie-3.0-base-zh | Chinese | Layer:12, Heads:12, Hidden:768 |
| ernie-3.0-medium-zh | Chinese | Layer:6, Heads:12, Hidden:768 |
| ernie-3.0-mini-zh | Chinese | Layer:6, Heads:12, Hidden:384 |
| ernie-3.0-micro-zh | Chinese | Layer:4, Heads:12, Hidden:384 |
| ernie-3.0-nano-zh | Chinese | Layer:4, Heads:12, Hidden:312 |
| ernie-health-zh | Chinese | Layer:12, Heads:12, Hidden:768 |
| ernie-gram-zh | Chinese | Layer:12, Heads:12, Hidden:768 |You can find all the supported models from huggingface's model
hub: [huggingface.co/nghuyong](https://huggingface.co/nghuyong),
and model details from paddle's official
repo: [PaddleNLP](https://paddlenlp.readthedocs.io/zh/latest/model_zoo/transformers/ERNIE/contents.html)
and [ERNIE](https://github.com/PaddlePaddle/ERNIE/blob/repro).## Details
I want to convert the model from paddle version by myself 😉
The following will take `ernie-1.0-base-zh` as an example to show how to convert.
1. Download the paddle-paddle version ERNIE model. Execute the following code
```
import paddlenlp
tokenizer = paddlenlp.transformers.ErnieTokenizer.from_pretrained("ernie-1.0-base-zh")
model = paddlenlp.transformers.ErnieForMaskedLM.from_pretrained("ernie-1.0-base-zh")
```
And then you will get the model in `~/.paddlenlp/models/ernie-1.0-base-zh/`, move to this project path.
2. ```pip install -r requirements.txt```
3. ```python convert.py```
4. Now, a folder named `convert` will be in the project path, and there will be three files in this
folder: `config.json`,`pytorch_model.bin` and `vocab.txt`.I want to check the calculation results before and after model conversion 😁
```bash
python test.py --task logit_check
```You will get the output:
```output
huggingface result
pool output: [-1. -1. 0.9981035 -0.9996652 -0.78173476 -1. -0.9994901 0.97012603 0.85954666 0.9854131 ]paddle result
pool output: [-0.99999976 -0.99999976 0.9981028 -0.9996651 -0.7815545 -0.99999976 -0.9994898 0.97014064 0.8594844 0.985419 ]
```It can be seen that the result of our convert version is the same with the official paddlepaddle's version.
I want to reproduce the cloze test in ERNIE1.0's paper 😆
```bash
python test.py --task cloze_check
```You will get the output:
```bash
huggingface result
prediction shape: torch.Size([47, 18000])
predict result: ['西', '游', '记', '是', '中', '国', '神', '魔', '小', '说', '的', '经', '典', '之', '作', ',', '与', '《', '三', '国', '演', '义', '》', '《', '水', '浒', '传', '》', '《', '红', '楼', '梦', '》', '并', '称', '为', '中', '国', '古', '典', '四', '大', '名', '著', '。']
[CLS] logit: [-15.693626 -19.522263 -10.429456 ... -11.800728 -12.253127 -14.375117]paddle result
prediction shape: [47, 18000]
predict result: ['西', '游', '记', '是', '中', '国', '神', '魔', '小', '说', '的', '经', '典', '之', '作', ',', '与', '《', '三', '国', '演', '义', '》', '《', '水', '浒', '传', '》', '《', '红', '楼', '梦', '》', '并', '称', '为', '中', '国', '古', '典', '四', '大', '名', '著', '。']
[CLS] logit: [-15.693538 -19.521954 -10.429307 ... -11.800765 -12.253114 -14.375412]
```## Citation
If you use this work in a scientific publication, I would appreciate that you can also cite the following BibTex entry:
```latex
@misc{nghuyong2019@ERNIE-Pytorch,
title={ERNIEPytorch},
author={Yong Hu},
howpublished={\url{https://github.com/nghuyong/ERNIE-Pytorch}},
year={2019}
}
```