https://github.com/xv44586/toolkit4nlp
Transformer implementations (architectures, task examples, serving, and more)
- Host: GitHub
- URL: https://github.com/xv44586/toolkit4nlp
- Owner: xv44586
- License: apache-2.0
- Created: 2020-06-30T11:09:13.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-03-23T10:19:37.000Z (almost 3 years ago)
- Last Synced: 2024-12-08T00:51:09.532Z (about 2 months ago)
- Topics: bert, keras, nlp
- Language: Python
- Homepage:
- Size: 592 KB
- Stars: 97
- Watchers: 8
- Forks: 18
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
# toolkits for NLP
## intent
To make it easier for me to learn and understand certain things, and to implement some of my own ideas.

## Update info:
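- 2021.8.5 Added serving-related code, which can serve as a reference for online deployment. Code: serving
- 2021.8.4 Added a mixed-precision training experiment. Code: classification_tnews_mixed_precision
- 2021.5.13 Added GPT2 and a corresponding dialogue generation experiment. Code: basic_language_model_gpt2_gen
- 2021.5.12 Added GPT and a corresponding dialogue generation experiment. Code: basic_language_model_gpt_gen
- 2021.5.1 Added ReZero and a corresponding text classification experiment. Code: tnews_rezero_pretrain_finetuning
- 2021.3.26 Added RealFormer (residual attention) and a corresponding text classification experiment. Code: classification tnews pet realformer
- 2021.1.13 Added a demo reproducing SBERT. Code: sbert-stsb
- 2020.11.26 Added a pretrain + fine-tuning example. Code: classification tnew pretrain before fine-tuning
- 2020.11.10 NEZHA now accepts external_embedding_weights; this parameter can be used to fuse other information into the NEZHA Token-Embedding. Usage: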
```python
from toolkit4nlp.models import build_transformer_model

# Build your own embeddings_matrix, aligned with the vocabulary
config_path = ''
checkpoint_path = ''
embeddings_matrix = None
nezha = build_transformer_model(
    config_path=config_path,
    checkpoint_path=checkpoint_path,
    model='nezha',
    external_embedding_size=100,
    external_embedding_weights=embeddings_matrix)
```
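One way to prepare `embeddings_matrix` is sketched below. It assumes, without confirmation from the toolkit4nlp source, that the matrix should hold one row per vocabulary entry, in vocabulary order, with `external_embedding_size` columns. The names `build_embeddings_matrix`, `token_list`, and `word_vectors` are hypothetical and used only for illustration.

```python
import numpy as np


def build_embeddings_matrix(token_list, word_vectors, dim=100, seed=0):
    """Stack one vector per vocabulary token, in vocabulary order.

    token_list: tokens ordered by their id in the vocabulary file.
    word_vectors: dict mapping token -> sequence of `dim` floats
        (a hypothetical source, e.g. pretrained word vectors).
    Tokens without a pretrained vector get a small random vector.
    """
    rng = np.random.default_rng(seed)
    matrix = np.empty((len(token_list), dim), dtype="float32")
    for idx, token in enumerate(token_list):
        if token in word_vectors:
            matrix[idx] = np.asarray(word_vectors[token], dtype="float32")
        else:
            matrix[idx] = rng.normal(scale=0.02, size=dim)
    return matrix


# Toy vocabulary and vectors, for illustration only.
token_list = ["[PAD]", "[UNK]", "我", "爱"]
word_vectors = {"我": [0.1] * 100, "爱": [0.2] * 100}
embeddings_matrix = build_embeddings_matrix(token_list, word_vectors, dim=100)
print(embeddings_matrix.shape)  # (4, 100)
```

The resulting array could then be passed as `external_embedding_weights`, with `external_embedding_size` set to the same `dim`.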
- 2020.11.3 Added CCF 2020 QA match baselines: ccf_2020_qa_match_pair and ccf_2020_qa_match_point
- 2020.10.19 Added the AdaBelief optimizer and a corresponding example. Code: classification use AdaBelief
- 2020.10.16 Added focal loss and a corresponding example. Code: classification_focal_loss
- 2020.09.27 Added an implementation of NEZHA. Usage:
```python
from toolkit4nlp.models import build_transformer_model

config_path = '/home/mingming.xu/pretrain/NLP/chinese_nezha_base/config.json'
checkpoint_path = '/home/mingming.xu/pretrain/NLP/chinese_nezha_base/model_base.ckpt'

model = build_transformer_model(config_path=config_path, checkpoint_path=checkpoint_path, model='nezha')
```
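For the focal loss entry (2020.10.16) above, the idea can be sketched in plain Python. This follows the standard binary formulation, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), and is not taken from `classification_focal_loss`. The defaults `gamma=2.0` and `alpha=0.25` are commonly cited values and are assumptions here, as is the function name.

```python
import math


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for a single binary prediction.

    y_true: 0 or 1; p_pred: predicted probability of the positive class.
    """
    p = min(max(p_pred, eps), 1.0 - eps)   # clip to avoid log(0)
    p_t = p if y_true == 1 else 1.0 - p    # probability given to the true class
    alpha_t = alpha if y_true == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(1, 0.95)   # well-classified example
hard = binary_focal_loss(1, 0.30)   # poorly classified example
print(easy < hard)  # True
```

The `(1 - p_t) ** gamma` factor shrinks the loss of well-classified examples, so training concentrates on the hard ones. With `gamma=0` the expression reduces to class-weighted cross-entropy.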
- 2020.09.22 Added an implementation of FastBERT. Code: classification ifytek with FastBERT
- 2020.09.15 Added two experiments that try to improve classification performance by constructing new auxiliary tasks. Code: classification ifytek with similarity and classification ifytek with seq2seq
- 2020.09.10 Added a Knowledge Distillation BERT example. Code: distilling knowledge bert
- 2020.08.24 Added an example of question-answer generation with UniLM. Code: qa question answer generation
- 2020.08.20 Added an example of question generation with UniLM. Code: qa question generation
- 2020.08.20 Added UniLM and LM models. Usage:
```python
from toolkit4nlp.models import build_transformer_model

config_path = '/home/mingming.xu/pretrain/NLP/chinese_electra_base_L-12_H-768_A-12/config.json'
checkpoint_path = '/home/mingming.xu/pretrain/NLP/chinese_electra_base_L-12_H-768_A-12/electra_base.ckpt'

# lm
model = build_transformer_model(
    config_path=config_path,
    checkpoint_path=checkpoint_path,
    application='lm'
)

# unilm
model = build_transformer_model(
    config_path=config_path,
    checkpoint_path=checkpoint_path,
    application='unilm'
)
```
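The two `application` values differ mainly in the attention mask. The sketch below shows the masks as they are usually defined for a left-to-right language model and for UniLM-style seq2seq. It is not extracted from toolkit4nlp's code, and the helper names and the `segment_ids` input are illustrative assumptions.

```python
def lm_mask(seq_len):
    """Causal mask: position i may attend to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]


def unilm_mask(segment_ids):
    """Seq2seq mask: segment 0 is visible to every position; segment 1 is causal."""
    n = len(segment_ids)
    return [
        [1 if segment_ids[j] == 0 or j <= i else 0 for j in range(n)]
        for i in range(n)
    ]


# Two source tokens (segment 0) followed by two target tokens (segment 1).
for row in unilm_mask([0, 0, 1, 1]):
    print(row)
# [1, 1, 0, 0]
# [1, 1, 0, 0]
# [1, 1, 1, 0]
# [1, 1, 1, 1]
```

Row i lists which positions token i may attend to: the source tokens see each other bidirectionally, while each target token sees the full source plus the target tokens up to and including itself.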
- 2020.08.19 Added the ELECTRA model. Usage:
```python
from toolkit4nlp.models import build_transformer_model

config_path = '/home/mingming.xu/pretrain/NLP/chinese_electra_base_L-12_H-768_A-12/config.json'
checkpoint_path = '/home/mingming.xu/pretrain/NLP/chinese_electra_base_L-12_H-768_A-12/electra_base.ckpt'

model = build_transformer_model(
    config_path=config_path,
    checkpoint_path=checkpoint_path,
    model='electra',
)
```
- 2020.08.17 Added a two-stage-fine-tuning experiment to verify whether theseus_model is necessary in bert-of-theseus. Code: two_stage_fine_tuning
- 2020.08.14 Added bert-of-theseus experiment code for NER. Code: sequence_labeling_ner_bert_of_theseus
- 2020.08.11 Added bert-of-theseus experiment code for text classification. Code: classification_ifytek_bert_of_theseus
- 2020.08.06 Added a cws-crf example. Code: cws_crf_example
- 2020.08.05 Added a ner-crf example. Code: ner_crf_example
- 2020.08.01 Added bert + dgcnn for the QA task. Code: qa_dgcnn_example
- 2020.07.27 Added pretraining; for usage see pretraining/README.md
- 2020.07.18 Added tokenizer. Usage:
```python
from toolkit4nlp.tokenizers import Tokenizer
vocab = ''
tokenizer = Tokenizer(vocab, do_lower_case=True)
tokenizer.encode('我爱你中国')
```
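To illustrate what a BERT-style `encode` call typically produces, here is a toy sketch. It assumes the common convention of wrapping the text in `[CLS]` and `[SEP]` and returning token ids together with segment ids. `toy_vocab` and `toy_encode` are hypothetical, and toolkit4nlp's `Tokenizer` may differ in details such as WordPiece splitting or its return type.

```python
# Hypothetical vocabulary: token -> id. A real vocab file lists one token per line.
toy_vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "我": 4, "爱": 5, "你": 6, "中": 7, "国": 8}


def toy_encode(text, vocab):
    """Split Chinese text into characters and map each one to its id."""
    tokens = ["[CLS]"] + list(text) + ["[SEP]"]
    token_ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    segment_ids = [0] * len(token_ids)   # single sentence: every token is segment 0
    return token_ids, segment_ids


token_ids, segment_ids = toy_encode("我爱你中国", toy_vocab)
print(token_ids)    # [2, 4, 5, 6, 7, 8, 3]
print(segment_ids)  # [0, 0, 0, 0, 0, 0, 0]
```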
- 2020.07.16 BERT can now load pretrained weights. Usage:
```python
from toolkit4nlp.models import build_transformer_model

config_path = ''
checkpoints_path = ''
model = build_transformer_model(config_path, checkpoints_path)
```
Mainly based on bert, bert4keras, and keras_bert.