https://github.com/alibaba/EasyTransfer
EasyTransfer is designed to make the development of transfer learning in NLP applications easier.
- Host: GitHub
- URL: https://github.com/alibaba/EasyTransfer
- Owner: alibaba
- License: apache-2.0
- Created: 2020-09-02T03:04:42.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-08-25T06:48:53.000Z (over 2 years ago)
- Last Synced: 2024-05-21T04:57:10.497Z (7 months ago)
- Topics: bert, knowledge-distillation, nlp-applications, transfer-learning
- Language: Python
- Homepage: https://www.yuque.com/easytransfer/cn/
- Size: 3.61 MB
- Stars: 852
- Watchers: 25
- Forks: 161
- Open Issues: 24
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - alibaba/EasyTransfer
# EasyTransfer: A Simple and Scalable Deep Transfer Learning Platform for NLP Applications
# Intro
Deep Transfer Learning (TL) has been applied successfully to many real-world NLP applications, yet building an easy-to-use TL toolkit remains difficult. To bridge this gap, EasyTransfer is designed to help users apply deep TL to NLP applications with ease. It was developed at Alibaba in early 2017, has been used by the major business units (BUs) of Alibaba Group, and has achieved very good results in 20+ business scenarios. It supports the mainstream pre-trained ModelZoo, including pre-trained language models (PLMs) and multi-modal models on the [PAI](https://www.aliyun.com/product/bigdata/product/learn) platform, integrates SOTA models for mainstream NLP applications in AppZoo, and supports knowledge distillation for PLMs. EasyTransfer makes it convenient to quickly start model training, evaluation, offline prediction, and online deployment. It also provides rich APIs to make the development of NLP and transfer learning easier.

# Main Features
- **Language model pre-training tool:** provides a comprehensive tool for pre-training language models such as T5 and BERT. With it, users can train models that achieve strong results on benchmark leaderboards such as CLUE, GLUE, and SuperGLUE.
- **ModelZoo with rich and high-quality pre-trained models:** supports continual pre-training and fine-tuning of mainstream language models such as BERT, ALBERT, RoBERTa, and T5. It also supports FashionBERT, a multi-modal model developed using fashion-domain data at Alibaba.
- **AppZoo with rich and easy-to-use applications:** supports mainstream NLP applications as well as models developed inside Alibaba, e.g. HCNN for text matching and BERT-HAE for MRC.
- **Automatic knowledge distillation:** supports task-adaptive knowledge distillation, which distills knowledge from a teacher model into a small task-specific student model to reduce parameter size while keeping comparable performance.
- **Easy-to-use and high-performance distributed strategy:** based on in-house [PAI](https://www.aliyun.com/product/bigdata/product/learn) features, it provides an easy-to-use and high-performance distributed strategy for multi-CPU/GPU training.

# Architecture
![image.png](https://cdn.nlark.com/yuque/0/2020/png/2480469/1600310258839-04837b68-ef37-449d-8ff4-02dbd8dcef9e.png#align=left&display=inline&height=357&margin=%5Bobject%20Object%5D&name=image.png&originHeight=713&originWidth=1492&size=182794&status=done&style=none&width=746)

# Installation
You can either install from pip:
```bash
$ pip install easytransfer
```

or set up from source:
```bash
$ git clone https://github.com/alibaba/EasyTransfer.git
$ cd EasyTransfer
$ python setup.py install
```
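The tested versions listed in this section are older releases, so a recent environment may not match them. As a minimal sketch (not part of EasyTransfer), the check below compares an interpreter and TensorFlow version against that tested combination. The helper name `matches_tested_versions` and the decision to compare only major and minor version numbers are assumptions made for illustration.

```python
import sys

# Versions this README reports as tested; the rest of this block is illustrative.
TESTED_PYTHON = {(3, 6), (2, 7)}
TESTED_TENSORFLOW = (1, 12)


def matches_tested_versions(python_version, tf_version):
    """Return True if both (major, minor) pairs match the tested combination.

    `python_version` is a tuple such as sys.version_info;
    `tf_version` is a version string such as "1.12.3".
    """
    tf_major_minor = tuple(int(part) for part in tf_version.split(".")[:2])
    python_major_minor = tuple(python_version[:2])
    return python_major_minor in TESTED_PYTHON and tf_major_minor == TESTED_TENSORFLOW


# Check the running interpreter against the TensorFlow version you plan to install.
print(matches_tested_versions(sys.version_info, "1.12.3"))
```

In a live environment the second argument would typically come from `tensorflow.__version__` once TensorFlow is installed.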
This repo is tested on Python 3.6/2.7 and TensorFlow 1.12.3.

# Quick Start
Now let's show how to use just 30 lines of code to build a text classification model based on BERT.
```python
from easytransfer import base_model, layers, model_zoo, preprocessors
from easytransfer.datasets import CSVReader, CSVWriter
from easytransfer.losses import softmax_cross_entropy
from easytransfer.evaluators import classification_eval_metrics


class TextClassification(base_model):
    def __init__(self, **kwargs):
        super(TextClassification, self).__init__(**kwargs)
        self.pretrained_model_name = "google-bert-base-en"
        self.num_labels = 2

    def build_logits(self, features, mode=None):
        # Look up the preprocessor and the pre-trained model by name.
        preprocessor = preprocessors.get_preprocessor(self.pretrained_model_name)
        model = model_zoo.get_pretrained_model(self.pretrained_model_name)
        dense = layers.Dense(self.num_labels)
        input_ids, input_mask, segment_ids, label_ids = preprocessor(features)
        # Classify from the pooled output of the pre-trained model.
        _, pooled_output = model([input_ids, input_mask, segment_ids], mode=mode)
        return dense(pooled_output), label_ids

    def build_loss(self, logits, labels):
        return softmax_cross_entropy(labels, self.num_labels, logits)

    def build_eval_metrics(self, logits, labels):
        return classification_eval_metrics(logits, labels, self.num_labels)


app = TextClassification()
train_reader = CSVReader(input_glob=app.train_input_fp, is_training=True, batch_size=app.train_batch_size)
eval_reader = CSVReader(input_glob=app.eval_input_fp, is_training=False, batch_size=app.eval_batch_size)
app.run_train_and_evaluate(train_reader=train_reader, eval_reader=eval_reader)
```
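The `build_loss` method above delegates to `softmax_cross_entropy`. For readers unfamiliar with that loss, here is a minimal pure-Python sketch of what a softmax cross-entropy over integer class labels computes. It is an illustration only and not EasyTransfer's implementation: the function name `softmax_cross_entropy_sketch` is hypothetical, the argument order simply mirrors the call above, and averaging over the batch is an assumption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def softmax_cross_entropy_sketch(labels, num_labels, logits):
    """Mean cross-entropy for integer labels and per-example logits.

    Assumption: the loss is averaged over the batch.
    """
    losses = []
    for label, row in zip(labels, logits):
        if len(row) != num_labels or not 0 <= label < num_labels:
            raise ValueError("label or logits do not match num_labels")
        probabilities = softmax(row)
        # Negative log-probability assigned to the true class.
        losses.append(-math.log(probabilities[label]))
    return sum(losses) / len(losses)


# Two examples, two classes: uniform logits give a loss of ln(2) for each.
loss = softmax_cross_entropy_sketch([0, 1], 2, [[0.0, 0.0], [0.0, 0.0]])
print(round(loss, 4))  # 0.6931
```

A confident, correct prediction drives the loss toward zero, while a confident, wrong one makes it grow without bound.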
You can find more details or play with the code in our Jupyter/Notebook [PAI-DSW](https://dsw-dev.data.aliyun.com/#/?fileUrl=https://raw.githubusercontent.com/alibaba/EasyTransfer/master/examples/easytransfer-quick_start.ipynb&fileName=easytransfer-quick_start.ipynb).

You can also use the AppZoo command-line tools to quickly train an App model. Take text classification on the SST-2 dataset as an example. First, download [train.tsv](http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/tutorial/glue/SST-2/train.tsv), [dev.tsv](http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/tutorial/glue/SST-2/dev.tsv) and [test.tsv](http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/tutorial/glue/SST-2/test.tsv), then start training:
```bash
$ easy_transfer_app --mode train \
    --inputTable=./train.tsv,./dev.tsv \
    --inputSchema=content:str:1,label:str:1 \
    --firstSequence=content \
    --sequenceLength=128 \
    --labelName=label \
    --labelEnumerateValues=0,1 \
    --checkpointDir=./sst2_models/ \
    --numEpochs=3 \
    --batchSize=32 \
    --optimizerType=adam \
    --learningRate=2e-5 \
    --modelName=text_classify_bert \
    --advancedParameters='pretrain_model_name_or_path=google-bert-base-en'
```

And then predict:
```bash
$ easy_transfer_app --mode predict \
    --inputTable=./test.tsv \
    --outputTable=./test.pred.tsv \
    --inputSchema=id:str:1,content:str:1 \
    --firstSequence=content \
    --appendCols=content \
    --outputSchema=predictions,probabilities,logits \
    --checkpointPath=./sst2_models/
```
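Both commands above describe their input columns with `--inputSchema`, a comma-separated list of three-part entries such as `content:str:1`. The sketch below splits such a string into its parts. It assumes the three fields are column name, type, and length, which this README does not spell out. `parse_input_schema` and `ColumnSpec` are hypothetical names and not part of the `easy_transfer_app` tool.

```python
from collections import namedtuple

# Assumed meaning of the three colon-separated fields: name, type, length.
ColumnSpec = namedtuple("ColumnSpec", ["name", "dtype", "length"])


def parse_input_schema(schema):
    """Split an --inputSchema string into a list of ColumnSpec entries."""
    columns = []
    for entry in schema.split(","):
        parts = entry.strip().split(":")
        if len(parts) != 3:
            raise ValueError("expected name:type:length, got %r" % entry)
        name, dtype, length = parts
        columns.append(ColumnSpec(name, dtype, int(length)))
    return columns


train_columns = parse_input_schema("content:str:1,label:str:1")
print([column.name for column in train_columns])  # ['content', 'label']
```

Read this way, the training schema declares a text column and a label column, while the prediction schema replaces the label with an `id` column.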
To learn more about the usage of AppZoo, please refer to our [documentation](https://www.yuque.com/easytransfer/itfpm9/ky6hky).

# Tutorials
- [PAI-ModelZoo (20+ pretrained models)](https://www.yuque.com/easytransfer/itfpm9/geiy58)
- [FashionBERT-cross-modality pretrained model](https://www.yuque.com/easytransfer/itfpm9/nm3mxu)
- [Knowledge Distillation including vanilla KD, Probes KD, AdaBERT](https://www.yuque.com/easytransfer/itfpm9/kp1dtx)
- [BERT Feature Extraction](https://www.yuque.com/easytransfer/itfpm9/blz7k6)
- [Text Matching including BERT, BERT Two Tower, DAM, HCNN](https://www.yuque.com/easytransfer/itfpm9/xfe19v)
- [Text Classification including BERT, TextCNN](https://www.yuque.com/easytransfer/itfpm9/rypc5x)
- [Machine Reading Comprehension including BERT, BERT-HAE](https://www.yuque.com/easytransfer/itfpm9/qrvqco)
- [Sequence Labeling including BERT](https://www.yuque.com/easytransfer/itfpm9/we0go2)
- [Meta Fine-tuning for Cross-domain Text Classification](https://www.yuque.com/easytransfer/cn/mgy5gb)

# [EasyNLP for CLUE Benchmark](https://github.com/alibaba/EasyNLP/tree/master/benchmarks/clue)
Here is the [CLUE benchmark example](https://github.com/alibaba/EasyNLP/tree/master/benchmarks/clue).

You can find more benchmarks at [https://www.yuque.com/easytransfer/cn/rkm4p7](https://www.yuque.com/easytransfer/itfpm9/rkm4p7).
# Links
- Tutorials: [https://www.yuque.com/easytransfer/itfpm9/qtzvuc](https://www.yuque.com/easytransfer/itfpm9/qtzvuc)
- ModelZoo: [https://www.yuque.com/easytransfer/itfpm9/oszcof](https://www.yuque.com/easytransfer/itfpm9/oszcof)
- AppZoo: [https://www.yuque.com/easytransfer/itfpm9/ky6hky](https://www.yuque.com/easytransfer/itfpm9/ky6hky)
- API docs: [http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/eztransfer_docs/html/index.html](http://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/eztransfer_docs/html/index.html)
# Contact Us
Scan the following QR code to join the DingTalk discussion group. Discussions are mostly in Chinese, but English is also welcome.

You can also scan the following QR code to join the WeChat discussion group.
# Citation
```text
@article{easytransfer,
  author  = {Minghui Qiu and
             Peng Li and
             Chengyu Wang and
             Haojie Pan and
             An Wang and
             Cen Chen and
             Xianyan Jia and
             Yaliang Li and
             Jun Huang and
             Deng Cai and
             Wei Lin},
  title   = {EasyTransfer - A Simple and Scalable Deep Transfer Learning Platform for NLP Applications},
  journal = {CIKM 2021},
  url     = {https://arxiv.org/abs/2011.09463},
  year    = {2021}
}
```