https://github.com/CyberZHG/keras-xlnet

Implementation of XLNet that can load pretrained checkpoints

Host: GitHub
URL: https://github.com/CyberZHG/keras-xlnet
Owner: CyberZHG
License: mit
Archived: true
Created: 2019-06-20T10:30:13.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2022-01-22T11:15:20.000Z (over 3 years ago)
Last Synced: 2025-04-19T19:55:04.939Z (6 months ago)
Topics: glue, keras, language-model, nlp, xlnet
Language: Python
Homepage: https://pypi.org/project/keras-xlnet/
Size: 1000 KB
Stars: 171
Watchers: 5
Forks: 26
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# Keras XLNet

[![Version](https://img.shields.io/pypi/v/keras-xlnet.svg)](https://pypi.org/project/keras-xlnet/)

![License](https://img.shields.io/pypi/l/keras-xlnet.svg)

\[[中文](https://github.com/CyberZHG/keras-xlnet/blob/master/README.zh-CN.md)|[English](https://github.com/CyberZHG/keras-xlnet/blob/master/README.md)\]

Unofficial implementation of [XLNet](https://arxiv.org/pdf/1906.08237). The [embedding extraction](demo/extract/token_embeddings.py) and [embedding extraction with memory](demo/extract/token_embeddings_with_memory.py) demos show how to get the outputs of the last transformer layer using pre-trained checkpoints.

## Install

```bash
pip install keras-xlnet
```

## Usage

### Fine-tuning on GLUE

Click a task name to see the demo that uses the base model:

|Task Name                       |Metrics                       |Approximate Results on Dev Set|
|:-------------------------------|:----------------------------:|-----------------------------:|
|[CoLA](demo/GLUE/CoLA/cola.py)  |Matthews Corr.                |52                            |
|[SST-2](demo/GLUE/SST-2/sst2.py)|Accuracy                      |93                            |
|[MRPC](demo/GLUE/MRPC/mrpc.py)  |Accuracy/F1                   |86/89                         |
|[STS-B](demo/GLUE/STS-B/stsb.py)|Pearson Corr. / Spearman Corr.|86/87                         |
|[QQP](demo/GLUE/QQP/qqp.py)     |Accuracy/F1                   |90/86                         |
|[MNLI](demo/GLUE/MNLI/mnli.py)  |Accuracy                      |84/84                         |
|[QNLI](demo/GLUE/QNLI/qnli.py)  |Accuracy                      |86                            |
|[RTE](demo/GLUE/RTE/rte.py)     |Accuracy                      |64                            |
|[WNLI](demo/GLUE/WNLI/wnli.py)  |Accuracy                      |56                            |

(On the WNLI dataset, the model predicts only label 0.)
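For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how accuracy, F1 and the Matthews correlation coefficient can be computed for binary labels. It is not taken from the demos; the function names and the toy labels are made up for illustration. The table reports values multiplied by 100.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, _, _ = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corr(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels (hypothetical, not GLUE data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
all_zeros = [0] * len(y_true)

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), matthews_corr(y_true, y_pred))
print(accuracy(y_true, all_zeros), matthews_corr(y_true, all_zeros))
```

The `all_zeros` case mirrors the WNLI note: a classifier that always predicts 0 has an accuracy equal to the share of 0 labels and a Matthews correlation of 0.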

### Load Pretrained Checkpoints

```python
import os

from keras_xlnet import Tokenizer, load_trained_model_from_checkpoint, ATTENTION_TYPE_BI

# Directory that contains the unpacked pretrained checkpoint
checkpoint_path = '.../xlnet_cased_L-24_H-1024_A-16'

# SentencePiece model shipped with the checkpoint
tokenizer = Tokenizer(os.path.join(checkpoint_path, 'spiece.model'))

model = load_trained_model_from_checkpoint(
    config_path=os.path.join(checkpoint_path, 'xlnet_config.json'),
    checkpoint_path=os.path.join(checkpoint_path, 'xlnet_model.ckpt'),
    batch_size=16,       # maximum batch size
    memory_len=512,      # maximum memory length
    target_len=128,      # maximum number of tokens per input
    in_train_phase=False,  # False returns the model used for fine-tuning
    attention_type=ATTENTION_TYPE_BI,
)
model.summary()
```

The arguments `batch_size`, `memory_len` and `target_len` are the maximum sizes used to initialize the memories. If `in_train_phase` is `True`, the model used for training a language model is returned; otherwise, a model used for fine-tuning is returned.

### About I/O

**Note that** `shuffle` should be `False` in either `fit` or `fit_generator` if memories are used. 
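The reason order matters can be sketched as follows, assuming the memories are kept per batch row and carried from one batch to the next, so row `j` of each batch has to continue row `j` of the previous batch. The helper `segment_batches` is hypothetical and not part of `keras-xlnet`. It also assumes that the memory-length input counts the tokens already seen for a row, capped at `memory_len`, and that every sequence length is a multiple of `target_len`, so padding is left out.

```python
def segment_batches(sequences, target_len, memory_len):
    """Yield (token_rows, memory_lengths), one pair per consecutive batch.

    `sequences` holds one equal-length list of token IDs per batch row.
    Row j of every yielded batch continues row j of the previous batch.
    """
    total_len = len(sequences[0])
    assert all(len(seq) == total_len for seq in sequences)
    assert total_len % target_len == 0
    for step, start in enumerate(range(0, total_len, target_len)):
        rows = [seq[start:start + target_len] for seq in sequences]
        # Assumed semantics: tokens already seen for the row, capped at memory_len.
        seen = min(step * target_len, memory_len)
        yield rows, [[seen] for _ in sequences]


# Two hypothetical token-ID sequences, split into three consecutive batches.
batches = list(segment_batches(
    [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
    target_len=2,
    memory_len=3,
))
for rows, memory_lengths in batches:
    print(rows, memory_lengths)
```

Shuffling these batches would pair each row with memories computed from an unrelated segment, which is why the batches need to be fed in the order they are produced.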

#### `in_train_phase` is `False`

3 inputs:

* IDs of tokens, with shape `(batch_size, target_len)`.

* IDs of segments, with shape `(batch_size, target_len)`.

* Length of memories, with shape `(batch_size, 1)`.

1 output:

* The feature for each token, with shape `(batch_size, target_len, units)`.
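The three inputs listed above could be assembled roughly as in the sketch below. The helper `build_inputs` is hypothetical, `pad_id=0` is a placeholder (the real padding ID depends on the vocabulary), and right-padding is chosen only for simplicity; the demos may lay out the inputs differently.

```python
def build_inputs(token_id_lists, target_len, pad_id=0):
    """Pad or truncate each sequence to target_len and build the three inputs.

    Returns (token_ids, segment_ids, memory_lengths) as nested lists with
    shapes (batch_size, target_len), (batch_size, target_len), (batch_size, 1).
    """
    token_ids, segment_ids, memory_lengths = [], [], []
    for ids in token_id_lists:
        ids = list(ids)[:target_len]
        token_ids.append(ids + [pad_id] * (target_len - len(ids)))
        segment_ids.append([0] * target_len)  # single-segment input
        memory_lengths.append([0])            # no memory carried over
    return token_ids, segment_ids, memory_lengths


# Hypothetical token IDs; in practice they would come from the tokenizer.
tokens, segments, memories = build_inputs(
    [[11, 12, 13], [21, 22, 23, 24, 25, 26]],
    target_len=4,
)
print(tokens)
print(segments)
print(memories)
```

Converted to arrays (for example with `numpy.asarray`), the three lists would be passed to the model in this order.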

#### `in_train_phase` is `True`

4 inputs:

* IDs of tokens, with shape `(batch_size, target_len)`.

* IDs of segments, with shape `(batch_size, target_len)`.

* Length of memories, with shape `(batch_size, 1)`.

* Masks of tokens, with shape `(batch_size, target_len)`.

1 output:

* The probability of each token in each position, with shape `(batch_size, target_len, num_token)`.
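As an illustration of how an output of shape `(batch_size, target_len, num_token)` might be consumed, the sketch below picks the most probable token ID at each masked position. The helper `predict_masked_tokens` is hypothetical, the probabilities are made up, and the sketch assumes that a mask value of 1 marks a position to predict, which the source does not state.

```python
def predict_masked_tokens(probs, masks):
    """Return {(row, position): token_id} for every position whose mask is 1.

    `probs` is a nested list with shape (batch_size, target_len, num_token),
    `masks` a nested list with shape (batch_size, target_len).
    """
    predictions = {}
    for row, (row_probs, row_mask) in enumerate(zip(probs, masks)):
        for pos, (dist, is_masked) in enumerate(zip(row_probs, row_mask)):
            if is_masked:
                # Index of the largest probability, i.e. the argmax over tokens.
                predictions[(row, pos)] = max(range(len(dist)), key=dist.__getitem__)
    return predictions


# One row, three positions, a vocabulary of four tokens (made-up numbers).
probs = [[
    [0.10, 0.70, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.05, 0.05, 0.10, 0.80],
]]
masks = [[1, 0, 1]]
predicted = predict_masked_tokens(probs, masks)
print(predicted)
```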
