https://github.com/yuanxiaosc/Deep_dynamic_contextualized_word_representation

TensorFlow code and pre-trained models for A Dynamic Word Representation Model Based on Deep Context. It combines the ideas of the BERT model and ELMo's deep contextualized word representation.

bert elmo nlp transformer



Host: GitHub
URL: https://github.com/yuanxiaosc/Deep_dynamic_contextualized_word_representation
Owner: yuanxiaosc
Created: 2018-11-13T11:37:46.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2018-12-27T13:08:35.000Z (over 5 years ago)
Last Synced: 2024-03-23T19:05:03.834Z (3 months ago)
Topics: bert, elmo, nlp, transformer
Language: Python
Homepage: https://yuanxiaosc.github.io/2018/11/27/Bidirectional_Encoder_Representations_Transformers/
Size: 72.3 KB
Stars: 16
Watchers: 3
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md

Lists

awesome-bert - yuanxiaosc/Deep_dynamic_word_representation - Trained models for deep dynamic word representation (DDWR). It combines the BERT model and ELMo's deep context word representation. (Listed under: BERT language model and embedding)

README

# Deep dynamic Contextualized word representation (DDCWR)

TensorFlow code and pre-trained models for DDCWR

# Important explanation

1. The model is simple: it uses only a feed-forward neural network with an attention mechanism.

2. Training is fast: only a few epochs are needed, because the parameters are initialized from Google's BERT model.

3. The model performs well. In most cases its results match those of the best models as of 2018-11-13, and sometimes they are better. The current best results are listed on the [GLUE benchmark leaderboard](https://gluebenchmark.com/leaderboard).

# Idea behind the model

This model, Deep_dynamic_word_representation (DDWR), combines the BERT model with ELMo's deep contextualized word representation.

BERT comes from [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805).

ELMo comes from [Deep contextualized word representations](https://arxiv.org/abs/1802.05365v2).
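To make the idea concrete, here is a minimal sketch of the layer-mixing step described in the ELMo paper: the outputs of several layers are combined using softmax-normalized per-layer weights and a global scale factor. Applying this to BERT's encoder layers is an assumption made for illustration; the names `scalar_mix`, `layer_scores`, and `gamma` are hypothetical and are not taken from this repository's code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_mix(layer_outputs, layer_scores, gamma=1.0):
    """Combine per-layer token vectors into one vector per token.

    layer_outputs: list of layers, each a list of token vectors.
    layer_scores: one unnormalized score per layer (learned in a real model).
    gamma: global scale factor (also learned in a real model).
    """
    weights = softmax(layer_scores)
    num_tokens = len(layer_outputs[0])
    hidden = len(layer_outputs[0][0])
    mixed = [[0.0] * hidden for _ in range(num_tokens)]
    for weight, layer in zip(weights, layer_outputs):
        for t, vector in enumerate(layer):
            for d, value in enumerate(vector):
                mixed[t][d] += gamma * weight * value
    return mixed


# Toy input (hypothetical values): 2 layers, 2 tokens, hidden size 2.
layers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[3.0, 2.0], [2.0, 3.0]],
]
# Equal scores give each layer a weight of 0.5, so the result is the average.
mixed = scalar_mix(layers, layer_scores=[0.0, 0.0])
```

In a trained model the scores and the scale factor would be learned together with the task-specific layers, so each downstream task can weight the layers differently.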

# Basic usage

## Download Pre-trained models

[BERT-Base, Uncased](https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip)

## Download [GLUE data](https://gluebenchmark.com/tasks)

Use this [script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e).

## Sentence (and sentence-pair) classification tasks

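The difference from [google-research/bert](https://github.com/google-research/bert) is the script, `run_classifier_elmo.py`: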

```
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export GLUE_DIR=/path/to/glue

python run_classifier_elmo.py \
  --task_name=MRPC \
  --do_train=true \
  --do_eval=true \
  --data_dir=$GLUE_DIR/MRPC \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=/tmp/mrpc_output/
```
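As a rough guide to how long such a run takes, the sketch below shows how google-research/bert's `run_classifier.py` derives the number of training and warmup steps from the batch size and epoch count. It assumes `run_classifier_elmo.py` follows the same scheme. The `warmup_proportion` default of 0.1 comes from google-research/bert, not from the command above, and the example count of 3668 is used as an illustrative size for the MRPC training set.

```python
def training_schedule(num_examples, batch_size, num_epochs, warmup_proportion=0.1):
    """Return (num_train_steps, num_warmup_steps) for a fine-tuning run."""
    # Total optimizer steps: examples per epoch divided by batch size,
    # times the number of epochs, truncated to an integer.
    num_train_steps = int(num_examples / batch_size * num_epochs)
    # Steps during which the learning rate ramps up linearly.
    num_warmup_steps = int(num_train_steps * warmup_proportion)
    return num_train_steps, num_warmup_steps


# Values from the MRPC command: batch size 32, 3 epochs.
# The example count is an assumption for illustration.
steps, warmup = training_schedule(num_examples=3668, batch_size=32, num_epochs=3.0)
```

With these inputs the run amounts to a few hundred optimizer steps, which is consistent with the claim that only a few epochs of fine-tuning are needed.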

### Prediction from classifier

> The same as in https://github.com/google-research/bert

```
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export GLUE_DIR=/path/to/glue
export TRAINED_CLASSIFIER=/path/to/fine/tuned/classifier

python run_classifier_elmo.py \
  --task_name=MRPC \
  --do_predict=true \
  --data_dir=$GLUE_DIR/MRPC \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$TRAINED_CLASSIFIER \
  --max_seq_length=128 \
  --output_dir=/tmp/mrpc_output/
```
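In google-research/bert, prediction mode writes a `test_results.tsv` file to the output directory, with one line per example and one tab-separated class probability per column. Assuming `run_classifier_elmo.py` keeps that behavior, a small reader like the following could turn the probabilities into predicted class indices. The helper name `read_predictions` and the sample rows are hypothetical.

```python
import csv
import io


def read_predictions(tsv_file):
    """Return a list of (predicted_class_index, probabilities), one per row."""
    predictions = []
    for row in csv.reader(tsv_file, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        probs = [float(value) for value in row]
        # Index of the largest probability in this row.
        best = max(range(len(probs)), key=probs.__getitem__)
        predictions.append((best, probs))
    return predictions


# Hypothetical two-class output standing in for a real test_results.tsv.
sample = io.StringIO("0.9\t0.1\n0.2\t0.8\n")
preds = read_predictions(sample)
labels = [index for index, _ in preds]
```

For a real run, pass an open file handle instead of the in-memory sample, for example `open("/tmp/mrpc_output/test_results.tsv", newline="")`.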

For more usage options, see [google-research/bert](https://github.com/google-research/bert).

## Solve the [SQuAD 1.1](https://rajpurkar.github.io/SQuAD-explorer/) task

> The same as in https://github.com/google-research/bert

The difference is the script, `run_squad_elmo.py`:

```
python run_squad_elmo.py \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --do_train=True \
  --train_file=$SQUAD_DIR/train-v1.1.json \
  --do_predict=True \
  --predict_file=$SQUAD_DIR/dev-v1.1.json \
  --train_batch_size=12 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=./tmp/elmo_squad_base/
```
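The `--max_seq_length` and `--doc_stride` flags interact: in google-research/bert's `run_squad.py`, a document that does not fit into one sequence is split into overlapping windows, each starting `doc_stride` tokens after the previous one. The sketch below reproduces that windowing under the assumption that `run_squad_elmo.py` does the same. The function name `split_into_spans` and the token counts in the example are hypothetical.

```python
from collections import namedtuple

DocSpan = namedtuple("DocSpan", ["start", "length"])


def split_into_spans(num_doc_tokens, num_query_tokens,
                     max_seq_length=384, doc_stride=128):
    """Split a tokenized document into overlapping windows."""
    # Three positions are reserved for [CLS] and the two [SEP] tokens.
    max_tokens_for_doc = max_seq_length - num_query_tokens - 3
    if max_tokens_for_doc <= 0:
        raise ValueError("query leaves no room for document tokens")
    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_tokens_for_doc)
        spans.append(DocSpan(start, length))
        if start + length == num_doc_tokens:
            break  # the last window reached the end of the document
        start += min(length, doc_stride)
    return spans


# Hypothetical sizes: a 600-token document and an 11-token question.
spans = split_into_spans(num_doc_tokens=600, num_query_tokens=11)
```

A smaller stride produces more windows and more overlap, which raises the cost of training and prediction but makes it less likely that an answer is cut at a window boundary.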

## Experimental Result

```
python run_squad_elmo.py

{"exact_match": 81.20151371807, "f1": 88.56178500169332}
```
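For readers unfamiliar with the two metrics, here is a minimal sketch of how exact match and F1 are computed for a single prediction, modeled on the official SQuAD v1.1 evaluation script. The official script additionally takes the best score over all reference answers for a question and averages over the dataset as a percentage; that part is omitted here, and the example strings are made up.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """True when the normalized strings are identical."""
    return normalize_answer(prediction) == normalize_answer(truth)


def f1_score(prediction, truth):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    num_same = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


# Made-up examples: one exact match, one partial overlap.
em = exact_match("The Eiffel Tower.", "Eiffel Tower")
f1 = f1_score("Eiffel Tower in Paris", "Eiffel Tower")
```

F1 gives partial credit for overlapping tokens, which is why the reported F1 is higher than the exact-match score.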