https://github.com/xu-song/bert-as-language-model
BERT as language model, fork from https://github.com/google-research/bert
- Host: GitHub
- URL: https://github.com/xu-song/bert-as-language-model
- Owner: xu-song
- License: apache-2.0
- Created: 2018-11-30T03:43:38.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2024-03-06T06:20:47.000Z (11 months ago)
- Last Synced: 2024-08-23T02:10:07.862Z (5 months ago)
- Topics: bert, language-model, tensorflow
- Language: Python
- Homepage:
- Size: 175 KB
- Stars: 247
- Watchers: 9
- Forks: 68
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome-bert - xu-song/bert_as_language_model - research/bert, (BERT language model and embedding:)
README
**[🤗Demo](#demo)** |
**[📖cases-en](#test-case)** |
**[📖cases-zh](cases/test.zh.md)**

## BERT as Language Model
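For a sentence $S = w_1, w_2, \dots, w_k$, we have

$$p(S) = \prod_{i=1}^{k} p(w_i \mid \text{context}_i)$$

In a traditional language model, such as an RNN, $\text{context}_i = w_1, \dots, w_{i-1}$, so

$$p(S) = \prod_{i=1}^{k} p(w_i \mid w_1, \dots, w_{i-1})$$

In a bidirectional language model, the context is larger: $\text{context}_i = w_1, \dots, w_{i-1}, w_{i+1}, \dots, w_k$.

In this implementation, we simply adopt the following approximation:

$$p(S) \approx \prod_{i=1}^{k} p(w_i \mid w_1, \dots, w_{i-1}, w_{i+1}, \dots, w_k)$$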
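
As a rough illustration of this approximation (scoring each token given all the other tokens of the sentence), the sketch below hides one position at a time and sums the log-probabilities assigned to the hidden token. Everything in it is hypothetical: `sentence_log_prob`, `toy_masked_prob`, and the lookup table are stand-ins invented for illustration and are not part of this repository. A real run would query BERT's masked-LM head instead of the toy table.

```python
import math
from typing import Callable, List, Sequence

MASK = "[MASK]"


def sentence_log_prob(
    tokens: Sequence[str],
    masked_prob: Callable[[List[str], int, str], float],
) -> float:
    """Sum of log p(w_i | all other tokens), hiding one position at a time."""
    total = 0.0
    for i, target in enumerate(tokens):
        context = list(tokens)
        context[i] = MASK  # hide w_i so only the surrounding tokens are visible
        total += math.log(masked_prob(context, i, target))
    return total


def toy_masked_prob(context: List[str], position: int, target: str) -> float:
    """Hypothetical stand-in for a masked LM: a fixed lookup table."""
    table = {"a": 0.9, "book": 0.1, "plane": 0.001}
    return table.get(target, 0.5)


log_p_book = sentence_log_prob(["a", "book"], toy_masked_prob)
log_p_plane = sentence_log_prob(["a", "plane"], toy_masked_prob)
print(log_p_book > log_p_plane)
```

Working in log space avoids underflow when many small probabilities are multiplied, which matters for longer sentences.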
### Demo
Try out the [Web Demo](https://huggingface.co/spaces/eson/bert-perplexity) at [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/eson/bert-perplexity)
### test-case
> [more cases: Chinese](cases/test.zh.md)
```bash
export BERT_BASE_DIR=model/uncased_L-12_H-768_A-12
export INPUT_FILE=data/lm/test.en.tsv

python run_lm_predict.py \
  --input_file=$INPUT_FILE \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --output_dir=/tmp/lm_output/
```

For the following test case:

```bash
$ cat data/lm/test.en.tsv
there is a book on the desk
there is a plane on the desk
there is a book in the desk

$ cat /tmp/lm_output/test_result.json
```

Output:

```yml
# prob: probability
# ppl: perplexity
[
  {
    "tokens": [
      {
        "token": "there",
        "prob": 0.9988962411880493
      },
      {
        "token": "is",
        "prob": 0.013578361831605434
      },
      {
        "token": "a",
        "prob": 0.9420605897903442
      },
      {
        "token": "book",
        "prob": 0.07452250272035599
      },
      {
        "token": "on",
        "prob": 0.9607976675033569
      },
      {
        "token": "the",
        "prob": 0.4983428418636322
      },
      {
        "token": "desk",
        "prob": 4.040586190967588e-06
      }
    ],
    "ppl": 17.69329728285426
  },
  {
    "tokens": [
      {
        "token": "there",
        "prob": 0.996775209903717
      },
      {
        "token": "is",
        "prob": 0.03194097802042961
      },
      {
        "token": "a",
        "prob": 0.8877727389335632
      },
      {
        "token": "plane",
        "prob": 3.4907534427475184e-05  # low probability
      },
      {
        "token": "on",
        "prob": 0.1902322769165039
      },
      {
        "token": "the",
        "prob": 0.5981084704399109
      },
      {
        "token": "desk",
        "prob": 3.3164762953674654e-06
      }
    ],
    "ppl": 59.646456254851806
  },
  {
    "tokens": [
      {
        "token": "there",
        "prob": 0.9969795942306519
      },
      {
        "token": "is",
        "prob": 0.03379646688699722
      },
      {
        "token": "a",
        "prob": 0.9095568060874939
      },
      {
        "token": "book",
        "prob": 0.013939591124653816
      },
      {
        "token": "in",
        "prob": 0.000823647016659379  # low probability
      },
      {
        "token": "the",
        "prob": 0.5844194293022156
      },
      {
        "token": "desk",
        "prob": 3.3361218356731115e-06
      }
    ],
    "ppl": 54.65941516205144
  }
]
```
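
The reported `ppl` values appear consistent with the usual definition of perplexity: the exponential of the negative mean log-probability over tokens. The sketch below assumes that definition (the README does not state the formula) and applies it to the per-token probabilities listed for the first sentence. The helper name `perplexity` is made up for illustration and is not taken from the repository.

```python
import math
from typing import Sequence


def perplexity(probs: Sequence[float]) -> float:
    """exp of the negative mean natural-log probability (assumed definition)."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


# Per-token probabilities copied from the first sentence of the output above
# ("there is a book on the desk").
book_on_desk = [
    0.9988962411880493,
    0.013578361831605434,
    0.9420605897903442,
    0.07452250272035599,
    0.9607976675033569,
    0.4983428418636322,
    4.040586190967588e-06,
]

ppl = perplexity(book_on_desk)
print(round(ppl, 2))  # close to the reported 17.69
```

Lower perplexity means the model finds the sentence more plausible. Of the three test sentences, "there is a book on the desk" scores lowest (about 17.7), while the "plane" and "in the desk" variants score about 59.6 and 54.7.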