https://github.com/diixo/bert-from-scratch


bert bert-embeddings bert-model


# BERT-example

Training text: *Structure and Interpretation of Computer Programs, Second Edition*, taken from https://archive.org/stream/StructureAndInterpretationOfComputerProgramsSecondEdition/sicp_djvu.txt

* **demo-1.py** - simple demo that trains on a small text.
* **demo-synthesis.py** - creates `wordpiece-tokenizer-base.json` with synthetic tokens.
* **demo-2.py** - uses an external `tokens.txt` token file.
* **demo-3.py** - uses `tokens.txt` together with tokenizer training on the corpus.
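
For readers new to WordPiece, here is a minimal standard-library sketch of how a token list such as `tokens.txt` can drive tokenization. It is an illustration only, not the code in the demo scripts. It assumes one token per line and the usual BERT convention that `##` marks a piece continuing a word. The names `load_vocab`, `wordpiece_tokenize`, `tokenize` and `SAMPLE_TOKENS` are hypothetical.

```python
from typing import Iterable, List, Set


def load_vocab(lines: Iterable[str]) -> Set[str]:
    """Build a vocabulary set from one-token-per-line text, skipping blanks."""
    return {line.strip() for line in lines if line.strip()}


def wordpiece_tokenize(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text: str, vocab: Set[str]) -> List[str]:
    """Lowercase, split on whitespace, and WordPiece-split every word."""
    tokens: List[str] = []
    for word in text.lower().split():
        tokens.extend(wordpiece_tokenize(word, vocab))
    return tokens


# Hypothetical stand-in for the contents of tokens.txt (assumed format).
SAMPLE_TOKENS = """
inter
##pret
##ation
program
##s
of
computer
"""

vocab = load_vocab(SAMPLE_TOKENS.splitlines())
print(tokenize("interpretation of computer programs", vocab))
# ['inter', '##pret', '##ation', 'of', 'computer', 'program', '##s']
```

To try this with a real file, pass an open file handle to `load_vocab`, provided the file follows the assumed one-token-per-line layout. A word with no matching piece at some position becomes a single `[UNK]` token.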