https://github.com/diixo/bert-from-scratch
- Host: GitHub
- URL: https://github.com/diixo/bert-from-scratch
- Owner: diixo
- Created: 2025-06-22T02:02:26.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-07-14T22:48:16.000Z (11 months ago)
- Last Synced: 2025-07-15T02:12:50.848Z (11 months ago)
- Topics: bert, bert-embeddings, bert-model
- Language: Python
- Homepage:
- Size: 3.75 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# BERT-example
Training text (Structure and Interpretation of Computer Programs, Second Edition): https://archive.org/stream/StructureAndInterpretationOfComputerProgramsSecondEdition/sicp_djvu.txt
* **demo-1.py** - simple demo that trains on a small text.
* **demo-synthesis.py** - creates `wordpiece-tokenizer-base.json` with synthetic tokens.
* **demo-2.py** - uses `tokens.txt` as an external token list.
* **demo-3.py** - uses `tokens.txt` together with tokenizer training on the training corpus.
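
The demo scripts themselves are not reproduced here, so the following is only a minimal sketch of how a WordPiece-style tokenizer typically splits a word against an external token list such as `tokens.txt`. It uses the standard greedy longest-match-first rule with the `##` continuation prefix. The function names `load_vocab` and `wordpiece_tokenize` and the sample vocabulary are hypothetical and are not taken from the repository.

```python
def load_vocab(lines):
    """Build a token set from an iterable of lines, one token per line."""
    return {line.strip() for line in lines if line.strip()}


def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Split one word into pieces using greedy longest-match-first lookup."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the "##" prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matches at this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical vocabulary standing in for the contents of tokens.txt.
vocab = load_vocab(["[UNK]", "inter", "##pret", "##er", "##ation", "eval"])

print(wordpiece_tokenize("interpreter", vocab))
print(wordpiece_tokenize("lambda", vocab))
```

With a real file, the vocabulary could be loaded by passing an open file handle to `load_vocab`, assuming `tokens.txt` holds one token per line; that layout is an assumption, not something the README states.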