https://github.com/capjamesg/bpe
Byte-pair encoding implementation in Python.
byte-pair-encoding text-encoding
Last synced: 11 months ago
- Host: GitHub
- URL: https://github.com/capjamesg/bpe
- Owner: capjamesg
- License: mit
- Created: 2023-12-25T20:45:05.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-27T22:23:44.000Z (about 2 years ago)
- Last Synced: 2025-03-28T19:16:37.089Z (11 months ago)
- Topics: byte-pair-encoding, text-encoding
- Language: Python
- Homepage:
- Size: 1.95 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# BPE
An implementation of byte-pair encoding in Python.
> [!WARNING]
> I wrote this project to learn about byte-pair encoding. This project is not intended for production use.
## Setup
To set up this project, clone the repository and install it from source:
```
git clone https://github.com/capjamesg/bpe
cd bpe/
pip3 install -e .
```
### Test the encoder
Open `bpe.py`. Set the `text` variable to the text you want to train your byte-pair encoder on.
Then, set the `input_seq` value near the end of `bpe.py` to the demo text you want to encode with the trained encoder.
Run the script to test the encoder.
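
For readers new to the technique, the train-then-encode flow described above can be sketched roughly as follows. This is a minimal, character-level illustration and not the code in `bpe.py`: the functions `apply_merge`, `train_bpe`, and `encode` and the `num_merges` parameter are hypothetical names chosen for this example. Only the variable names `text` and `input_seq` come from the steps above.

```python
from collections import Counter


def apply_merge(tokens, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merge rules, starting from single characters.

    Assumption: this sketch works on characters rather than raw bytes.
    """
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair of tokens in the training text.
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so another merge would gain nothing
        merges.append(best)
        tokens = apply_merge(tokens, best)
    return merges


def encode(input_seq, merges):
    """Apply the learned merges, in training order, to new text."""
    tokens = list(input_seq)
    for pair in merges:
        tokens = apply_merge(tokens, pair)
    return tokens


text = "low lower lowest"   # training text
input_seq = "lowest"        # demo text to encode
merges = train_bpe(text, num_merges=10)
tokens = encode(input_seq, merges)
print(tokens)
```

Joining the returned tokens gives back the original `input_seq`, which is a quick sanity check that the merges lose no information.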
> [!NOTE]
> This script doesn't save your encoding. I recommend adding your own saving logic if you want to use the encoder on large text samples.
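
One possible shape for that saving logic is sketched below. It assumes the trained encoder can be represented as an ordered list of merged pairs, which may not match how `bpe.py` stores its state. `save_merges`, `load_merges`, and the file name `bpe_merges.json` are hypothetical names for this example.

```python
import json
import os
import tempfile


def save_merges(merges, path):
    """Write the ordered merge rules to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        # JSON has no tuple type, so each pair is stored as a two-item list.
        json.dump([list(pair) for pair in merges], f, ensure_ascii=False)


def load_merges(path):
    """Read merge rules back, restoring each pair as a tuple."""
    with open(path, encoding="utf-8") as f:
        return [tuple(pair) for pair in json.load(f)]


# Assumed representation: merge rules in the order they were learned.
merges = [("l", "o"), ("lo", "w")]
path = os.path.join(tempfile.gettempdir(), "bpe_merges.json")
save_merges(merges, path)
restored = load_merges(path)
```

Keeping the merges in order matters, because later rules can depend on tokens produced by earlier ones.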
## License
This project is licensed under an [MIT license](LICENSE).