https://github.com/simonepri/text-tokenizers-colab
🔪 Tokenize text on the fly on Colab.
colab-notebook machine-learning text tokenization
- Host: GitHub
- URL: https://github.com/simonepri/text-tokenizers-colab
- Owner: simonepri
- License: mit
- Created: 2020-04-15T14:41:14.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2020-04-18T12:33:34.000Z (about 6 years ago)
- Last Synced: 2025-03-29T17:24:54.449Z (over 1 year ago)
- Topics: colab-notebook, machine-learning, text, tokenization
- Language: Jupyter Notebook
- Size: 23.4 KB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
- License: license
# text-tokenizers-colab
🔪 Tokenize text on the fly on Colab.
## Synopsis
Tokenization is the task of splitting a text into meaningful segments, called tokens.
This repository contains Python notebooks that run several text tokenizers for quick experimentation.
Click one of the links in the list below to open a notebook in Colab and run it.
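
To make the definition above concrete, here is a minimal sketch of tokenization using only Python's standard library. The `simple_tokenize` helper and its regular expression are hypothetical illustrations, not part of this repository or of the libraries used in the notebooks, which apply far more sophisticated rules.

```python
import re

# Hypothetical pattern (an assumption for illustration only):
# a token is either a run of word characters or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(text):
    """Split text into word tokens and single-character punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = simple_tokenize("Tokenize text on the fly on Colab.")
print(tokens)
# ['Tokenize', 'text', 'on', 'the', 'fly', 'on', 'Colab', '.']
```

Whitespace is discarded and each punctuation mark becomes its own token. Library tokenizers handle cases this sketch does not, such as contractions, URLs, and subword units.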
Do you believe that this is *useful*?
Has it *saved you time*?
Or maybe you simply *like it*?
If so, [support this work with a Star ⭐️][start].
## Notebooks
- Hugging Face's Transformers Library Tokenizers - [Open in Colab][colab:transformers]
- Explosion AI spaCy Library Tokenizers - [Open in Colab][colab:spacy]
## Authors
- **Simone Primarosa** - [simonepri][github:simonepri]
See also the list of [contributors][contributors] who participated in this project.
## License
This project is licensed under the MIT License - see the [license][license] file for details.
[start]: https://github.com/simonepri/text-tokenizers-colab#start-of-content
[license]: https://github.com/simonepri/text-tokenizers-colab/tree/master/license
[contributors]: https://github.com/simonepri/text-tokenizers-colab/contributors
[github:simonepri]: https://github.com/simonepri
[colab:transformers]: https://colab.research.google.com/github/simonepri/text-tokenizers-colab/blob/master/transformers-tokenizers.ipynb
[colab:spacy]: https://colab.research.google.com/github/simonepri/text-tokenizers-colab/blob/master/spacy-tokenizers.ipynb