# text-tokenizers-colab

🔪 Tokenize text on the fly on Colab.

## Synopsis

Tokenization is the task of splitting a text into meaningful segments, called tokens.
This repository contains Python notebooks that run several text tokenizers for quick experimentation.
Click one of the "Open in Colab" badges in the list below and run the notebook.
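To make the definition concrete, here is a minimal sketch of a naive tokenizer in plain Python. It is not one of the tokenizers used in the notebooks. The regular expression `TOKEN_PATTERN` and the function `simple_tokenize` are hypothetical names chosen for illustration. They treat each run of word characters, or each single punctuation mark, as one token.

```python
import re

# Hypothetical pattern for illustration: a run of word characters,
# or any single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens (naive, illustrative only)."""
    return TOKEN_PATTERN.findall(text)


print(simple_tokenize("Tokenize text on the fly, on Colab!"))
# ['Tokenize', 'text', 'on', 'the', 'fly', ',', 'on', 'Colab', '!']
```

A pattern this simple gets many cases wrong. For example, it splits `don't` into `don`, `'`, `t`. Library tokenizers such as spaCy's add rules and exceptions for contractions, abbreviations, and URLs.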

Do you believe that this is *useful*?
Has it *saved you time*?
Or maybe you simply *like it*?
If so, [support this work with a Star ⭐️][start].

## Notebooks

- Hugging Face's Transformers Library Tokenizers - [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)][colab:transformers]
- Explosion AI's spaCy Library Tokenizers - [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)][colab:spacy]
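Many tokenizers in Hugging Face's Transformers library split words into subword units. BERT's WordPiece tokenizer, for example, splits each word greedily. It always takes the longest vocabulary entry that matches and marks non-initial pieces with a `##` prefix. The sketch below illustrates that idea in plain Python with a hypothetical toy vocabulary. `wordpiece_tokenize` and `VOCAB` are illustrative names, not part of the Transformers API or of the notebook.

```python
def wordpiece_tokenize(word: str, vocab: set[str], unk_token: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first subword split of a single word (toy sketch)."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking until one is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a piece that continues a word
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no valid split: the whole word becomes unknown
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary, for illustration only.
VOCAB = {"token", "##ization", "##ize", "##s", "colab"}

print(wordpiece_tokenize("tokenization", VOCAB))  # ['token', '##ization']
print(wordpiece_tokenize("tokens", VOCAB))        # ['token', '##s']
print(wordpiece_tokenize("xyz", VOCAB))           # ['[UNK]']
```

Real WordPiece vocabularies are learned from a corpus and hold tens of thousands of entries. The toy set here only shows how the greedy matching works.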

## Authors

- **Simone Primarosa** - [simonepri][github:simonepri]

See also the list of [contributors][contributors] who participated in this project.

## License

This project is licensed under the MIT License - see the [license][license] file for details.

[start]: https://github.com/simonepri/text-tokenizers-colab#start-of-content
[license]: https://github.com/simonepri/text-tokenizers-colab/tree/master/license
[contributors]: https://github.com/simonepri/text-tokenizers-colab/contributors

[github:simonepri]: https://github.com/simonepri

[colab:transformers]: https://colab.research.google.com/github/simonepri/text-tokenizers-colab/blob/master/transformers-tokenizers.ipynb
[colab:spacy]: https://colab.research.google.com/github/simonepri/text-tokenizers-colab/blob/master/spacy-tokenizers.ipynb