Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jmoney/tokenizer-utils
Hugging Face-compatible tokenizers as a command-line tool, HTTP server, or AWS Lambda function
github-site homebrew-formula license-management mkdocs
Last synced: 22 days ago
- Host: GitHub
- URL: https://github.com/jmoney/tokenizer-utils
- Owner: jmoney
- License: apache-2.0
- Created: 2024-05-31T21:15:42.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-06-13T15:54:09.000Z (7 months ago)
- Last Synced: 2024-12-18T19:53:31.827Z (22 days ago)
- Topics: github-site, homebrew-formula, license-management, mkdocs
- Language: Go
- Homepage: https://www.jmoney.dev/tokenizer-utils/
- Size: 491 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
# Tokenizer Utils
A very simple set of utilities for tokenizing a string with Hugging Face tokenizer bindings. The bindings come from the [daulet/tokenizers](https://github.com/daulet/tokenizers/) library.
## Tokenizer CLI
```bash
tokenizer -h
Usage of tokenizer:
  -add_special_tokens
    	Add special tokens
  -model string
    	The path to the model
```

The CLI reads a string from STDIN and tokenizes it with the Hugging Face bindings, using the model whose path is given by `-model`.
## Tokenizer Lambda
An AWS Lambda function that tokenizes a string using the Hugging Face bindings. It is meant to be fronted by an AWS Application Load Balancer (ALB).
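Because the function sits behind an Application Load Balancer, it receives ALB target-group events rather than API Gateway events. The sketch below is illustrative only: it assumes a plain-text POST body and a JSON `{"ids": [...]}` reply (neither is documented by the project), mirrors only a few fields of the ALB event and response JSON in local structs so it needs nothing beyond the standard library, and uses a hypothetical toy vocabulary in place of a real tokenizer. A deployed function would more likely use the `events.ALBTargetGroupRequest` and `events.ALBTargetGroupResponse` types and register a handler with `lambda.Start` from aws-lambda-go.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// albRequest mirrors a subset of the ALB-to-Lambda event JSON.
type albRequest struct {
	HTTPMethod      string `json:"httpMethod"`
	Body            string `json:"body"`
	IsBase64Encoded bool   `json:"isBase64Encoded"`
}

// albResponse mirrors the response JSON an ALB expects from a Lambda target.
type albResponse struct {
	StatusCode        int               `json:"statusCode"`
	StatusDescription string            `json:"statusDescription"`
	Headers           map[string]string `json:"headers"`
	Body              string            `json:"body"`
	IsBase64Encoded   bool              `json:"isBase64Encoded"`
}

// toyVocab is a HYPOTHETICAL stand-in for a real Hugging Face tokenizer.
var toyVocab = map[string]uint32{"hello": 1, "world": 2}

// encode lowercases the text and maps each word to its toy ID (0 if unknown).
func encode(text string) []uint32 {
	ids := []uint32{}
	for _, w := range strings.Fields(strings.ToLower(text)) {
		ids = append(ids, toyVocab[w])
	}
	return ids
}

// respond builds an ALB response with a JSON body.
func respond(code int, body string) albResponse {
	return albResponse{
		StatusCode:        code,
		StatusDescription: fmt.Sprintf("%d %s", code, http.StatusText(code)),
		Headers:           map[string]string{"Content-Type": "application/json"},
		Body:              body,
	}
}

// handleALB tokenizes a plain-text POST body and returns the IDs as JSON.
func handleALB(req albRequest) albResponse {
	if req.HTTPMethod != http.MethodPost {
		return respond(http.StatusMethodNotAllowed, `{"error":"use POST"}`)
	}
	text := req.Body
	if req.IsBase64Encoded {
		raw, err := base64.StdEncoding.DecodeString(req.Body)
		if err != nil {
			return respond(http.StatusBadRequest, `{"error":"invalid base64 body"}`)
		}
		text = string(raw)
	}
	out, err := json.Marshal(map[string][]uint32{"ids": encode(text)})
	if err != nil {
		return respond(http.StatusInternalServerError, `{"error":"encoding failed"}`)
	}
	return respond(http.StatusOK, string(out))
}

func main() {
	// A deployed function would register a handler with lambda.Start from
	// aws-lambda-go; this sketch invokes the handler directly with a sample event.
	resp := handleALB(albRequest{HTTPMethod: "POST", Body: "Hello world"})
	fmt.Println(resp.StatusCode, resp.Body)
}
```

Running it prints `200 {"ids":[1,2]}`. The `isBase64Encoded` check matters because an ALB can deliver some request bodies (for example, non-text content types) base64-encoded.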
## Tokenizer HTTP Server
A simple standalone HTTP server that tokenizes a string using the Hugging Face bindings. It can be run locally or in a container.