https://github.com/julienkay/com.doji.sentencepiece
A Unity package for SentencePiece tokenization
- Host: GitHub
- URL: https://github.com/julienkay/com.doji.sentencepiece
- Owner: julienkay
- License: MIT
- Created: 2024-07-26T10:29:01.000Z (10 months ago)
- Default Branch: master
- Last Pushed: 2024-08-16T16:06:01.000Z (10 months ago)
- Last Synced: 2025-03-09T21:43:38.600Z (3 months ago)
- Language: C#
- Size: 2.84 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# SentencePiece
[OpenUPM]
A Unity package for [SentencePiece] tokenization.
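Since the package is listed on OpenUPM, one way to pull it into a project is a scoped registry entry in the project's `Packages/manifest.json`. The fragment below is a minimal sketch, not taken from this README: the registry name and URL follow the usual OpenUPM convention, the single scope assumes no further scoped dependencies are needed, and the version string `1.0.0` is a placeholder to be replaced with a version actually published on OpenUPM.

```json
{
  "scopedRegistries": [
    {
      "name": "package.openupm.com",
      "url": "https://package.openupm.com",
      "scopes": [
        "com.doji.sentencepiece"
      ]
    }
  ],
  "dependencies": {
    "com.doji.sentencepiece": "1.0.0"
  }
}
```

In an existing project these entries would be merged into the current `scopedRegistries` and `dependencies` sections rather than replacing the file.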
## About
This package contains the .dll for [Microsoft.ML.Tokenizers][Microsoft.ML.Tokenizer], which is part of [ML.NET], as well as the following dependencies:
- Google.Protobuf
- Microsoft.Bcl.AsyncInterfaces
- System.Runtime.CompilerServices.Unsafe
- System.Text.Encodings.Web
- System.Text.Json

---

I mainly use this package to implement tokenizers that rely on SentencePiece (such as Llama, T5, ...) as part of the [com.doji.transformers] package.
[OpenUPM]: https://openupm.com/packages/com.doji.sentencepiece
[SentencePiece]: https://github.com/google/sentencepiece
[Microsoft.ML.Tokenizer]: https://github.com/dotnet/machinelearning/tree/main/src/Microsoft.ML.Tokenizers
[ML.NET]: https://dotnet.microsoft.com/en-us/apps/machinelearning-ai/ml-dotnet
[com.doji.transformers]: https://github.com/julienkay/com.doji.transformers