https://github.com/julienkay/com.doji.sentencepiece

# SentencePiece

[OpenUPM]

A Unity package for [SentencePiece] tokenization.

## About

This package bundles the .dll for [Microsoft.ML.Tokenizers][Microsoft.ML.Tokenizer], which is part of [ML.NET], along with the following dependencies:
- Google.Protobuf
- Microsoft.Bcl.AsyncInterfaces
- System.Runtime.CompilerServices.Unsafe
- System.Text.Encodings.Web
- System.Text.Json

---

I mainly use this package to implement tokenizers that rely on SentencePiece (such as Llama and T5) as part of the [com.doji.transformers] package.

[OpenUPM]: https://openupm.com/packages/com.doji.sentencepiece
[SentencePiece]: https://github.com/google/sentencepiece
[Microsoft.ML.Tokenizer]: https://github.com/dotnet/machinelearning/tree/main/src/Microsoft.ML.Tokenizers
[ML.NET]: https://dotnet.microsoft.com/en-us/apps/machinelearning-ai/ml-dotnet
[com.doji.transformers]: https://github.com/julienkay/com.doji.transformers