{"id":15012506,"url":"https://github.com/microsoft/tokenizer","last_synced_at":"2025-05-15T04:04:31.091Z","repository":{"id":151681970,"uuid":"620176227","full_name":"microsoft/Tokenizer","owner":"microsoft","description":"Typescript and .NET implementation of BPE tokenizer for OpenAI LLMs.","archived":false,"fork":false,"pushed_at":"2025-04-25T18:16:23.000Z","size":2079,"stargazers_count":189,"open_issues_count":5,"forks_count":35,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-05-15T00:08:49.989Z","etag":null,"topics":["ai","gpt","llm","openai","tokenizer"],"latest_commit_sha":null,"homepage":"https://github.com/microsoft/Tokenizer/","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microsoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":"SUPPORT.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-28T07:14:39.000Z","updated_at":"2025-05-14T21:18:44.000Z","dependencies_parsed_at":"2024-01-11T06:26:28.771Z","dependency_job_id":"5842d478-120f-4a33-bbcd-27bb54841202","html_url":"https://github.com/microsoft/Tokenizer","commit_stats":{"total_commits":73,"total_committers":15,"mean_commits":4.866666666666666,"dds":0.4657534246575342,"last_synced_commit":"0ac92ff362b58bc60699e7cc862865bd844430f1"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FTokenizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FTokenizer/tags","releases_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/microsoft%2FTokenizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FTokenizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/microsoft","download_url":"https://codeload.github.com/microsoft/Tokenizer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254270641,"owners_count":22042858,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","gpt","llm","openai","tokenizer"],"created_at":"2024-09-24T19:42:45.859Z","updated_at":"2025-05-15T04:04:31.045Z","avatar_url":"https://github.com/microsoft.png","language":"C#","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tokenizer\n\nThis repo contains TypeScript and C# implementations of a byte pair encoding (BPE) tokenizer for OpenAI LLMs, based on the open-source Rust implementation in [OpenAI tiktoken](https://github.com/openai/tiktoken). Both implementations are useful for tokenizing prompts in Node.js and .NET environments before feeding them to an LLM.\n\n## TypeScript implementation\n\nPlease follow the [README](tokenizer_ts/README.md).\n\n## C# implementation\n\n   \u003e [!IMPORTANT]\n   \u003e Users of `Microsoft.DeepDev.TokenizerLib` should migrate to `Microsoft.ML.Tokenizers`. The functionality in `Microsoft.DeepDev.TokenizerLib` has been added to [`Microsoft.ML.Tokenizers`](https://www.nuget.org/packages/Microsoft.ML.Tokenizers). 
`Microsoft.ML.Tokenizers` is a tokenizer library being developed by the .NET team and, going forward, the central place for tokenizer development in .NET. By using `Microsoft.ML.Tokenizers`, you should see improved performance over existing tokenizer library implementations, including `Microsoft.DeepDev.TokenizerLib`. A stable release of `Microsoft.ML.Tokenizers` is expected alongside the .NET 9.0 release (November 2024). Instructions for migration can be found at https://github.com/dotnet/machinelearning/blob/main/docs/code/microsoft-ml-tokenizers-migration-guide.md.\n\n## Contributing\n\nWe welcome contributions. Please follow [this guideline](CONTRIBUTING.md).\n\n## Trademarks\n\nThis project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft \ntrademarks or logos is subject to and must follow \n[Microsoft's Trademark \u0026 Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).\nUse of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.\nAny use of third-party trademarks or logos is subject to those third parties' policies.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2Ftokenizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmicrosoft%2Ftokenizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2Ftokenizer/lists"}