{"id":13592997,"url":"https://github.com/tryAGI/Tiktoken","last_synced_at":"2025-04-09T02:32:07.672Z","repository":{"id":173822430,"uuid":"651349718","full_name":"tryAGI/Tiktoken","owner":"tryAGI","description":"This project implements token calculation for OpenAI's gpt-4 and gpt-3.5-turbo model, specifically using `cl100k_base` encoding.","archived":false,"fork":false,"pushed_at":"2025-03-17T15:34:00.000Z","size":4008,"stargazers_count":74,"open_issues_count":9,"forks_count":4,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-31T19:21:18.796Z","etag":null,"topics":["ai","chatgpt","cl100kbase","csharp","encoding","gpt35turbo","gpt4","langchain","langchain-dotnet","openai","p50kbase","tiktoken","tiktoken-sharp","tokens"],"latest_commit_sha":null,"homepage":"https://github.com/openai/tiktoken","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tryAGI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"HavenDV","patreon":"havendv","ko_fi":"havendv","custom":["https://www.paypal.me/havendv","https://www.buymeacoffee.com/havendv","https://donate.stripe.com/00gfZ19zkeKLh1eaEE","https://www.upwork.com/freelancers/~017b1ad6f6af9cc189"]}},"created_at":"2023-06-09T03:49:44.000Z","updated_at":"2025-03-31T13:12:42.000Z","dependencies_parsed_at":"2024-03-25T22:29:27.799Z","dependency_job_id":"5f601c75-ad69-4d46-8cfb-0146340a0007","html_url":"https://github.com/tryAGI/Tiktoken","commit_stats":null,"previous_names":["tryagi/tiktoken"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tryAGI%2FTiktoken","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tryAGI%2FTiktoken/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tryAGI%2FTiktoken/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tryAGI%2FTiktoken/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tryAGI","download_url":"https://codeload.github.com/tryAGI/Tiktoken/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247965627,"owners_count":21025408,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","chatgpt","cl100kbase","csharp","encoding","gpt35turbo","gpt4","langchain","langchain-dotnet","openai","p50kbase","tiktoken","tiktoken-sharp","tokens"],"created_at":"2024-08-01T16:01:15.491Z","updated_at":"2025-04-09T02:32:07.658Z","avatar_url":"https://github.com/tryAGI.png","language":"C#","funding_links":["https://github.com/sponsors/HavenDV","https://patreon.com/havendv","https://ko-fi.com/havendv","https://www.paypal.me/havendv","https://www.buymeacoffee.com/havendv","https://donate.stripe.com/00gfZ19zkeKLh1eaEE","https://www.upwork.com/freelancers/~017b1ad6f6af9cc189"],"categories":["C\\#","C#"],"sub_categories":[],"readme":"# Tiktoken\n\n[![Nuget package](https://img.shields.io/nuget/vpre/Tiktoken)](https://www.nuget.org/packages/Tiktoken/)\n[![dotnet](https://github.com/tryAGI/Tiktoken/actions/workflows/dotnet.yml/badge.svg?branch=main)](https://github.com/tryAGI/Tiktoken/actions/workflows/dotnet.yml)\n[![License: MIT](https://img.shields.io/github/license/tryAGI/Tiktoken)](https://github.com/tryAGI/Tiktoken/blob/main/LICENSE.txt)\n[![Discord](https://img.shields.io/discord/1115206893015662663?label=Discord\u0026logo=discord\u0026logoColor=white\u0026color=d82679)](https://discord.gg/Ca2xhfBf3v)\n\nThis implementation aims for maximum performance, especially in the token count operation.  \nThere's also a benchmark console app here for easy tracking of this.  \nWe will be happy to accept any PR.  \n\n### Implemented encodings\n- `o200k_base`\n- `cl100k_base`\n- `r50k_base`\n- `p50k_base`\n- `p50k_edit`\n\n### Usage\n```csharp\nusing Tiktoken;\n\nvar encoder = ModelToEncoder.For(\"gpt-4o\"); // or explicitly using new Encoder(new O200KBase())\nvar tokens = encoder.Encode(\"hello world\"); // [15339, 1917]\nvar text = encoder.Decode(tokens); // hello world\nvar numberOfTokens = encoder.CountTokens(text); // 2\nvar stringTokens = encoder.Explore(text); // [\"hello\", \" world\"]\n```\n\n### Benchmarks\nYou can view the reports for each version [here](benchmarks)\n\n\u003c!--BENCHMARKS_START--\u003e\n```\n\nBenchmarkDotNet v0.14.0, macOS Sequoia 15.1 (24B83) [Darwin 24.1.0]\nApple M1 Pro, 1 CPU, 10 logical and 10 physical cores\n.NET SDK 9.0.100\n  [Host]     : .NET 9.0.0 (9.0.24.52809), Arm64 RyuJIT AdvSIMD\n  DefaultJob : .NET 9.0.0 (9.0.24.52809), Arm64 RyuJIT AdvSIMD\n\n\n```\n| Method                            | Categories  | Data                | Mean         | Ratio | Gen0     | Gen1    | Allocated | Alloc Ratio |\n|---------------------------------- |------------ |-------------------- |-------------:|------:|---------:|--------:|----------:|------------:|\n| **SharpTokenV2_0_3_**                 | **CountTokens** | **1. (...)57. [19866]** | **567,130.0 ns** |  **1.00** |   **2.9297** |       **-** |   **20115 B** |        **1.00** |\n| TiktokenSharpV1_1_5_              | CountTokens | 1. (...)57. [19866] | 483,976.7 ns |  0.85 |  64.4531 |  5.8594 |  404648 B |       20.12 |\n| MicrosoftMLTokenizerV1_0_0_       | CountTokens | 1. (...)57. [19866] | 427,733.2 ns |  0.75 |        - |       - |     297 B |        0.01 |\n| TokenizerLibV1_3_3_               | CountTokens | 1. (...)57. [19866] | 773,467.5 ns |  1.36 | 246.0938 | 83.9844 | 1547675 B |       76.94 |\n| Tiktoken_                         | CountTokens | 1. (...)57. [19866] | 271,564.3 ns |  0.48 |  23.4375 |       - |  148313 B |        7.37 |\n|                                   |             |                     |              |       |          |         |           |             |\n| **SharpTokenV2_0_3_**                 | **CountTokens** | **Hello, World!**       |     **380.0 ns** |  **1.00** |   **0.0405** |       **-** |     **256 B** |        **1.00** |\n| TiktokenSharpV1_1_5_              | CountTokens | Hello, World!       |     263.8 ns |  0.69 |   0.0505 |       - |     320 B |        1.25 |\n| MicrosoftMLTokenizerV1_0_0_       | CountTokens | Hello, World!       |     305.7 ns |  0.80 |   0.0153 |       - |      96 B |        0.38 |\n| TokenizerLibV1_3_3_               | CountTokens | Hello, World!       |     509.6 ns |  1.34 |   0.2356 |  0.0010 |    1480 B |        5.78 |\n| Tiktoken_                         | CountTokens | Hello, World!       |     175.7 ns |  0.46 |   0.0191 |       - |     120 B |        0.47 |\n|                                   |             |                     |              |       |          |         |           |             |\n| **SharpTokenV2_0_3_**                 | **CountTokens** | **King(...)edy. [275]** |   **5,990.7 ns** |  **1.00** |   **0.0763** |       **-** |     **520 B** |        **1.00** |\n| TiktokenSharpV1_1_5_              | CountTokens | King(...)edy. [275] |   4,516.5 ns |  0.75 |   0.8011 |       - |    5064 B |        9.74 |\n| MicrosoftMLTokenizerV1_0_0_       | CountTokens | King(...)edy. [275] |   3,871.2 ns |  0.65 |   0.0153 |       - |      96 B |        0.18 |\n| TokenizerLibV1_3_3_               | CountTokens | King(...)edy. [275] |   7,465.8 ns |  1.25 |   3.0823 |  0.1373 |   19344 B |       37.20 |\n| Tiktoken_                         | CountTokens | King(...)edy. [275] |   2,744.5 ns |  0.46 |   0.3128 |       - |    1976 B |        3.80 |\n|                                   |             |                     |              |       |          |         |           |             |\n| **SharpTokenV2_0_3_Encode**           | **Encode**      | **1. (...)57. [19866]** | **568,150.3 ns** |  **1.00** |   **2.9297** |       **-** |   **20115 B** |        **1.00** |\n| TiktokenSharpV1_1_5_Encode        | Encode      | 1. (...)57. [19866] | 444,972.1 ns |  0.78 |  64.4531 |  5.8594 |  404649 B |       20.12 |\n| MicrosoftMLTokenizerV1_0_0_Encode | Encode      | 1. (...)57. [19866] | 410,970.9 ns |  0.72 |  10.2539 |  0.4883 |   66137 B |        3.29 |\n| TokenizerLibV1_3_3_Encode         | Encode      | 1. (...)57. [19866] | 770,068.9 ns |  1.36 | 246.0938 | 90.8203 | 1547675 B |       76.94 |\n| Tiktoken_Encode                   | Encode      | 1. (...)57. [19866] | 290,030.9 ns |  0.51 |  33.6914 |  1.4648 |  214465 B |       10.66 |\n|                                   |             |                     |              |       |          |         |           |             |\n| **SharpTokenV2_0_3_Encode**           | **Encode**      | **Hello, World!**       |     **381.2 ns** |  **1.00** |   **0.0405** |       **-** |     **256 B** |        **1.00** |\n| TiktokenSharpV1_1_5_Encode        | Encode      | Hello, World!       |     260.2 ns |  0.68 |   0.0505 |       - |     320 B |        1.25 |\n| MicrosoftMLTokenizerV1_0_0_Encode | Encode      | Hello, World!       |     325.1 ns |  0.85 |   0.0267 |       - |     168 B |        0.66 |\n| TokenizerLibV1_3_3_Encode         | Encode      | Hello, World!       |     511.6 ns |  1.34 |   0.2356 |       - |    1480 B |        5.78 |\n| Tiktoken_Encode                   | Encode      | Hello, World!       |     241.4 ns |  0.63 |   0.0801 |       - |     504 B |        1.97 |\n|                                   |             |                     |              |       |          |         |           |             |\n| **SharpTokenV2_0_3_Encode**           | **Encode**      | **King(...)edy. [275]** |   **5,957.3 ns** |  **1.00** |   **0.0763** |       **-** |     **520 B** |        **1.00** |\n| TiktokenSharpV1_1_5_Encode        | Encode      | King(...)edy. [275] |   4,523.8 ns |  0.76 |   0.8011 |       - |    5064 B |        9.74 |\n| MicrosoftMLTokenizerV1_0_0_Encode | Encode      | King(...)edy. [275] |   4,069.8 ns |  0.68 |   0.1144 |       - |     744 B |        1.43 |\n| TokenizerLibV1_3_3_Encode         | Encode      | King(...)edy. [275] |   7,207.8 ns |  1.21 |   3.0823 |  0.1373 |   19344 B |       37.20 |\n| Tiktoken_Encode                   | Encode      | King(...)edy. [275] |   2,945.7 ns |  0.49 |   0.4654 |       - |    2936 B |        5.65 |\n\n\u003c!--BENCHMARKS_END--\u003e\n\n## Support\n\nPriority place for bugs: https://github.com/tryAGI/LangChain/issues  \nPriority place for ideas and general questions: https://github.com/tryAGI/LangChain/discussions  \nDiscord: https://discord.gg/Ca2xhfBf3v  ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FtryAGI%2FTiktoken","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FtryAGI%2FTiktoken","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FtryAGI%2FTiktoken/lists"}