{"id":13466166,"url":"https://github.com/pkoukk/tiktoken-go","last_synced_at":"2026-01-18T02:49:41.504Z","repository":{"id":139707998,"uuid":"611200916","full_name":"pkoukk/tiktoken-go","owner":"pkoukk","description":"go version of tiktoken","archived":false,"fork":false,"pushed_at":"2024-05-21T09:12:30.000Z","size":1207,"stargazers_count":663,"open_issues_count":9,"forks_count":74,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-10-29T19:17:01.751Z","etag":null,"topics":["chatgpt","go","golang","gpt-35-turbo","gpt-4","openai","tiktoken"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pkoukk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-08T10:31:35.000Z","updated_at":"2024-10-29T14:28:56.000Z","dependencies_parsed_at":"2024-01-13T17:59:37.469Z","dependency_job_id":"a2b1823a-47ba-4bc4-b422-558128a1dfda","html_url":"https://github.com/pkoukk/tiktoken-go","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pkoukk%2Ftiktoken-go","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pkoukk%2Ftiktoken-go/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pkoukk%2Ftiktoken-go/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pkoukk%2Ftiktoken-go/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pkoukk","download_url":"https:/
/codeload.github.com/pkoukk/tiktoken-go/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245547346,"owners_count":20633349,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatgpt","go","golang","gpt-35-turbo","gpt-4","openai","tiktoken"],"created_at":"2024-07-31T15:00:40.173Z","updated_at":"2026-01-18T02:49:41.480Z","avatar_url":"https://github.com/pkoukk.png","language":"Go","funding_links":[],"categories":["Go","Openai","SDK, Libraries, Frameworks","Calculators and Estimators","Other libraries"],"sub_categories":["Golang","Tokenizers"],"readme":"# tiktoken-go\n\n[简体中文](./README_zh-hans.md)\n\nOpenAI's tiktoken in Go.\n\nTiktoken is a fast BPE tokeniser for use with OpenAI's models.\n\nThis is a port of the original [tiktoken](https://github.com/openai/tiktoken).\n\n# Usage\n\n## Install\n\n```bash\ngo get github.com/pkoukk/tiktoken-go\n```\n\n## Cache\n\nTiktoken-go has the same cache mechanism as the original Tiktoken library.\n\nYou can set the cache directory with the `TIKTOKEN_CACHE_DIR` environment variable.\n\nOnce this variable is set, tiktoken-go will use this directory to cache the token dictionary.\n\nIf you don't set this environment variable, tiktoken-go will download the dictionary each time you initialize an\nencoding for the first time.\n\n## Alternative BPE loaders\n\nIf you don't want to use the cache or download the dictionary each time, you can use an alternative BPE loader.\n\nJust call `tiktoken.SetBpeLoader` before calling `tiktoken.GetEncoding` or `tiktoken.EncodingForModel`.\n\n`BpeLoader` 
is an interface; you can implement your own BPE loader by implementing it.\n\n### Offline BPE loader\n\nThe offline BPE loader loads the BPE dictionary from embedded files, which helps if you don't want to download the\ndictionary at runtime.\n\nDue to the size of the BPE dictionary, this loader lives in a separate project.\n\nIf you need this loader, include [tiktoken_loader](https://github.com/pkoukk/tiktoken-go-loader).\n\n## Examples\n\n### Get Token By Encoding\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"github.com/pkoukk/tiktoken-go\"\n)\n\nfunc main() {\n\ttext := \"Hello, world!\"\n\tencoding := \"cl100k_base\"\n\n\t// if you don't want to download the dictionary at runtime, you can use the offline loader\n\t// tiktoken.SetBpeLoader(tiktoken_loader.NewOfflineLoader())\n\ttke, err := tiktoken.GetEncoding(encoding)\n\tif err != nil {\n\t\t// report the failure instead of silently returning\n\t\tfmt.Println(\"getEncoding:\", err)\n\t\treturn\n\t}\n\n\t// encode\n\ttoken := tke.Encode(text, nil, nil)\n\n\t// tokens\n\tfmt.Println(token)\n\t// num_tokens\n\tfmt.Println(len(token))\n}\n```\n\n### Get Token By Model\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"github.com/pkoukk/tiktoken-go\"\n)\n\nfunc main() {\n\ttext := \"Hello, world!\"\n\tmodel := \"gpt-3.5-turbo\"\n\n\ttkm, err := tiktoken.EncodingForModel(model)\n\tif err != nil {\n\t\t// report the failure instead of silently returning\n\t\tfmt.Println(\"encodingForModel:\", err)\n\t\treturn\n\t}\n\n\t// encode\n\ttoken := tkm.Encode(text, nil, nil)\n\n\t// tokens\n\tfmt.Println(token)\n\t// num_tokens\n\tfmt.Println(len(token))\n}\n```\n\n### Counting Tokens For Chat API Calls\n\nBelow is an example function for counting tokens for messages passed to gpt-3.5-turbo or gpt-4.\n\nThe following code is based on the\n[openai-cookbook](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb)\nexamples as of `Wednesday, 28 June 2023`.\n\nPlease note that the token calculation method for the message may change at any time, so this code may not necessarily\nbe 
applicable in the future.\n\nIf you need an exact count, please refer to the official documentation.\n\nIf you find that this code is no longer applicable, please feel free to open a PR or an issue.\n\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"log\"\n\t\"strings\"\n\n\t\"github.com/pkoukk/tiktoken-go\"\n\t\"github.com/sashabaranov/go-openai\"\n)\n\n// OpenAI Cookbook: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb\nfunc NumTokensFromMessages(messages []openai.ChatCompletionMessage, model string) (numTokens int) {\n\ttkm, err := tiktoken.EncodingForModel(model)\n\tif err != nil {\n\t\terr = fmt.Errorf(\"encoding for model: %v\", err)\n\t\tlog.Println(err)\n\t\treturn\n\t}\n\n\tvar tokensPerMessage, tokensPerName int\n\tswitch model {\n\tcase \"gpt-3.5-turbo-0613\",\n\t\t\"gpt-3.5-turbo-16k-0613\",\n\t\t\"gpt-4-0314\",\n\t\t\"gpt-4-32k-0314\",\n\t\t\"gpt-4-0613\",\n\t\t\"gpt-4-32k-0613\":\n\t\ttokensPerMessage = 3\n\t\ttokensPerName = 1\n\tcase \"gpt-3.5-turbo-0301\":\n\t\ttokensPerMessage = 4 // every message follows \u003c|start|\u003e{role/name}\\n{content}\u003c|end|\u003e\\n\n\t\ttokensPerName = -1   // if there's a name, the role is omitted\n\tdefault:\n\t\tif strings.Contains(model, \"gpt-3.5-turbo\") {\n\t\t\tlog.Println(\"warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.\")\n\t\t\treturn NumTokensFromMessages(messages, \"gpt-3.5-turbo-0613\")\n\t\t} else if strings.Contains(model, \"gpt-4\") {\n\t\t\tlog.Println(\"warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.\")\n\t\t\treturn NumTokensFromMessages(messages, \"gpt-4-0613\")\n\t\t} else {\n\t\t\terr = fmt.Errorf(\"num_tokens_from_messages() is not implemented for model %s. 
See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.\", model)\n\t\t\tlog.Println(err)\n\t\t\treturn\n\t\t}\n\t}\n\n\tfor _, message := range messages {\n\t\tnumTokens += tokensPerMessage\n\t\tnumTokens += len(tkm.Encode(message.Content, nil, nil))\n\t\tnumTokens += len(tkm.Encode(message.Role, nil, nil))\n\t\tnumTokens += len(tkm.Encode(message.Name, nil, nil))\n\t\tif message.Name != \"\" {\n\t\t\tnumTokens += tokensPerName\n\t\t}\n\t}\n\tnumTokens += 3 // every reply is primed with \u003c|start|\u003eassistant\u003c|message|\u003e\n\treturn numTokens\n}\n\n```\n\n# Available Encodings\n\n| Encoding name           | OpenAI models                                                                                          |\n|-------------------------|--------------------------------------------------------------------------------------------------------|\n| `o200k_base`            | `gpt-4o`, `gpt-4.1`, `gpt-4.5`                                                                         |\n| `cl100k_base`           | `gpt-4`, `gpt-3.5-turbo`, `text-embedding-ada-002`, `text-embedding-3-small`, `text-embedding-3-large` |\n| `p50k_base`             | Codex models, `text-davinci-002`, `text-davinci-003`                                                   |\n| `r50k_base` (or `gpt2`) | GPT-3 models like `davinci`                                                                            |\n\n# Available Models\n\n| Model name                   | Encoding      |\n|------------------------------|---------------|\n| gpt-4.5-*                    | o200k_base    |\n| gpt-4.1-*                    | o200k_base    |\n| gpt-4o-*                     | o200k_base    |\n| gpt-4-*                      | cl100k_base   |\n| gpt-3.5-turbo-*              | cl100k_base   |\n| gpt-4o                       | o200k_base    |\n| gpt-4                        | cl100k_base   |\n| gpt-3.5-turbo                | cl100k_base   |\n| 
text-davinci-003             | p50k_base     |\n| text-davinci-002             | p50k_base     |\n| text-davinci-001             | r50k_base     |\n| text-curie-001               | r50k_base     |\n| text-babbage-001             | r50k_base     |\n| text-ada-001                 | r50k_base     |\n| davinci                      | r50k_base     |\n| curie                        | r50k_base     |\n| babbage                      | r50k_base     |\n| ada                          | r50k_base     |\n| code-davinci-002             | p50k_base     |\n| code-davinci-001             | p50k_base     |\n| code-cushman-002             | p50k_base     |\n| code-cushman-001             | p50k_base     |\n| davinci-codex                | p50k_base     |\n| cushman-codex                | p50k_base     |\n| text-davinci-edit-001        | p50k_edit     |\n| code-davinci-edit-001        | p50k_edit     |\n| text-embedding-ada-002       | cl100k_base   |\n| text-embedding-3-small       | cl100k_base   |\n| text-embedding-3-large       | cl100k_base   |\n| text-similarity-davinci-001  | r50k_base     |\n| text-similarity-curie-001    | r50k_base     |\n| text-similarity-babbage-001  | r50k_base     |\n| text-similarity-ada-001      | r50k_base     |\n| text-search-davinci-doc-001  | r50k_base     |\n| text-search-curie-doc-001    | r50k_base     |\n| text-search-babbage-doc-001  | r50k_base     |\n| text-search-ada-doc-001      | r50k_base     |\n| code-search-babbage-code-001 | r50k_base     |\n| code-search-ada-code-001     | r50k_base     |\n| gpt2                         | gpt2          |\n\n# Test\n\n\u003e You can run the tests in the [test](./test) folder.\n\n## Compare with original [tiktoken](https://github.com/openai/tiktoken)\n\n## Get token by encoding\n\n[result](./doc/test_result.md#encoding-test-result)\n\n## Get token by model\n\n[result](./doc/test_result.md#model-test-result)\n\n# Benchmark\n\n\u003e You can run the benchmark in the [test](./test) folder.\n\n## Benchmark result\n\n| name  
      | time/op | os         | cpu      | text                             | times  |\n|-------------|---------|------------|----------|----------------------------------|--------|\n| tiktoken-go | 8795ns  | macOS 13.2 | Apple M1 | [UDHR](https://unicode.org/udhr) | 100000 |\n| tiktoken    | 8838ns  | macOS 13.2 | Apple M1 | [UDHR](https://unicode.org/udhr) | 100000 |\n\nOn this benchmark the performance is almost the same.\n\nThe small difference may come from variation in machine performance, or my benchmark method may not be appropriate.\n\nIf you have a better benchmark method or want to add your own benchmark result, please feel free to submit a PR.\n\nThe new `o200k_base` encoding appears slower than `cl100k_base`, and tiktoken-go is roughly 1.5 to 1.7 times slower than\ntiktoken on the following benchmark.\n\n| name        | encoding    | time/op   | os           | cpu                | text                                                | times  |\n|-------------|-------------|-----------|--------------|--------------------|-----------------------------------------------------|--------|\n| tiktoken-go | o200k_base  | 108522 ns | Ubuntu 22.04 | AMD Ryzen 9 5900HS | [UDHR](http://research.ics.aalto.fi/cog/data/udhr/) | 100000 |\n| tiktoken    | o200k_base  | 70198 ns  | Ubuntu 22.04 | AMD Ryzen 9 5900HS | [UDHR](http://research.ics.aalto.fi/cog/data/udhr/) | 100000 |\n| tiktoken-go | cl100k_base | 94502 ns  | Ubuntu 22.04 | AMD Ryzen 9 5900HS | [UDHR](http://research.ics.aalto.fi/cog/data/udhr/) | 100000 |\n| tiktoken    | cl100k_base | 54642 ns  | Ubuntu 22.04 | AMD Ryzen 9 5900HS | [UDHR](http://research.ics.aalto.fi/cog/data/udhr/) | 100000 |\n\n# License\n\n[MIT](./LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpkoukk%2Ftiktoken-go","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpkoukk%2Ftiktoken-go","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpkoukk%2Ftiktoken-go/lists"}