{"id":15677652,"url":"https://github.com/johannschopplich/tokenx","last_synced_at":"2025-05-01T02:32:26.305Z","repository":{"id":209504656,"uuid":"724065233","full_name":"johannschopplich/tokenx","owner":"johannschopplich","description":"📐 GPT token estimation and context size utilities without a full tokenizer","archived":false,"fork":false,"pushed_at":"2024-11-29T11:43:09.000Z","size":361,"stargazers_count":21,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-26T23:44:34.559Z","etag":null,"topics":["tiktoken","token-counter","tokenization","tokenizer"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/johannschopplich.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-27T10:24:11.000Z","updated_at":"2025-04-03T05:49:59.000Z","dependencies_parsed_at":"2024-01-30T12:46:38.773Z","dependency_job_id":"623f0bd5-2c54-46c7-87c8-a55fa6c3c686","html_url":"https://github.com/johannschopplich/tokenx","commit_stats":null,"previous_names":["johannschopplich/tokenwise","johannschopplich/tokenx"],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johannschopplich%2Ftokenx","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johannschopplich%2Ftokenx/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johannschopplich%2Ftokenx/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/johannschopplich%2Ftokenx/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/johannschopplich","download_url":"https://codeload.github.com/johannschopplich/tokenx/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251812456,"owners_count":21647911,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["tiktoken","token-counter","tokenization","tokenizer"],"created_at":"2024-10-03T16:10:10.980Z","updated_at":"2025-05-01T02:32:26.288Z","avatar_url":"https://github.com/johannschopplich.png","language":"TypeScript","funding_links":[],"categories":["TypeScript"],"sub_categories":[],"readme":"# tokenx\n\nGPT token count and context size utilities when approximations are good enough. For advanced use cases, please use a full tokenizer like [`gpt-tokenizer`](https://github.com/niieani/gpt-tokenizer). This library is intended to be used for quick estimations and to avoid the overhead of a full tokenizer, e.g. 
when you want to limit your bundle size.\n\n## Benchmarks\n\nThe following table shows the accuracy of the token count approximation for different input texts:\n\n\u003c!-- START GENERATED TOKEN COUNT TABLE --\u003e\n| Description | Actual GPT Token Count | Estimated Token Count | Token Count Deviation |\n| --- | --- | --- | --- |\n| Short English text | 10 | 11 | 10.00% |\n| German text with umlauts | 56 | 49 | 12.50% |\n| Metamorphosis by Franz Kafka (English) | 31892 | 33930 | 6.39% |\n| Die Verwandlung by Franz Kafka (German) | 40621 | 34908 | 14.06% |\n| 道德經 by Laozi (Chinese) | 14387 | 11919 | 17.15% |\n| TypeScript ES5 Type Declarations (~ 4000 loc) | 48408 | 51688 | 6.78% |\n\u003c!-- END GENERATED TOKEN COUNT TABLE --\u003e\n\n## Features\n\n- 🌁 Estimate token count without a full tokenizer\n- 📐 Supports multiple model context sizes\n- 🗣️ Supports accented characters, like German umlauts or French accents\n- 🪽 Zero dependencies\n\n## Installation\n\nRun the following command to add `tokenx` to your project.\n\n```bash\n# npm\nnpm install tokenx\n\n# pnpm\npnpm add tokenx\n\n# yarn\nyarn add tokenx\n```\n\n## Usage\n\n```ts\nimport {\n  approximateMaxTokenSize,\n  approximateTokenSize,\n  isWithinTokenLimit\n} from 'tokenx'\n\nconst prompt = 'Your prompt goes here.'\nconst inputText = 'Your text goes here.'\n\n// Estimate the number of tokens in the input text\nconst estimatedTokens = approximateTokenSize(inputText)\nconsole.log(`Estimated token count: ${estimatedTokens}`)\n\n// Calculate the maximum number of tokens allowed for a given model\nconst modelName = 'gpt-3.5-turbo'\nconst maxResponseTokens = 1000\nconst availableTokens = approximateMaxTokenSize({\n  prompt,\n  modelName,\n  maxTokensInResponse: maxResponseTokens\n})\nconsole.log(`Available tokens for model ${modelName}: ${availableTokens}`)\n\n// Check if the input text is within a specific token limit\nconst tokenLimit = 1024\nconst withinLimit = isWithinTokenLimit(inputText, 
tokenLimit)\nconsole.log(`Is within token limit: ${withinLimit}`)\n```\n\n## API\n\n### `approximateTokenSize`\n\nEstimates the number of tokens in a given input string based on common English patterns and tokenization heuristics. Works well for other languages too, like German.\n\n**Usage:**\n\n```ts\nconst estimatedTokens = approximateTokenSize('Hello, world!')\n```\n\n**Type Declaration:**\n\n```ts\nfunction approximateTokenSize(input: string): number\n```\n\n### `approximateMaxTokenSize`\n\nCalculates the maximum number of tokens that can be included in a response given the prompt length and the model's maximum context size.\n\n**Usage:**\n\n```ts\nconst maxTokens = approximateMaxTokenSize({\n  prompt: 'Sample prompt',\n  modelName: 'text-davinci-003',\n  maxTokensInResponse: 500\n})\n```\n\n**Type Declaration:**\n\n```ts\nfunction approximateMaxTokenSize({ prompt, modelName, maxTokensInResponse }: {\n  prompt: string\n  modelName: ModelName\n  /** The maximum number of tokens to generate in the reply. 1000 tokens are roughly 750 English words. */\n  maxTokensInResponse?: number\n}): number\n```\n\n### `isWithinTokenLimit`\n\nChecks if the estimated token count of the input is within a specified token limit.\n\n**Usage:**\n\n```ts\nconst withinLimit = isWithinTokenLimit('Check this text against a limit', 100)\n```\n\n**Type Declaration:**\n\n```ts\nfunction isWithinTokenLimit(input: string, tokenLimit: number): boolean\n```\n\n## License\n\n[MIT](./LICENSE) License © 2023-PRESENT [Johann Schopplich](https://github.com/johannschopplich)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohannschopplich%2Ftokenx","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjohannschopplich%2Ftokenx","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohannschopplich%2Ftokenx/lists"}