https://github.com/johannschopplich/tokenx
GPT token estimation and context size utilities without a full tokenizer
- Host: GitHub
- URL: https://github.com/johannschopplich/tokenx
- Owner: johannschopplich
- License: MIT
- Created: 2023-11-27T10:24:11.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-11-29T11:43:09.000Z (about 1 year ago)
- Last Synced: 2025-04-26T23:44:34.559Z (10 months ago)
- Topics: tiktoken, token-counter, tokenization, tokenizer
- Language: TypeScript
- Homepage:
- Size: 353 KB
- Stars: 21
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# tokenx
GPT token count and context size utilities for when approximations are good enough. For advanced use cases, use a full tokenizer like [`gpt-tokenizer`](https://github.com/niieani/gpt-tokenizer). This library is intended for quick estimations and for avoiding the overhead of a full tokenizer, e.g. when you want to keep your bundle size small.
## Benchmarks
The following table shows the accuracy of the token count approximation for different input texts:
| Description | Actual GPT Token Count | Estimated Token Count | Token Count Deviation |
| --- | --- | --- | --- |
| Short English text | 10 | 11 | 10.00% |
| German text with umlauts | 56 | 49 | 12.50% |
| Metamorphosis by Franz Kafka (English) | 31892 | 33930 | 6.39% |
| Die Verwandlung by Franz Kafka (German) | 40621 | 34908 | 14.06% |
| 道德經 (Tao Te Ching) by Laozi (Chinese) | 14387 | 11919 | 17.15% |
| TypeScript ES5 Type Declarations (~ 4000 loc) | 48408 | 51688 | 6.78% |
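The deviations above come from heuristic estimation rather than real tokenization. As a rough illustration of the idea (not tokenx's actual algorithm, which uses more refined heuristics), the common rule of thumb that one GPT token covers about four characters of English text can be sketched as:

```typescript
// Illustrative character-based token estimate: roughly one token per
// four characters of English text, rounded up. This is a simplified
// sketch for intuition, not the heuristic tokenx itself implements.
function roughTokenEstimate(text: string): number {
  const CHARS_PER_TOKEN = 4
  return Math.ceil(text.length / CHARS_PER_TOKEN)
}

// 'Hello, world!' has 13 characters, so this estimates 4 tokens.
const estimate = roughTokenEstimate('Hello, world!')
```

Such a fixed ratio degrades for languages with different character-to-token ratios (as the Chinese benchmark above shows), which is why per-language heuristics improve accuracy.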
## Features
- Estimate token count without a full tokenizer
- Supports multiple model context sizes
- Supports accented characters, like German umlauts or French accents
- Zero dependencies
## Installation
Run one of the following commands to add `tokenx` to your project:
```bash
# npm
npm install tokenx
# pnpm
pnpm add tokenx
# yarn
yarn add tokenx
```
## Usage
```ts
import {
approximateMaxTokenSize,
approximateTokenSize,
isWithinTokenLimit
} from 'tokenx'
const prompt = 'Your prompt goes here.'
const inputText = 'Your text goes here.'
// Estimate the number of tokens in the input text
const estimatedTokens = approximateTokenSize(inputText)
console.log(`Estimated token count: ${estimatedTokens}`)
// Calculate the maximum number of tokens allowed for a given model
const modelName = 'gpt-3.5-turbo'
const maxResponseTokens = 1000
const availableTokens = approximateMaxTokenSize({
prompt,
modelName,
maxTokensInResponse: maxResponseTokens
})
console.log(`Available tokens for model ${modelName}: ${availableTokens}`)
// Check if the input text is within a specific token limit
const tokenLimit = 1024
const withinLimit = isWithinTokenLimit(inputText, tokenLimit)
console.log(`Is within token limit: ${withinLimit}`)
```
## API
### `approximateTokenSize`
Estimates the number of tokens in a given input string based on common English patterns and tokenization heuristics. It works well for other languages too, such as German.
**Usage:**
```ts
const estimatedTokens = approximateTokenSize('Hello, world!')
```
**Type Declaration:**
```ts
function approximateTokenSize(input: string): number
```
### `approximateMaxTokenSize`
Calculates the maximum number of tokens that can be included in a response given the prompt length and model's maximum context size.
**Usage:**
```ts
const maxTokens = approximateMaxTokenSize({
prompt: 'Sample prompt',
modelName: 'text-davinci-003',
maxTokensInResponse: 500
})
```
**Type Declaration:**
```ts
function approximateMaxTokenSize({ prompt, modelName, maxTokensInResponse }: {
prompt: string
modelName: ModelName
/** The maximum number of tokens to generate in the reply. 1000 tokens are roughly 750 English words. */
maxTokensInResponse?: number
}): number
```
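Conceptually, this kind of budget calculation subtracts the prompt's estimated tokens and the reserved response tokens from the model's context size. Here is a minimal standalone sketch of that arithmetic; the context-size table and the four-characters-per-token estimate are illustrative assumptions, not tokenx's internal values:

```typescript
// Hypothetical context sizes for illustration only.
const CONTEXT_SIZES: Record<string, number> = {
  'gpt-3.5-turbo': 4096,
  'gpt-4': 8192
}

// Remaining token budget = context size - prompt tokens - reserved response tokens,
// clamped to zero so an oversized prompt never yields a negative budget.
function roughMaxResponseBudget(
  prompt: string,
  modelName: string,
  maxTokensInResponse = 0
): number {
  const contextSize = CONTEXT_SIZES[modelName] ?? 4096
  const promptTokens = Math.ceil(prompt.length / 4) // crude estimate
  return Math.max(0, contextSize - promptTokens - maxTokensInResponse)
}
```

Clamping to zero matters: without it, a prompt larger than the context window would report a negative budget to downstream code.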
### `isWithinTokenLimit`
Checks if the estimated token count of the input is within a specified token limit.
**Usage:**
```ts
const withinLimit = isWithinTokenLimit('Check this text against a limit', 100)
```
**Type Declaration:**
```ts
function isWithinTokenLimit(input: string, tokenLimit: number): boolean
```
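A typical use of a limit check like this is trimming input until it fits a token budget. The following sketch uses a crude local four-characters-per-token estimate in place of the library's heuristic, so the loop is self-contained; the function names here are illustrative, not part of the tokenx API:

```typescript
// Crude stand-in estimator for illustration (not tokenx's heuristic).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Trim text word by word until its estimated token count fits the budget.
function truncateToTokenLimit(text: string, tokenLimit: number): string {
  let result = text
  while (result.length > 0 && estimateTokens(result) > tokenLimit) {
    const cut = result.lastIndexOf(' ')
    // Drop the last whitespace-delimited word; fall back to slicing
    // off one character when no space remains.
    result = cut > 0 ? result.slice(0, cut) : result.slice(0, -1)
  }
  return result
}
```

Because the estimator only approximates the real tokenizer, keep some headroom below the hard model limit when truncating this way.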
## License
[MIT](./LICENSE) License Β© 2023-PRESENT [Johann Schopplich](https://github.com/johannschopplich)