https://github.com/botisan-ai/gpt3-tokenizer
Isomorphic JavaScript/TypeScript Tokenizer for GPT-3 and Codex Models by OpenAI.
- Host: GitHub
- URL: https://github.com/botisan-ai/gpt3-tokenizer
- Owner: botisan-ai
- License: mit
- Created: 2021-12-27T07:59:18.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2023-01-27T04:07:55.000Z (over 3 years ago)
- Last Synced: 2025-03-18T04:52:57.618Z (over 1 year ago)
- Topics: chatgpt, codex, gpt-3, gpt3, javascript, nodejs, openai, tokenizer, typescript
- Language: TypeScript
- Homepage:
- Size: 2.06 MB
- Stars: 171
- Watchers: 7
- Forks: 16
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
# GPT3 Tokenizer
This is an isomorphic TypeScript tokenizer for OpenAI's GPT-3 model, with support for both `gpt3` and `codex` tokenization. It should work in both Node.js and browser environments.
## Usage
First, install:
```shell
yarn add gpt3-tokenizer
```
In code:
```typescript
import GPT3Tokenizer from 'gpt3-tokenizer';
const tokenizer = new GPT3Tokenizer({ type: 'gpt3' }); // or 'codex'
const str = "hello 👋 world 🌍";
const encoded: { bpe: number[]; text: string[] } = tokenizer.encode(str);
const decoded = tokenizer.decode(encoded.bpe);
```
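A common reason to tokenize is to check prompt length before sending a request. The following is a minimal sketch of that, not part of the library: `EncoderLike`, `countTokens`, and `fitsTokenBudget` are hypothetical names, and the interface assumes only the `encode` return shape shown in the example above.

```typescript
// Structural type that assumes only the `encode` shape from the usage example.
// (Hypothetical name; the library does not export this interface.)
interface EncoderLike {
  encode(text: string): { bpe: number[]; text: string[] };
}

// Hypothetical helper: number of BPE tokens the tokenizer produces for `input`.
function countTokens(tokenizer: EncoderLike, input: string): number {
  return tokenizer.encode(input).bpe.length;
}

// Hypothetical helper: true when `input` encodes to at most `maxTokens` tokens.
function fitsTokenBudget(
  tokenizer: EncoderLike,
  input: string,
  maxTokens: number
): boolean {
  return countTokens(tokenizer, input) <= maxTokens;
}
```

With the `tokenizer` and `str` from the usage example, `countTokens(tokenizer, str)` would return the same value as `encoded.bpe.length`.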
## Reference
This library is based on the following:
- [OpenAI Tokenizer Page Source](https://beta.openai.com/tokenizer?view=bpe)
- [gpt-3-encoder](https://github.com/latitudegames/GPT-3-Encoder)
The main difference between this library and gpt-3-encoder is that this library supports both `gpt3` and `codex` tokenization. The dictionary is taken directly from OpenAI, so tokenization results should match the OpenAI Playground. It also uses the `Map` API instead of plain JavaScript objects, notably for the `bpeRanks` lookup, which should improve performance.
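To illustrate the kind of lookup a `Map`-based `bpeRanks` serves, here is a minimal sketch of byte-pair merging. It is not the library's implementation: the merge list is a toy example, and `pairKey`, `bestPair`, and `applyMerges` are hypothetical names introduced for illustration.

```typescript
// Toy merge list (hypothetical); a real tokenizer loads this from its vocabulary files.
const merges: Array<[string, string]> = [
  ["h", "e"],
  ["l", "l"],
  ["he", "ll"],
  ["hell", "o"],
];

// Join a symbol pair into a single string so it can serve as a Map key.
const pairKey = (a: string, b: string): string => `${a} ${b}`;

// Map from pair key to merge rank (lower rank means the merge is applied earlier).
const bpeRanks = new Map<string, number>();
merges.forEach(([a, b], rank) => bpeRanks.set(pairKey(a, b), rank));

// Find the adjacent pair with the lowest rank, or undefined if no pair is mergeable.
function bestPair(symbols: string[]): [string, string] | undefined {
  let best: [string, string] | undefined;
  let bestRank = Infinity;
  for (let i = 0; i < symbols.length - 1; i++) {
    const rank = bpeRanks.get(pairKey(symbols[i], symbols[i + 1]));
    if (rank !== undefined && rank < bestRank) {
      bestRank = rank;
      best = [symbols[i], symbols[i + 1]];
    }
  }
  return best;
}

// Repeatedly merge the best-ranked pair until no known pair remains.
function applyMerges(word: string): string[] {
  let symbols = word.split("");
  for (;;) {
    const pair = bestPair(symbols);
    if (!pair) return symbols;
    const [a, b] = pair;
    const merged: string[] = [];
    let i = 0;
    while (i < symbols.length) {
      if (i < symbols.length - 1 && symbols[i] === a && symbols[i + 1] === b) {
        merged.push(a + b);
        i += 2;
      } else {
        merged.push(symbols[i]);
        i += 1;
      }
    }
    symbols = merged;
  }
}
```

Every merge step performs one `bpeRanks.get` per adjacent pair, so the rank lookup sits on the hot path of encoding. That is where a `Map` is expected to help over a plain object.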
## License
[MIT](./LICENSE)