https://github.com/botisan-ai/gpt3-tokenizer

Isomorphic JavaScript/TypeScript Tokenizer for GPT-3 and Codex Models by OpenAI.

Host: GitHub
URL: https://github.com/botisan-ai/gpt3-tokenizer
Owner: botisan-ai
License: mit
Created: 2021-12-27T07:59:18.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2023-01-27T04:07:55.000Z (over 3 years ago)
Last Synced: 2026-04-02T18:26:11.564Z (4 months ago)
Topics: chatgpt, codex, gpt-3, gpt3, javascript, nodejs, openai, tokenizer, typescript
Language: TypeScript
Homepage:
Size: 2.06 MB
Stars: 172
Watchers: 7
Forks: 16
Open Issues: 3
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE

# GPT3 Tokenizer

[![Build](https://github.com/botisan-ai/gpt3-tokenizer/actions/workflows/main.yml/badge.svg)](https://github.com/botisan-ai/gpt3-tokenizer/actions/workflows/main.yml)

[![NPM Version](https://img.shields.io/npm/v/gpt3-tokenizer.svg)](https://www.npmjs.com/package/gpt3-tokenizer)

[![NPM Downloads](https://img.shields.io/npm/dt/gpt3-tokenizer.svg)](https://www.npmjs.com/package/gpt3-tokenizer)

This is an isomorphic TypeScript tokenizer for OpenAI's GPT-3 models, including support for `gpt3` and `codex` tokenization. It should work in both Node.js and browser environments.

## Usage

First, install:

```shell
yarn add gpt3-tokenizer
```

In code:

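```typescript
import GPT3Tokenizer from 'gpt3-tokenizer';

// Pick the vocabulary to tokenize with: 'gpt3' or 'codex'.
const tokenizer = new GPT3Tokenizer({ type: 'gpt3' });

const str = "hello 👋 world 🌍";

// `bpe` holds the token ids; `text` holds the matching token strings.
const encoded: { bpe: number[]; text: string[] } = tokenizer.encode(str);

// Decoding the token ids gives back a string.
const decoded = tokenizer.decode(encoded.bpe);
```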
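A common reason to tokenize is to count tokens before sending a prompt. The following is a minimal sketch of that idea, not part of this library: `Encoder`, `countTokens`, and `fitsInBudget` are hypothetical names, and the only assumption about the tokenizer is the `encode()` return shape shown above.

```typescript
// Hypothetical structural type: anything with the encode() shape shown in the
// usage snippet above satisfies it, so no direct import of the package is needed here.
interface Encoder {
  encode(text: string): { bpe: number[]; text: string[] };
}

// Number of tokens the given tokenizer produces for `text`.
function countTokens(tokenizer: Encoder, text: string): number {
  return tokenizer.encode(text).bpe.length;
}

// True when `text` encodes to at most `maxTokens` tokens.
function fitsInBudget(tokenizer: Encoder, text: string, maxTokens: number): boolean {
  return countTokens(tokenizer, text) <= maxTokens;
}
```

With the `tokenizer` instance created above, this would be called as `countTokens(tokenizer, str)`. Typing the parameter structurally keeps the helpers easy to test with a stub.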

## Reference

This library is based on the following:

- [OpenAI Tokenizer Page Source](https://beta.openai.com/tokenizer?view=bpe)

- [gpt-3-encoder](https://github.com/latitudegames/GPT-3-Encoder)

The main difference between this library and gpt-3-encoder is that this library supports both `gpt3` and `codex` tokenization (the dictionary is taken directly from OpenAI, so the tokenization result is on par with the OpenAI Playground). It also uses the `Map` API instead of plain JavaScript objects, notably for `bpeRanks`, which should give some performance improvement.
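To illustrate the `Map` point, here is a small sketch of a byte-pair-encoding merge loop that looks up merge ranks in a `Map`. It is not this library's actual code: the merge list, the `pairKey` separator, and the `bestPair` and `bpe` helpers are all assumptions made for the example.

```typescript
// Hypothetical, tiny merge list; a real vocabulary has tens of thousands of merges.
const merges: Array<[string, string]> = [
  ['h', 'e'],
  ['l', 'l'],
  ['he', 'll'],
  ['hell', 'o'],
];

// Encode a pair as one string key. The NUL separator is an assumption of this sketch.
const pairKey = (a: string, b: string): string => `${a}\u0000${b}`;

// Lower rank means the pair is merged earlier.
const bpeRanks = new Map<string, number>();
merges.forEach(([a, b], rank) => bpeRanks.set(pairKey(a, b), rank));

// Find the adjacent pair with the lowest rank, or undefined if no pair can be merged.
function bestPair(symbols: string[]): [string, string] | undefined {
  let best: [string, string] | undefined;
  let bestRank = Infinity;
  for (let i = 0; i < symbols.length - 1; i++) {
    const rank = bpeRanks.get(pairKey(symbols[i], symbols[i + 1]));
    if (rank !== undefined && rank < bestRank) {
      bestRank = rank;
      best = [symbols[i], symbols[i + 1]];
    }
  }
  return best;
}

// Repeatedly merge the best-ranked pair until no known pair remains.
function bpe(word: string): string[] {
  let symbols = Array.from(word);
  for (;;) {
    const pair = bestPair(symbols);
    if (!pair) return symbols;
    const [a, b] = pair;
    const merged: string[] = [];
    let i = 0;
    while (i < symbols.length) {
      if (i < symbols.length - 1 && symbols[i] === a && symbols[i + 1] === b) {
        merged.push(a + b);
        i += 2;
      } else {
        merged.push(symbols[i]);
        i += 1;
      }
    }
    symbols = merged;
  }
}
```

Each merge step performs one rank lookup per adjacent pair, so the lookup structure sits on the hot path, which is where a `Map` with string keys is expected to help over a plain object.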

## License

[MIT](./LICENSE)
