https://github.com/josephrocca/gpt-2-3-tokenizer
GPT-2/3 tokenizer based on @latitudegames/GPT-3-Encoder that works in the browser and Deno
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/josephrocca/gpt-2-3-tokenizer
- Owner: josephrocca
- Created: 2021-04-22T11:37:11.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-10-20T13:36:55.000Z (about 1 year ago)
- Last Synced: 2024-04-18T14:11:46.145Z (7 months ago)
- Language: JavaScript
- Size: 611 KB
- Stars: 31
- Watchers: 4
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
**Note**: This repo is for GPT-3 only. Here's some code that should work for gpt-3.5-turbo and gpt-4:
```js
import { AutoTokenizer } from 'https://cdn.jsdelivr.net/npm/@xenova/[email protected]'
let tokenizer = await AutoTokenizer.from_pretrained("Xenova/gpt-4"); // gpt-3.5-turbo uses same tokenizer as gpt-4 IIUC
let tokens = tokenizer.encode("hello world");
```

---
# GPT-2/3 Tokenizer
GPT-2/3 byte pair encoder/decoder/tokenizer based on [@latitudegames/GPT-3-Encoder](https://github.com/latitudegames/GPT-3-Encoder) that works in the browser and Deno.
See also: [JS byte pair encoder for OpenAI's CLIP model](https://github.com/josephrocca/clip-bpe-js).
```js
import {encode, decode} from "https://deno.land/x/[email protected]/mod.js";
let text = "hello world";
console.log(encode(text)); // [258, 18798, 995]
console.log(decode(encode(text))); // "hello world"
```
or:
```js
let mod = await import("https://deno.land/x/[email protected]/mod.js");
mod.encode("hello world"); // [258, 18798, 995]
```
or to include it as a global variable (as if you were importing it with the old script tag style):
```html
<script type="module">
  import tokenizer from "https://deno.land/x/[email protected]/mod.js";
  window.tokenizer = tokenizer;
</script>
```
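
A common reason to tokenize in the browser is to check text against a token budget. The following is a minimal sketch of that idea and is not part of this repo: `countTokens` and `truncateToTokens` are hypothetical helper names, and they take the `encode`/`decode` functions as arguments, so they assume only that `encode` returns an array of token ids and `decode` turns such an array back into a string.

```js
// Hypothetical helpers (not part of this repo). The tokenizer's encode/decode
// functions are passed in as parameters, so nothing is imported here.

// Number of tokens in `text` according to the supplied encoder.
function countTokens(text, encode) {
  return encode(text).length;
}

// Keep at most `maxTokens` tokens of `text`, then decode back to a string.
// Text that already fits within the budget is returned unchanged.
function truncateToTokens(text, maxTokens, encode, decode) {
  const tokens = encode(text);
  if (tokens.length <= maxTokens) return text;
  return decode(tokens.slice(0, maxTokens));
}
```

With the `encode` and `decode` functions imported in the examples above, a call might look like `truncateToTokens(prompt, 2048, encode, decode)`, where `2048` is an arbitrary example budget. Because the encoding is byte-level, cutting at a token boundary may split a multi-byte character, so the end of the truncated string may not decode cleanly.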
# License
The [original code is MIT Licensed](https://github.com/latitudegames/GPT-3-Encoder/blob/master/LICENSE) and so are any changes made by this repo.