https://github.com/dluc/openai-tools
A collection of tools for working with OpenAI
https://github.com/dluc/openai-tools
gpt-3 gpt3 openai tokenization tokenizer
Last synced: 10 months ago
JSON representation
A collection of tools for working with OpenAI
- Host: GitHub
- URL: https://github.com/dluc/openai-tools
- Owner: dluc
- License: cc0-1.0
- Created: 2022-10-17T16:10:13.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-05-04T10:33:35.000Z (almost 3 years ago)
- Last Synced: 2025-03-24T00:49:29.707Z (11 months ago)
- Topics: gpt-3, gpt3, openai, tokenization, tokenizer
- Language: C#
- Homepage:
- Size: 559 KB
- Stars: 98
- Watchers: 8
- Forks: 15
- Open Issues: 6
-
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
Awesome Lists containing this project
README
# GPT Tokenizer
## .NET / C#
When using
[OpenAI GPT](https://openai.com/blog/gpt-3-apps/),
you may need to know how many
[tokens](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them)
your code is using for various purposes, such as estimating costs and improving
results.
The `GPT3Tokenizer` C# class can help you **count tokens** in your prompts and
in the responses received.
```csharp
using AI.Dev.OpenAI.GPT;
string text = "January 1st, 2000";
// 5 tokens => [21339, 352, 301, 11, 4751]
List tokens = GPT3Tokenizer.Encode(text);
```
The tokenizer uses a byte-pair encoding (BPE) algorithm to split words into
subwords based on frequency and merges rules. It can handle out-of-vocabulary
words, punctuation, and special tokens.
The result of this library is compatible with OpenAI GPT tokenizer that you can
also test
[here](https://beta.openai.com/tokenizer).
### Installation
Install [AI.Dev.OpenAI.GPT](https://www.nuget.org/packages/AI.Dev.OpenAI.GPT) NuGet package from nuget.org, e.g.:
dotnet add package AI.Dev.OpenAI.GPT --version 1.0.2
or
NuGet\Install-Package AI.Dev.OpenAI.GPT -Version 1.0.2
## Python and Node.js
If you are looking for an equivalent solution in other languages:
* [Python GPT tokenizer](https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2Tokenizer)
* [Node.js GPT encoder](https://www.npmjs.com/package/gpt-3-encoder)
# Licensing
This library is licensed CC0, in the public domain. You can use it for any
application, you can modify the code, and you can redistribute any part of it.
I am not affiliated with OpenAI and this library is not endorsed by them. I just
work with several AI solutions and I share this code hoping to make technology
more accessible and easier to work with.