https://github.com/dluc/openai-tools
A collection of tools for working with OpenAI
gpt-3 gpt3 openai tokenization tokenizer
- Host: GitHub
- URL: https://github.com/dluc/openai-tools
- Owner: dluc
- License: cc0-1.0
- Created: 2022-10-17T16:10:13.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-05-04T10:33:35.000Z (about 2 years ago)
- Last Synced: 2025-03-24T00:49:29.707Z (3 months ago)
- Topics: gpt-3, gpt3, openai, tokenization, tokenizer
- Language: C#
- Homepage:
- Size: 559 KB
- Stars: 98
- Watchers: 8
- Forks: 15
- Open Issues: 6
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
# GPT Tokenizer
## .NET / C#
When using
[OpenAI GPT](https://openai.com/blog/gpt-3-apps/),
you may need to know how many
[tokens](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them)
your code is using for various purposes, such as estimating costs and improving
results.

The `GPT3Tokenizer` C# class can help you **count tokens** in your prompts and
in the responses received.

```csharp
using System.Collections.Generic;
using AI.Dev.OpenAI.GPT;

string text = "January 1st, 2000";

// 5 tokens => [21339, 352, 301, 11, 4751]
List<int> tokens = GPT3Tokenizer.Encode(text);
```

The tokenizer uses a byte-pair encoding (BPE) algorithm to split words into
subwords based on frequency and merge rules. It can handle out-of-vocabulary
words, punctuation, and special tokens.

The result of this library is compatible with the OpenAI GPT tokenizer, which you can
also test
[here](https://beta.openai.com/tokenizer).

### Installation
Install the [AI.Dev.OpenAI.GPT](https://www.nuget.org/packages/AI.Dev.OpenAI.GPT) NuGet package from nuget.org, e.g.:

    dotnet add package AI.Dev.OpenAI.GPT --version 1.0.2

or

    NuGet\Install-Package AI.Dev.OpenAI.GPT -Version 1.0.2
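
Once the package is installed, a token count can feed a simple cost estimate or length check, the two uses mentioned earlier. The following is a minimal sketch, not part of the library: the `TokenBudget` class, the 4000-token limit, and the price per 1,000 tokens are all hypothetical values chosen for illustration, so replace them with the figures for the model you actually call. Only `GPT3Tokenizer.Encode` comes from the package.

```csharp
using System;
using System.Collections.Generic;
using AI.Dev.OpenAI.GPT;

public static class TokenBudget
{
    // Assumed values for illustration only; they are not from this README.
    public const int MaxPromptTokens = 4000;
    public const decimal PricePer1KTokens = 0.02m;

    // Counts tokens with the Encode call shown in the sample above.
    public static int CountTokens(string text)
    {
        List<int> tokens = GPT3Tokenizer.Encode(text);
        return tokens.Count;
    }

    // Scales the assumed per-1,000-token price to the given count.
    public static decimal EstimateCost(int tokenCount)
    {
        return tokenCount / 1000m * PricePer1KTokens;
    }

    public static void Main()
    {
        string prompt = "January 1st, 2000";
        int count = CountTokens(prompt);

        Console.WriteLine($"Tokens: {count}");
        Console.WriteLine($"Estimated cost: {EstimateCost(count)} USD");

        if (count > MaxPromptTokens)
        {
            Console.WriteLine("Prompt exceeds the assumed token limit; shorten it.");
        }
    }
}
```

Keeping the limit and price as named constants makes it easy to swap in real numbers without touching the counting logic.
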
## Python and Node.js
If you are looking for an equivalent solution in other languages:
* [Python GPT tokenizer](https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2Tokenizer)
* [Node.js GPT encoder](https://www.npmjs.com/package/gpt-3-encoder)

# Licensing
This library is licensed CC0, in the public domain. You can use it for any
application, you can modify the code, and you can redistribute any part of it.

I am not affiliated with OpenAI and this library is not endorsed by them. I just
work with several AI solutions and I share this code hoping to make technology
more accessible and easier to work with.