An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with tiktoken

A curated list of projects in awesome lists tagged with tiktoken.

https://github.com/dqbd/tiktokenizer

Online playground for OpenAI tokenizers

chatgpt nextjs openai t3-stack tiktoken tokenizer

Last synced: 15 May 2025

https://github.com/jimmc414/onefilellm

Specify a GitHub or local repo, GitHub pull request, arXiv or Sci-Hub paper, YouTube transcript, or documentation URL on the web, and scrape it into a text file and the clipboard for easier LLM ingestion

arxiv doi github ipynb llm papers pdf pmid pull-request repository sci-hub text tiktoken youtube-transcript-api

Last synced: 15 May 2025

https://github.com/pkoukk/tiktoken-go

Go version of tiktoken

chatgpt go golang gpt-35-turbo gpt-4 openai tiktoken

Last synced: 18 Jan 2026

https://github.com/cnseniorious000/free-chat

An elegant LLM chat UI forked from chatgpt-demo of @anse-app. Index site at https://free-chat.asia

aibot astro chatgpt openai openai-api solidjs tiktoken

Last synced: 04 Apr 2025

https://github.com/tryagi/tiktoken

High-performance .NET BPE tokenizer — up to 618 MiB/s, competitive with Rust. Zero-allocation counting, multilingual cache, o200k/cl100k/r50k/p50k encodings + HuggingFace tokenizer.json support.

ai bpe cl100k-base csharp dotnet gpt4o high-performance huggingface o200k-base openai sdk tiktoken tokenizer zero-allocation

Last synced: 01 Apr 2026

https://github.com/tryAGI/Tiktoken

This project implements token calculation for OpenAI's gpt-4 and gpt-3.5-turbo models, specifically using the `cl100k_base` encoding.

ai chatgpt cl100kbase csharp encoding gpt35turbo gpt4 langchain langchain-dotnet openai p50kbase tiktoken tiktoken-sharp tokens

Last synced: 09 Apr 2025

https://github.com/elmiraghorbani/chatgpt-long-term-memory

The ChatGPT Long Term Memory package is a powerful tool designed to empower your projects with the ability to handle a large number of simultaneous users and external sources.

chatbot chatgpt chatgpt-api context datastore embedding-similarity embeddings gpt-3 gpt-35-turbo llama-index long-term-memory memory openai python redis similarity-search text-retrieval text-summarization tiktoken vector

Last synced: 26 Apr 2025

https://github.com/cahya-wirawan/rwkv-tokenizer

A fast RWKV Tokenizer written in Rust

huggingface llm rwkv tiktoken tokenizer trie

Last synced: 09 Apr 2025

https://github.com/chonkie-ai/autotiktokenizer

🧰 The AutoTokenizer that TikToken always needed -- Load any tokenizer with TikToken now! ✨

machine-learning nlp tiktoken tokenizers transformers

Last synced: 12 Apr 2025

https://github.com/aallam/ktoken

Kotlin multiplatform BPE tokenizer library for OpenAI models

binary-p bpe byte-pair-encoding gpt kotlin openai tiktoken tokenizer

Last synced: 26 Aug 2025

https://github.com/johannschopplich/tokenx

📐 GPT token estimation and context size utilities without a full tokenizer

tiktoken token-counter tokenization tokenizer

Last synced: 01 May 2025

https://github.com/sewenew/tokenizer

C++ implementation of tokenizers, including tiktoken.

openai tiktoken tokenizer

Last synced: 14 Apr 2025

https://github.com/hupe1980/go-tiktoken

✂️ OpenAI's tiktoken tokenizer written in Go

bpe golang gpt2 openai tiktoken tokenizer

Last synced: 07 Nov 2025

https://github.com/kojix2/tiktoken-c

C API for tiktoken-rs

bpe c tiktoken tokenizer

Last synced: 07 Oct 2025

https://github.com/jacoblincool/tiktoken-calculator

Calculate the token count for GPT-4, GPT-3.5, GPT-3, and GPT-2.

gpt tiktoken tokenizer

Last synced: 28 Feb 2026

https://github.com/oelmekki/tiktoken-cli

Simple wrapper around tiktoken to use it in your favorite language.

chatgpt chatgpt-api gpt gpt-api openai openai-api tiktoken

Last synced: 30 Oct 2025

https://github.com/kojix2/tiktoken-cr

Tiktoken for Crystalists

crystal tiktoken

Last synced: 10 Mar 2026

https://github.com/schneiderfelipe/chat-splitter

Split chat messages by maximum chat completion token count

ai artificial-intelligence chat chatgpt gpt-4 nlp openai split text tiktoken tokenizer

Last synced: 20 Jun 2025

https://github.com/wonyoung-jang/logseq-tokenizer

Logseq Markdown Tokenizer is a Python application that tokenizes one or more Markdown files and estimates their prices.

logseq markdown openai-api tiktoken

Last synced: 07 May 2026

https://github.com/peterheb/gotoken

Gotoken is a pure-Go implementation of the Python library openai/tiktoken.

chatgpt go openai tiktoken

Last synced: 14 Jan 2026

https://github.com/kgruiz/pytokencounter

A simple Python library for tokenizing text and counting tokens. While currently only supporting OpenAI LLMs, it helps with text processing and managing token limits in AI applications.

ai encoding large-language-models llm machine-learning models nlp openai text-processing tiktoken token tokenizer

Last synced: 10 Apr 2025

https://github.com/flexchar/tiktoken-counter

Tiktoken-counter as standalone API

docker openai-tokenizer tiktoken token-counter

Last synced: 16 Jan 2026

https://github.com/gweidart/rs-bpe

A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust

bpe bpe-tokenizer byte-pair-encoding byte-pair-tokenizer huggingface llm openai pypi-package python rust tiktoken tokenizers

Last synced: 28 Apr 2025

https://github.com/maxim-saplin/tiktoken-bench

Comparing OpenAI tokeniser (tiktoken) performance - stock Python/Rust vs JS/WASM

ai chatgpt javascript nlp python tiktoken

Last synced: 10 Apr 2026

https://github.com/p1ayer-1/chatgpt-web-vs-api-pricing

Count tokens to determine cost differential of ChatGPT Plus subscription and ChatGPT API

chatgpt chatgpt-api chatgpt-chat-history chatgpt-history chatgpt-plus chatgpt4 openai tiktoken

Last synced: 18 Feb 2026

https://github.com/maledorak/single-token-words

List of single token words for LLM usage

llm openai tiktoken tokenizer

Last synced: 14 May 2026

https://github.com/guanhui07/tiktoken-php

This is a PHP port of tiktoken

chatgpt tiktoken tiktoken-php

Last synced: 30 Sep 2025

https://github.com/shivendrra/shredword

Fast & efficient BPE tokenizer written in C & Python for LLM training

c c-tokenizer cpp open-source tiktoken tokenizer

Last synced: 09 Apr 2025

https://github.com/annnieglez/genai-travel-guide

This project is an AI-powered chatbot that provides real-time travel advice about Iceland. It utilizes Retrieval-Augmented Generation (RAG) by storing document embeddings in ChromaDB and retrieving relevant information to generate responses using a Large Language Model (LLM).

beautifulsoup chromadb embeddings langchain llm llms matplotlib openai pandas rag reportlab selenium tiktoken

Last synced: 14 Apr 2026

https://github.com/sameermanan/rs-bpe

A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust

bpe bpe-tokenizer byte-pair-encoding byte-pair-tokenizer huggingface llm openai pypi-package python rust tiktoken tokenizers

Last synced: 10 May 2026

https://github.com/hvasconcelos/picogpt-mlx

A minimal, from-scratch decoder-only GPT in MLX — trained on Tiny Shakespeare on Apple Silicon. Inspired by nanoGPT.

apple-silicon deep-learning from-scratch gpt language-model llm mlx nanogpt python tiktoken transformer

Last synced: 02 Jun 2026

https://github.com/reshiadavan/thoth

tokenizer for large-scale language models (GPT, Claude, Llama, etc.)

bytepairencoding gpt-2 gpt-4 llama2 natural-language-processing python rust sentencepiece tiktoken tokenizer

Last synced: 01 Mar 2026

https://github.com/anuritigupta26/researchmate-

Research Mate is a web-based application designed to assist researchers, students, and professionals in efficiently processing and extracting insights from research articles and online content. Users can input multiple research URLs, which the app processes and converts into useful information. Powered by OpenAI’s GPT models and LangChain, the app

faiss-cpu groq-api huggingface langchain libmagic llm openai pickle sentence-transformers streamlit tiktoken

Last synced: 16 Apr 2026

https://github.com/farithadnan/datasetforge

Extracts Google Sheets to JSONL for fine-tuning, estimates task costs with tiktoken.

fine-tuning googlesheetsapi openai python3 tiktoken

Last synced: 07 Jul 2025

https://github.com/stefanpietrusky/iec

Repository for the article in the online magazine Level Up Coding

beautifulsoup chatbot flask-application llama ollama python tiktoken

Last synced: 20 May 2026

https://github.com/b0o/tiktoken-bench

A small Node.js benchmark suite for the tiktoken WASM port.

benchmark gpt-3 gpt-4 machine-learning openai tiktoken tokenization

Last synced: 01 May 2026

https://github.com/gcondeh/tokens

Small utilities for counting tokens and cutting text strings

langchain python spacy-nlp spanish tiktoken

Last synced: 21 May 2026

https://github.com/haha-systems/toll

The OpenAI tiktoken library as a service, for counting the number of tokens in a message to an LLM like GPT.

flask llms openai python tiktoken

Last synced: 04 May 2026

https://github.com/jaco-bro/mlx.zig

MLX.zig: Lightweight Zig language bindings for Apple's MLX framework, enabling efficient machine learning directly on Apple Silicon with zero external dependencies.

ai llama llm mlx pcre2 regex tiktoken zig

Last synced: 20 May 2026

https://github.com/developedby-siva/token-scope

Profile your LLM payloads. Find the waste. Cut the cost. Field-level token attribution, cost leak detection, and payload optimization for any LLM API.

docker fastapi github-actions llm openai tiktoken

Last synced: 08 Apr 2026

https://github.com/jsleekr/skilldigest

Static analyzer for AI agent skill libraries. Finds dead/bloated/conflicting skills, counts tokens, gates CI. Single Rust binary.

ai-agents claude-code cli codex cursor rust sarif skill-library static-analysis tiktoken

Last synced: 26 May 2026

https://github.com/madhurajayashanka/ai-travel-assistant-langchain

AI Travel Assistant uses Python, OpenAI API, Streamlit, SQLite & LangChain to generate smart, personalized travel itineraries.

agents ai chatbot langchain nlp openai openai-api sqlite streamlit tiktoken

Last synced: 11 Apr 2026

https://github.com/ziffan/chunklab

ChunkLab is a powerful browser-based sandbox designed for developers to test, visualize, and validate text chunking pipeline configurations. Optimize your RAG (Retrieval-Augmented Generation) ingestion process with real-time feedback and detailed metrics.

ai chunking data-preprocessing developer-tools embeddings fastapi llm nlp playground python rag react regex sandbox text-processing tiktoken tokenization vector-database

Last synced: 02 May 2026

https://github.com/stefanpietrusky/iecv1.5

Repository for the article in the online magazine Level Up Coding

beautifulsoup chatbot flask-application llama ollama python tiktoken

Last synced: 12 Apr 2025

https://github.com/dbtreasure/zig-bpe

Byte Pair Encoding (BPE) in the Zig programming language (0.13.0)

bytepairencoding tiktoken zig

Last synced: 20 Jan 2026

https://github.com/403errors/tubequery

TubeQuery is an LLM-based model that handles queries related to your video. Just input the video link, and all questions are welcome!

huggingface-transformers langchain nlp-machine-learning pipeline python3 tiktoken whisper yt-dlp

Last synced: 12 Apr 2026