Projects in Awesome Lists tagged with tiktoken
A curated list of projects in awesome lists tagged with tiktoken.
https://github.com/jimmc414/onefilellm
Specify a GitHub or local repo, GitHub pull request, arXiv or Sci-Hub paper, YouTube transcript, or documentation URL on the web, and scrape it into a text file and the clipboard for easier LLM ingestion
arxiv doi github ipynb llm papers pdf pmid pull-request repository sci-hub text tiktoken youtube-transcript-api
Last synced: 15 May 2025
https://github.com/pkoukk/tiktoken-go
Go version of tiktoken
chatgpt go golang gpt-35-turbo gpt-4 openai tiktoken
Last synced: 18 Jan 2026
https://github.com/cnseniorious000/free-chat
An elegant LLM chat UI forked from chatgpt-demo of @anse-app. Index site at https://free-chat.asia
aibot astro chatgpt openai openai-api solidjs tiktoken
Last synced: 04 Apr 2025
https://github.com/tryagi/tiktoken
High-performance .NET BPE tokenizer — up to 618 MiB/s, competitive with Rust. Zero-allocation counting, multilingual cache, o200k/cl100k/r50k/p50k encodings + HuggingFace tokenizer.json support.
ai bpe cl100k-base csharp dotnet gpt4o high-performance huggingface o200k-base openai sdk tiktoken tokenizer zero-allocation
Last synced: 01 Apr 2026
https://github.com/tryAGI/Tiktoken
This project implements token calculation for OpenAI's gpt-4 and gpt-3.5-turbo models, specifically using `cl100k_base` encoding.
ai chatgpt cl100kbase csharp encoding gpt35turbo gpt4 langchain langchain-dotnet openai p50kbase tiktoken tiktoken-sharp tokens
Last synced: 09 Apr 2025
https://github.com/openshieldai/openshield
OpenShield is a new-generation security layer for AI models
ai artificial-intelligence firewall golang guardian llama llm models openai openai-api owasp probllama python security security-tools tiktoken tokenizer
Last synced: 11 Jan 2026
https://github.com/elmiraghorbani/chatgpt-long-term-memory
The ChatGPT Long Term Memory package is a powerful tool designed to empower your projects with the ability to handle a large number of simultaneous users and external sources.
chatbot chatgpt chatgpt-api context datastore embedding-similarity embeddings gpt-3 gpt-35-turbo llama-index long-term-memory memory openai python redis similarity-search text-retrieval text-summarization tiktoken vector
Last synced: 26 Apr 2025
https://github.com/cahya-wirawan/rwkv-tokenizer
A fast RWKV Tokenizer written in Rust
huggingface llm rwkv tiktoken tokenizer trie
Last synced: 09 Apr 2025
https://github.com/chonkie-ai/autotiktokenizer
🧰 The AutoTokenizer that TikToken always needed -- Load any tokenizer with TikToken now! ✨
machine-learning nlp tiktoken tokenizers transformers
Last synced: 12 Apr 2025
https://github.com/aallam/ktoken
Kotlin multiplatform BPE tokenizer library for OpenAI models
binary-p bpe byte-pair-encoding gpt kotlin openai tiktoken tokenizer
Last synced: 26 Aug 2025
https://github.com/johannschopplich/tokenx
📐 GPT token estimation and context size utilities without a full tokenizer
tiktoken token-counter tokenization tokenizer
Last synced: 01 May 2025
https://github.com/sewenew/tokenizer
C++ implementation of tokenizers, including tiktoken.
Last synced: 14 Apr 2025
https://github.com/jacoblincool/tiktoken-calculator
Calculate the token count for GPT-4, GPT-3.5, GPT-3, and GPT-2.
Last synced: 28 Feb 2026
https://github.com/oelmekki/tiktoken-cli
Simple wrapper around tiktoken to use it in your favorite language.
chatgpt chatgpt-api gpt gpt-api openai openai-api tiktoken
Last synced: 30 Oct 2025
https://github.com/wonyoung-jang/logseq-tokenizer
Logseq Markdown Tokenizer is a Python application that tokenizes and estimates prices for one or more Markdown files.
logseq markdown openai-api tiktoken
Last synced: 07 May 2026
https://github.com/peterheb/gotoken
Gotoken is a pure-Go implementation of the Python library openai/tiktoken.
Last synced: 14 Jan 2026
https://github.com/kgruiz/pytokencounter
A simple Python library for tokenizing text and counting tokens. It currently supports only OpenAI LLMs, and it helps with text processing and managing token limits in AI applications.
ai encoding large-language-models llm machine-learning models nlp openai text-processing tiktoken token tokenizer
Last synced: 10 Apr 2025
https://github.com/flexchar/tiktoken-counter
Tiktoken-counter as a standalone API
docker openai-tokenizer tiktoken token-counter
Last synced: 16 Jan 2026
https://github.com/gweidart/rs-bpe
A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
bpe bpe-tokenizer byte-pair-encoding byte-pair-tokenizer huggingface llm openai pypi-package python rust tiktoken tokenizers
Last synced: 28 Apr 2025
https://github.com/maxim-saplin/tiktoken-bench
Comparing OpenAI tokeniser (tiktoken) performance - stock Python/Rust vs JS/WASM
ai chatgpt javascript nlp python tiktoken
Last synced: 10 Apr 2026
https://github.com/p1ayer-1/chatgpt-web-vs-api-pricing
Count tokens to determine the cost difference between a ChatGPT Plus subscription and the ChatGPT API
chatgpt chatgpt-api chatgpt-chat-history chatgpt-history chatgpt-plus chatgpt4 openai tiktoken
Last synced: 18 Feb 2026
https://github.com/maledorak/single-token-words
List of single token words for LLM usage
Last synced: 14 May 2026
https://github.com/shivendrra/shredword
Fast & efficient BPE tokenizer written in C & Python for LLM training
c c-tokenizer cpp open-source tiktoken tokenizer
Last synced: 09 Apr 2025
https://github.com/viniciusmecosta/CountTokensPython
fitz jupyter-notebook nltk python spacy tiktoken token
Last synced: 03 Apr 2025
https://github.com/annnieglez/genai-travel-guide
This project is an AI-powered chatbot that provides real-time travel advice about Iceland. It utilizes Retrieval-Augmented Generation (RAG) by storing document embeddings in ChromaDB and retrieving relevant information to generate responses using a Large Language Model (LLM).
beautifulsoup chromadb embeddings langchain llm llms matplotlib openai pandas rag reportlab selenium tiktoken
Last synced: 14 Apr 2026
https://github.com/phukon/temporal-traverse
Console-based game built on an LLM
cassandra-database datastax langchain large-language-models openai-api tiktoken vector-database
Last synced: 10 Oct 2025
https://github.com/sameermanan/rs-bpe
A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
bpe bpe-tokenizer byte-pair-encoding byte-pair-tokenizer huggingface llm openai pypi-package python rust tiktoken tokenizers
Last synced: 10 May 2026
https://github.com/akshay-kamath/personal-projects
Projects I made while self-learning.
ai function-calling generative-ai gpt-3 huggingface langchain machine-learning nlp openai pandas pinecone python retrieval-augmented-generation streamlit tiktoken transformers vector-database vector-search zapier
Last synced: 15 Apr 2026
https://github.com/hvasconcelos/picogpt-mlx
A minimal, from-scratch decoder-only GPT in MLX — trained on Tiny Shakespeare on Apple Silicon. Inspired by nanoGPT.
apple-silicon deep-learning from-scratch gpt language-model llm mlx nanogpt python tiktoken transformer
Last synced: 02 Jun 2026
https://github.com/reshiadavan/thoth
Tokenizer for large-scale language models (GPT, Claude, Llama, etc.)
bytepairencoding gpt-2 gpt-4 llama2 natural-language-processing python rust sentencepiece tiktoken tokenizer
Last synced: 01 Mar 2026
https://github.com/anuritigupta26/researchmate-
Research Mate is a web-based application designed to assist researchers, students, and professionals in efficiently processing and extracting insights from research articles and online content. Users can input multiple research URLs, which the app processes and converts into useful information. Powered by OpenAI’s GPT models and LangChain, the app
faiss-cpu groq-api huggingface langchain libmagic llm openai pickle sentence-transformers streamlit tiktoken
Last synced: 16 Apr 2026
https://github.com/farithadnan/datasetforge
Extracts Google Sheets to JSONL for fine-tuning, estimates task costs with tiktoken.
fine-tuning googlesheetsapi openai python3 tiktoken
Last synced: 07 Jul 2025
https://github.com/stefanpietrusky/iec
Repository for the article in the online magazine Level Up Coding
beautifulsoup chatbot flask-application llama ollama python tiktoken
Last synced: 20 May 2026
https://github.com/b0o/tiktoken-bench
A small Node.js benchmark suite for the tiktoken WASM port.
benchmark gpt-3 gpt-4 machine-learning openai tiktoken tokenization
Last synced: 01 May 2026
https://github.com/skitsanos/streamlit-split-text
Text splitting example using Tiktoken
ai chunk chunking chunks genai llm rag streamlit text-split text-splitter text-splitting tiktoken
Last synced: 04 May 2026
https://github.com/viniciusmecosta/counttokenspython
fitz jupyter-notebook nltk python spacy tiktoken token
Last synced: 10 May 2026
https://github.com/developedby-siva/token-scope
Profile your LLM payloads. Find the waste. Cut the cost. Field-level token attribution, cost leak detection, and payload optimization for any LLM API.
docker fastapi github-actions llm openai tiktoken
Last synced: 08 Apr 2026
https://github.com/jsleekr/skilldigest
Static analyzer for AI agent skill libraries. Finds dead/bloated/conflicting skills, counts tokens, gates CI. Single Rust binary.
ai-agents claude-code cli codex cursor rust sarif skill-library static-analysis tiktoken
Last synced: 26 May 2026
https://github.com/ziffan/chunklab
ChunkLab is a powerful browser-based sandbox designed for developers to test, visualize, and validate text chunking pipeline configurations. Optimize your RAG (Retrieval-Augmented Generation) ingestion process with real-time feedback and detailed metrics.
ai chunking data-preprocessing developer-tools embeddings fastapi llm nlp playground python rag react regex sandbox text-processing tiktoken tokenization vector-database
Last synced: 02 May 2026
https://github.com/stefanpietrusky/iecv1.5
Repository for the article in the online magazine Level Up Coding
beautifulsoup chatbot flask-application llama ollama python tiktoken
Last synced: 12 Apr 2025
https://github.com/xp-forge/openai
OpenAI APIs for XP Framework
azure-ai azureai embeddings function-calling load-balancing openai openai-api openai-api-client openai-realtime openai-streaming php7 php8 responses-api rest-api tiktoken tiktoken-php xp-framework
Last synced: 21 Feb 2026
https://github.com/dbtreasure/zig-bpe
Byte Pair Encoding (BPE) in the Zig programming language (0.13.0)
Last synced: 20 Jan 2026
https://github.com/403errors/tubequery
TubeQuery is an LLM-based model that handles queries related to your video. Just input the video link, and all questions are welcome!
huggingface-transformers langchain nlp-machine-learning pipeline python3 tiktoken whisper yt-dlp
Last synced: 12 Apr 2026
https://github.com/jvictor011/langchain
langchain openai pypdf python python-docx tiktoken unstructured
Last synced: 12 Apr 2026