{"id":50372918,"url":"https://github.com/lancekrogers/tcount","last_synced_at":"2026-05-30T08:03:25.915Z","repository":{"id":338884732,"uuid":"1144653452","full_name":"lancekrogers/tcount","owner":"lancekrogers","description":"Count tokens of files and directories","archived":false,"fork":false,"pushed_at":"2026-04-19T06:48:28.000Z","size":3001,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-19T08:33:39.113Z","etag":null,"topics":["ai-tools","counter","developer-tools","llms","token-optimization","tokens"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lancekrogers.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"lancekrogers"}},"created_at":"2026-01-28T22:32:12.000Z","updated_at":"2026-04-19T06:48:52.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/lancekrogers/tcount","commit_stats":null,"previous_names":["lancekrogers/go-token-counter","lancekrogers/tcount"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/lancekrogers/tcount","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancekrogers%2Ftcount","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancekrogers%2Ftcount/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancekr
ogers%2Ftcount/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancekrogers%2Ftcount/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lancekrogers","download_url":"https://codeload.github.com/lancekrogers/tcount/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancekrogers%2Ftcount/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33684419,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-30T02:00:06.278Z","response_time":92,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-tools","counter","developer-tools","llms","token-optimization","tokens"],"created_at":"2026-05-30T08:03:22.806Z","updated_at":"2026-05-30T08:03:25.892Z","avatar_url":"https://github.com/lancekrogers.png","language":"Go","funding_links":["https://github.com/sponsors/lancekrogers"],"categories":[],"sub_categories":[],"readme":"# tcount\n\nA fast, zero-network token counter for LLM workflows. Count tokens in files and directories using exact OpenAI tokenizers, Claude approximations, SentencePiece vocabularies, and generic estimation — all from a single CLI.\n\n## Features\n\n- **Exact BPE tokenization** — offline, no network calls. 
Supports GPT-5, GPT-4.1, GPT-4o, o-series, and legacy GPT-4/3.5.\n- **Claude approximation** calibrated for Anthropic models\n- **SentencePiece** exact tokenization for Llama and other open-source models (bring your own `.model` file)\n- **Context window usage** — see what percentage of a model's context you're consuming\n- **Cost estimates** with per-1M-token pricing via `--cost`\n- **Provider filtering** — compare models from a specific provider\n- **Directory scanning** with `.gitignore` support and binary file detection\n- **JSON output** for scripting and pipelines\n\n## Install\n\n### npm / pnpm / bun (macOS \u0026 Linux)\n\n```bash\nnpm install -g @obedience-corp/tcount\n# or\npnpm add -g @obedience-corp/tcount\n# or\nbun add -g @obedience-corp/tcount\n```\n\nThe npm package downloads the official release binary for your platform (with checksum verification) on first install.\n\n### Homebrew (macOS \u0026 Linux)\n\n```bash\nbrew install lancekrogers/tap/tcount\n```\n\n### Go\n\n```bash\ngo install github.com/lancekrogers/tcount/cmd/tcount@latest\n```\n\n### From source\n\n```bash\ngit clone https://github.com/lancekrogers/tcount.git\ncd tcount\ngo build -o bin/tcount ./cmd/tcount\n```\n\n### Binary releases\n\nPre-built binaries for macOS, Linux, and Windows are available on the [releases page](https://github.com/lancekrogers/tcount/releases).\n\n## Quick Start\n\n```bash\n# Count tokens in a file\ntcount myfile.txt\n\n# Specific model\ntcount --model gpt-5 prompt.md\n\n# All methods with cost estimates\ntcount --all --cost prompt.md\n\n# Filter by provider\ntcount --provider openai prompt.md\n\n# Recursive directory count\ntcount -r ./src\n\n# JSON output for scripting\ntcount --json document.md\n```\n\n## Supported Models\n\n### OpenAI\n| Model | Encoding | Context |\n|-------|----------|---------|\n| `gpt-5`, `gpt-5-mini`, `gpt-5-nano` | o200k_base | 400K |\n| `gpt-5.1`, `gpt-5.2` | o200k_base | 400K |\n| `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano` | 
o200k_base | 1M |\n| `gpt-4o`, `gpt-4o-mini` | o200k_base | 128K |\n| `o3`, `o3-mini`, `o4-mini` | o200k_base | 200K |\n| `gpt-4`, `gpt-4-turbo` | cl100k_base | 8K–128K |\n| `gpt-3.5-turbo` | cl100k_base | 16K |\n\n### Anthropic\n| Model | Method | Context |\n|-------|--------|---------|\n| `claude-opus-4.6`, `claude-opus-4.5` | Approximation | 200K |\n| `claude-opus-4.1`, `claude-opus-4` | Approximation | 200K |\n| `claude-sonnet-4.6`, `claude-sonnet-4.5`, `claude-sonnet-4` | Approximation | 200K |\n| `claude-haiku-4.5`, `claude-haiku-3.5`, `claude-haiku-3` | Approximation | 200K |\n| `claude-opus-3` (deprecated) | Approximation | 200K |\n\n### Meta (Llama)\n| Model | Method | Context |\n|-------|--------|---------|\n| `llama-4-scout`, `llama-4-maverick` | tiktoken approx / SentencePiece | 128K |\n| `llama-3.1-8b`, `llama-3.1-70b`, `llama-3.1-405b` | tiktoken approx / SentencePiece | 128K |\n\n### DeepSeek\n| Model | Method | Context |\n|-------|--------|---------|\n| `deepseek-v2`, `deepseek-v3`, `deepseek-coder-v2` | tiktoken approx | 128K |\n\n### Alibaba (Qwen)\n| Model | Method | Context |\n|-------|--------|---------|\n| `qwen-2.5-7b`, `qwen-2.5-14b`, `qwen-2.5-72b` | tiktoken approx | 32K |\n| `qwen-3-72b` | tiktoken approx | 32K |\n\n### Microsoft (Phi)\n| Model | Method | Context |\n|-------|--------|---------|\n| `phi-3-mini`, `phi-3-small`, `phi-3-medium` | tiktoken approx | 128K |\n\n## Tokenization Methods\n\n| Method | Accuracy | When Used |\n|--------|----------|-----------|\n| tiktoken (o200k_base) | Exact | GPT-5.x, GPT-4.1, GPT-4o, o3, o4-mini |\n| tiktoken (cl100k_base) | Exact | GPT-4, GPT-3.5 |\n| Claude approximation | Estimated | All Claude models (÷3.8 char ratio) |\n| SentencePiece | Exact | Llama with `--vocab-file` |\n| tiktoken approximation | Approximate | Llama, DeepSeek, Qwen, Phi (no vocab file) |\n| Character-based | Approximate | Any (chars ÷ configurable ratio, default 4.0) |\n| Word-based | Approximate | Any (words × 
configurable multiplier, default 1.33) |\n| Whitespace split | Approximate | Any (raw word count as lower bound) |\n\n## Usage\n\n```\ntcount [file|directory] [flags]\n```\n\n### Flags\n\n| Flag | Short | Description |\n|------|-------|-------------|\n| `--model` | | Specific model tokenizer |\n| `--models` | `-m` | Show encoding-to-model lookup table |\n| `--provider` | | Filter by provider: `openai`, `anthropic`, `meta`, `deepseek`, `alibaba`, `microsoft`, `all` |\n| `--vocab-file` | | Path to SentencePiece `.model` file for exact Llama tokenization |\n| `--all` | | Show all counting methods |\n| `--json` | | JSON output |\n| `--cost` | | Include cost estimates (per 1M tokens) |\n| `--recursive` | `-r` | Recursively count files in a directory |\n| `--directory` | `-d` | Alias for `--recursive` |\n| `--chars-per-token` | | Character/token ratio for approximation (default: 4.0) |\n| `--words-per-token` | | Words/token ratio for approximation (default: 0.75) |\n| `--verbose` | | Show additional details |\n| `--no-color` | | Disable color output |\n\n## Examples\n\n### Single model\n\n```\n$ tcount --model gpt-5 document.md\n\nToken Count Report for: document.md\n═══════════════════════════════════════════════════════\n\nBasic Statistics:\n  Characters:     5451\n  Words:          662\n  Lines:          222\n\nToken Counts by Method:\n  ┌─────────────────────────┬──────────┬────────────┬──────────────────┐\n  │ Method                  │ Tokens   │ Accuracy   │ Context Usage    │\n  ├─────────────────────────┼──────────┼────────────┼──────────────────┤\n  │ GPT (gpt-5)             │ 1445     │ Exact      │ 0.7% of 200K     │\n  └─────────────────────────┴──────────┴────────────┴──────────────────┘\n```\n\n### All methods with costs\n\n```\n$ tcount --all --cost document.md\n\nToken Count Report for: document.md\n═══════════════════════════════════════════════════════\n\nBasic Statistics:\n  Characters:     5451\n  Words:          662\n  Lines:          222\n\nToken 
Counts by Method:\n  ┌─────────────────────────┬──────────┬────────────┬──────────────────┐\n  │ Method                  │ Tokens   │ Accuracy   │ Context Usage    │\n  ├─────────────────────────┼──────────┼────────────┼──────────────────┤\n  │ GPT (gpt-5)             │ 1445     │ Exact      │ 0.7% of 200K     │\n  │ GPT (gpt-4o)            │ 1445     │ Exact      │ 1.1% of 128K     │\n  │ Claude (approx)         │ 1434     │ Estimated  │ 0.7% of 200K     │\n  │ Llama (llama-3.1-8b)    │ 1445     │ Exact      │ 1.1% of 128K     │\n  │ Character-based (÷4.0)  │ 1362     │ Approx     │                  │\n  │ Word-based (×1.33)      │ 882      │ Approx     │                  │\n  │ Whitespace split        │ 662      │ Approx     │                  │\n  └─────────────────────────┴──────────┴────────────┴──────────────────┘\n\nCost Estimates (Input):\n  gpt-5:           $0.0018 ($1.25/1M tokens)\n  gpt-4o:          $0.0036 ($2.50/1M tokens)\n  claude-sonnet-4.6: $0.0043 ($3.00/1M tokens)\n  claude-sonnet-4.5: $0.0043 ($3.00/1M tokens)\n```\n\n### SentencePiece for exact Llama tokenization\n\n```bash\n# Download tokenizer.model from HuggingFace (requires auth):\n# https://huggingface.co/meta-llama/Llama-3.1-8B/blob/main/original/tokenizer.model\n\ntcount --model llama-3.1-8b --vocab-file /path/to/tokenizer.model document.md\n```\n\nWithout `--vocab-file`, Llama models use a tiktoken-based approximation.\n\n### Directory scanning\n\n```\n$ tcount -r --verbose tokenizer/\n\nFound 4 text files (skipped 0 binary, 0 ignored)\nToken Count Report for: tokenizer/ (directory)\n═══════════════════════════════════════════════════════\n\nBasic Statistics:\n  Files:          4\n  Characters:     14929\n  Words:          1906\n  Lines:          612\n\nToken Counts by Method:\n  ┌─────────────────────────┬──────────┬────────────┬──────────────────┐\n  │ Method                  │ Tokens   │ Accuracy   │ Context Usage    │\n  
├─────────────────────────┼──────────┼────────────┼──────────────────┤\n  │ GPT (gpt-5)             │ 4206     │ Exact      │ 2.1% of 200K     │\n  │ Claude (approx)         │ 3928     │ Estimated  │ 2.0% of 200K     │\n  │ Character-based (÷4.0)  │ 3732     │ Approx     │                  │\n  │ Word-based (×1.33)      │ 2541     │ Approx     │                  │\n  │ Whitespace split        │ 1906     │ Approx     │                  │\n  └─────────────────────────┴──────────┴────────────┴──────────────────┘\n```\n\nWhen scanning directories, tcount respects `.gitignore` rules, skips binary files and `.git` directories, and aggregates all text files into a combined count. Use `--verbose` to see file and skip statistics.\n\n### JSON output\n\n```\n$ tcount --json --model gpt-5 document.md\n{\n  \"file_path\": \"document.md\",\n  \"file_size\": 5451,\n  \"characters\": 5451,\n  \"words\": 662,\n  \"lines\": 222,\n  \"methods\": [\n    {\n      \"name\": \"tiktoken_gpt_5\",\n      \"display_name\": \"GPT (gpt-5)\",\n      \"tokens\": 1445,\n      \"is_exact\": true,\n      \"context_window\": 200000\n    }\n  ]\n}\n```\n\n```bash\n# Extract a specific count\ntcount --json myfile.txt | jq '.methods[] | select(.name == \"tiktoken_gpt_5\") | .tokens'\n\n# Batch count all markdown files\nfor f in docs/*.md; do tcount --json \"$f\"; done | jq -s '.'\n```\n\n## Library Usage\n\ntcount can be used as a Go library in your own projects.\n\n### Installation\n\n```bash\ngo get github.com/lancekrogers/tcount/tokenizer\n```\n\n### Basic Token Counting\n\n```go\npackage main\n\nimport (\n    \"context\"\n    \"fmt\"\n    \"log\"\n\n    \"github.com/lancekrogers/tcount/tokenizer\"\n)\n\nfunc main() {\n    counter, err := tokenizer.NewCounter(tokenizer.CounterOptions{})\n    if err != nil {\n        log.Fatal(err)\n    }\n\n    ctx := context.Background()\n    result, err := counter.Count(ctx, \"Hello, world!\", \"gpt-4o\", false)\n    if err != nil {\n        log.Fatal(err)\n    
}\n\n    for _, m := range result.Methods {\n        if m.IsExact {\n            fmt.Printf(\"Tokens: %d (exact, %s)\\n\", m.Tokens, m.DisplayName)\n        }\n    }\n}\n```\n\n### File and Directory Counting\n\n```go\nctx := context.Background()\n\n// Count tokens in a single file\nresult, err := counter.CountFile(ctx, \"document.md\", \"gpt-4o\", false)\n\n// Count tokens across a directory (respects .gitignore, skips binaries)\ndirResult, err := counter.CountDirectory(ctx, \"./src\", \"\", true)\nfmt.Printf(\"Files: %d, Tokens: %d\\n\", dirResult.FileCount, dirResult.Methods[0].Tokens)\n```\n\n### Direct BPE Tokenizer Access\n\n```go\ntok, err := tokenizer.NewBPETokenizer(\"gpt-4o\")\nif err != nil {\n    log.Fatal(err)\n}\n\ncount, _ := tok.CountTokens(\"Hello, world!\")\nfmt.Printf(\"Tokens: %d, Exact: %v\\n\", count, tok.IsExact())\n```\n\n### Model Discovery\n\n```go\n// Get metadata for a specific model\nmeta := tokenizer.GetModelMetadata(\"gpt-4o\")\nfmt.Printf(\"Encoding: %s, Context: %d\\n\", meta.Encoding, meta.ContextWindow)\n\n// List all registered models\nmodels := tokenizer.ListModels()\n\n// List models by provider\nopenaiModels := tokenizer.ListModelsByProvider(tokenizer.ProviderOpenAI)\n```\n\n### Cost Estimation\n\n```go\nctx := context.Background()\nresult, _ := counter.Count(ctx, text, \"gpt-4o\", false)\ncosts := tokenizer.CalculateCosts(result.Methods)\nfor _, c := range costs {\n    fmt.Printf(\"%s: $%.4f\\n\", c.Model, c.Cost)\n}\n```\n\n## Development\n\nRequires [just](https://github.com/casey/just) for the build system.\n\n```bash\njust                       # List all recipes\njust build                 # Build (with fmt + vet)\njust test all              # Run all tests\njust test unit             # Unit tests only\njust test integration      # Integration tests only\njust test coverage         # Coverage report\njust test bench            # Benchmarks\njust release all           # Cross-compile for all platforms\n```\n\n## License\n\nMIT 
License. See [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flancekrogers%2Ftcount","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flancekrogers%2Ftcount","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flancekrogers%2Ftcount/lists"}