{"id":27958476,"url":"https://github.com/av/klmbr","last_synced_at":"2025-08-23T10:09:17.948Z","repository":{"id":287965722,"uuid":"859859604","full_name":"av/klmbr","owner":"av","description":"klmbr - a prompt pre-processing technique to break through the barrier of entropy while generating text with LLMs","archived":false,"fork":false,"pushed_at":"2024-09-22T16:32:27.000Z","size":2346,"stargazers_count":80,"open_issues_count":0,"forks_count":3,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-08-17T04:29:16.432Z","etag":null,"topics":["inference","llm","prompts","tokenization"],"latest_commit_sha":null,"homepage":"","language":"TeX","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/av.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-09-19T12:05:44.000Z","updated_at":"2025-08-06T21:40:17.000Z","dependencies_parsed_at":null,"dependency_job_id":"1910ea5d-cb69-4890-87fe-b550c23ac766","html_url":"https://github.com/av/klmbr","commit_stats":null,"previous_names":["av/klmbr"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/av/klmbr","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/av%2Fklmbr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/av%2Fklmbr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/av%2Fklmbr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/av%2Fklmbr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owne
rs/av","download_url":"https://codeload.github.com/av/klmbr/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/av%2Fklmbr/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271746549,"owners_count":24813570,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-23T02:00:09.327Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["inference","llm","prompts","tokenization"],"created_at":"2025-05-07T18:24:31.770Z","updated_at":"2025-08-23T10:09:17.939Z","avatar_url":"https://github.com/av.png","language":"TeX","funding_links":[],"categories":[],"sub_categories":[],"readme":"# klmbr\n\nklmbr - a prompt pre-processing technique to induce retokenization of the input for LLMs.\n\nhttps://github.com/user-attachments/assets/5141c554-38aa-4615-a9e5-7be9fe53c81b\n\n---\n\n### Intro\n\nLet's imagine that the LLM was trained on this sentence (and most likely, it was):\n\n```text\nThe sky is blue\n```\n\nWe can almost safely assume that most modern tokenizers would translate this input as follows:\n\n![screenshot of the \"The sky is blue\" tokenization](./assets/sky-tokens.png)\u003cbr/\u003e\n\u003csub\u003eHere and below, token screenshots courtesy of https://tokenvisualizer.netlify.app/\u003c/sub\u003e\n\nBut what if we tokenize it differently? 
What will be the impact on inference?\n\n![Different tokenization of the same phrase](./assets/sky-tokens2.png)\n![Different tokenization of the same phrase](./assets/sky-tokens3.png)\n\nWill the model stop being able to produce coherent predictions for such sequences?\n\nAs it happens, _most of the LLMs are trained_ for cases like this one. As is well known from general reinforcement learning routines, introducing randomness significantly increases the robustness of the model.\n\nMost mainstream LLMs will handle these sample sequences with ease, producing exactly the desired output.\n\nLet's take a look at another example, though:\n\n![Tokenization of a \"What goes up but never comes down again?\" riddle](./assets/riddle-tokens.png)\n\nThis is an extremely famous riddle. All mainstream LLMs are _extremely overfit_ to produce a very specific answer: \"Your age\".\n\nWhat if we alter the riddle very slightly, but significantly enough to change the answer?\n\n![Tokenization of a \"What goes up but never comes up again?\" riddle](./assets/riddle-tokens2.png)\n\nHow quickly did you spot the difference? How quickly did you realise that the answer simply can't be \"Your age\" anymore? Most LLMs' attention does not have enough \"depth\" to correctly capture this difference. Sure enough, their answer to this updated riddle is the same.\n\n![Example answer to the riddle from Llama 3.1 405B](./assets/riddle-405B.png)\n\nIf Llama 3.1 405B can't figure it out - most of the smaller LLMs won't either, right?\n\nSure enough.\n\n![Example answer from a few small LLMs: Mistral Small, Llama 3.1 8B, Gemma 2B](./assets/riddle-small-llms.png)\n\n\u003e [!TIP]\n\u003e You can find a whole collection of such \"overfit\" inputs in the awesome [MisguidedAttention](https://github.com/cpldcpu/MisguidedAttention) repo\n\n### Induced retokenization\n\nLet's explore what happens when these LLMs see a slightly adjusted input sequence. Most of them were trained for such variance. 
From the mere fact that an LLM is capable of interpreting such inputs, we can conclude that it was trained on such inputs. \n\nMaybe it was one of the portions of the secret data recipes nobody wants to disclose. This approach is well known in reinforcement learning routines, as it can help push models from being useless to being masters in their specific areas of expertise.\n\nWho said that this is not applicable to language?\n\n```l33tspeak\nI'm tall when I'm young, and I'm talleR when I'm old. What́ am I?\nI'm tall when Ĩ'm young, and i'm taller when I'm 0ld. What am I?\nI'm tall when I'm young, and 1'm tAller wHeN I'm̊ old. What̋ am Ĩ?\nI'm ta1l whEn I'm young, aNd İ'm t4ller when̊ I'm l̊d. wHa7 Am I?\n'm 7All whn I'm̈ Yòng, ND İ'm taller when̄ i'm̆ ol̊d. What 4m̃ ?\nÏ'm tal̃l̇ whén I'm yoUnG, 4nd 'm talLEr w̄HEn I'm 0Ld. What am I?\nÌ'M TLl whN Í'm yOunġ, ànd 'm t4L1e̋r when i'ṁ ld. W̋h́At 4m 1?\n1'm tAl̊L whe̋N Ï'm̀ ŷoũnG, nD i'm̈ 7llêr Whn̆ 1'm 0lD. whA7 Aḿ I?\nI'm 7ALl̇ wh3n i'm yóUn9, nd Ì'm̆ 7ál1Er wh́En̈ i'M ōl̆d́. WH4t m I?\ni'm̆ TálL w̃ḧn Ī'm yOun9, 4ǹd i'm̈ 7à1l̈ër whën̆ Î'M l̀d. wH4T m̄ İ?\n```\n\nWhen do we reach the boundary of what the model can recognize from its training? The first levels cause absolutely zero problems.\n\n![L3.1 8B answers about candle](./assets/l38b-candle.png)\n\nBut when we reach the bottom, the \"soup\" is surely nothing like what the model could've seen during its training (presumably).\n\n![Tokenization of the unpronounceable last level](./assets/c4ndl3.png)\n\nHow much of this still triggers the same patterns when multiplying QKV with embeddings?\n\n\n### Technique overview\n\n`klmbr` (from \"Kalambur\", but you can pronounce it as \"climber\") is a (very naive and simple) technique for inducing alternative tokenization for the LLM inputs. Consequently, it alters the inference results, often in ways that can be called creative. \n\nIt works by randomly replacing a given percentage of the input with... 
things that are similar but not quite. Because it works as a prompt pre-processor, it's compatible with any LLM and API out there, so go try it out!\n\n#### What does it do?\n\nLLMs can be overfit on specific token sequences, so breaking those sequences down into something else will produce a different output. Due to the robustness mechanisms in recently trained LLMs, most would still produce outputs largely aligned with expectations, such as following instructions and complying with user requests.\n\nLLMs have different training recipes, so they will respond to alterations in different ways. `klmbr` can be tuned to produce different degrees and types of alterations, adjusted for a specific LLM. There's also no single setting that shifts outputs in one specific direction across all models. \n\n#### What are the results?\n\nMost obvious:\n- Randomised outputs, as inputs are randomised on every pass\n- Less impact from specific overfit sequences\n- Grammatical mistakes\n\nLess obvious:\n- Certain LLMs reason about obscure topics better\n- May invoke guardrails unexpectedly\n- May remove guardrails unexpectedly\n\nSorry for not providing more solid evidence or benchmarks. Hopefully, I'll find a way to do that.\n\n### Examples\n\n#### [Open WebUI](./examples/klmbr-webui.py)\n\n![alt text](./assets/klmbr-webui.png)\n\nA sample [Function](https://docs.openwebui.com/tutorial/functions#what-are-functions) with the `klmbr` technique and input/output filtering. It will add `klmbr` versions of all available Ollama models. \n\nFor now, you can only tweak retokenization params via code.\n\n#### [Python](./klmbr/main.py)\n\nThe technique is very easy to implement, so you likely won't need any library to do so. 
However, the repo still contains a sample implementation that also tracks the adjusted symbols (to mask retokenized LLM outputs, for example) and implements basic support for punctuation.\n\n#### [Harbor Boost](https://github.com/av/harbor/wiki/5.2.-Harbor-Boost#klmbr---boost-llm-creativity)\n\n`klmbr` is implemented as one of the modules in Harbor's [Boost service](https://github.com/av/harbor/wiki/5.2.-Harbor-Boost).\n\n```bash\nharbor boost modules add klmbr\nharbor up boost\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fav%2Fklmbr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fav%2Fklmbr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fav%2Fklmbr/lists"}