{"id":15664730,"url":"https://github.com/aallam/ktoken","last_synced_at":"2025-08-26T14:28:17.808Z","repository":{"id":199126925,"uuid":"700547449","full_name":"aallam/ktoken","owner":"aallam","description":"Kotlin multiplatform BPE tokenizer library for OpenAI models","archived":false,"fork":false,"pushed_at":"2025-01-27T20:46:50.000Z","size":11168,"stargazers_count":32,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-05T21:11:43.083Z","etag":null,"topics":["binary-p","bpe","byte-pair-encoding","gpt","kotlin","openai","tiktoken","tokenizer"],"latest_commit_sha":null,"homepage":"","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aallam.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-04T19:48:37.000Z","updated_at":"2025-05-04T16:20:06.000Z","dependencies_parsed_at":null,"dependency_job_id":"b8e58cae-4c21-453a-8357-eb9a704070d5","html_url":"https://github.com/aallam/ktoken","commit_stats":null,"previous_names":["aallam/ktoken"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aallam%2Fktoken","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aallam%2Fktoken/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aallam%2Fktoken/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aallam%2Fktoken/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/a
allam","download_url":"https://codeload.github.com/aallam/ktoken/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252577020,"owners_count":21770721,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["binary-p","bpe","byte-pair-encoding","gpt","kotlin","openai","tiktoken","tokenizer"],"created_at":"2024-10-03T13:43:59.295Z","updated_at":"2025-05-05T21:11:51.263Z","avatar_url":"https://github.com/aallam.png","language":"Kotlin","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Ktoken\n\n[![Maven Central](https://img.shields.io/maven-central/v/com.aallam.ktoken/ktoken?color=blue\u0026label=Download)](https://central.sonatype.com/namespace/com.aallam.ktoken)\n[![License](https://img.shields.io/github/license/aallam/ktoken?color=yellow)](LICENSE.md)\n[![Documentation](https://img.shields.io/badge/docs-api-a97bff.svg?logo=kotlin)](https://mouaad.aallam.com/ktoken/ktoken)\n\n**Ktoken** is a Kotlin Multiplatform BPE tokenizer for OpenAI's models.\n\n## 📦 Setup\nInstall **Ktoken** by adding the dependency to your `build.gradle` file:\n\n```groovy\nrepositories {\n    mavenCentral()\n}\n\ndependencies {\n    implementation \"com.aallam.ktoken:ktoken:0.4.0\"\n}\n```\n\n## ⚡️ Getting Started\n\n```kotlin\nval tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE)\n// Or, for a specific model in the OpenAI API:\nval modelTokenizer = Tokenizer.of(model = \"gpt-4\")\n\nval tokens = tokenizer.encode(\"hello world\")\nval text = tokenizer.decode(listOf(15339, 1917))\n```\n\n### ⚙️ Usage Modes\n\nKtoken 
operates in two modes: Local (default for JVM) and Remote (default for JS/Native).\n\n#### 📍 Local Mode\n\nUse `LocalPbeLoader` to load encodings from local files:\n\n```kotlin\nval tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE, loader = LocalPbeLoader(FileSystem.SYSTEM))\n// Or, for a specific model in the OpenAI API:\nval modelTokenizer = Tokenizer.of(model = \"gpt-4\", loader = LocalPbeLoader(FileSystem.SYSTEM))\n```\n\n##### JVM Specifics\n\nThe JVM artifacts bundle the encoding files, so you can load them with `FileSystem.RESOURCES`:\n\n```kotlin\nval tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE, loader = LocalPbeLoader(FileSystem.RESOURCES))\n```\n\n*Note: this is the default behavior on the JVM.*\n\n#### 🌐 Remote Mode\n\n1. Add an engine: include one of [Ktor's engines](https://ktor.io/docs/http-client-engines.html) in your dependencies.\n2. Use `RemoteBpeLoader` to load encodings from remote sources:\n\n```kotlin\nval tokenizer = Tokenizer.of(encoding = Encoding.CL100K_BASE, loader = RemoteBpeLoader())\n\n// Or, for a specific model in the OpenAI API:\nval modelTokenizer = Tokenizer.of(model = \"gpt-4\", loader = RemoteBpeLoader())\n```\n\n### 📋 BOM Usage\n\nAlternatively, you can use [ktoken-bom](/ktoken-bom) to manage versions by adding the following to your `build.gradle` file:\n\n```groovy\ndependencies {\n    // Import the Ktoken BOM\n    implementation platform('com.aallam.ktoken:ktoken-bom:0.4.0')\n\n    // Define dependencies without versions\n    implementation 'com.aallam.ktoken:ktoken'\n    runtimeOnly 'io.ktor:ktor-client-okhttp'\n}\n```\n\n### 🔀 Multiplatform Projects\n\nFor multiplatform projects, add the **ktoken** dependency to `commonMain` and select a Ktor [engine](https://ktor.io/docs/http-client-engines.html) for each target.\n\n## 📄 License\nKtoken is open-source software distributed under the [MIT license](LICENSE.md).\n**This project is neither affiliated with nor endorsed by 
OpenAI**.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faallam%2Fktoken","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faallam%2Fktoken","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faallam%2Fktoken/lists"}