{"id":27732867,"url":"https://github.com/alvarobartt/bpe.zig","last_synced_at":"2025-04-28T11:48:13.713Z","repository":{"id":275831201,"uuid":"920578779","full_name":"alvarobartt/bpe.zig","owner":"alvarobartt","description":"Minimal implementation of a Byte Pair Encoding (BPE) tokenizer in Zig","archived":false,"fork":false,"pushed_at":"2025-04-07T14:10:48.000Z","size":129,"stargazers_count":11,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-20T23:56:28.489Z","etag":null,"topics":["bpe","gpt-2","tokenizer","zig"],"latest_commit_sha":null,"homepage":"","language":"Zig","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alvarobartt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-01-22T12:03:53.000Z","updated_at":"2025-04-07T14:11:25.000Z","dependencies_parsed_at":null,"dependency_job_id":"87df48f3-246b-4d69-9318-472a460ce9be","html_url":"https://github.com/alvarobartt/bpe.zig","commit_stats":null,"previous_names":["alvarobartt/tokeni.zig","alvarobartt/bpe.zig"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alvarobartt%2Fbpe.zig","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alvarobartt%2Fbpe.zig/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alvarobartt%2Fbpe.zig/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alvarobartt%2Fbpe.zig/manifests","owner_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alvarobartt","download_url":"https://codeload.github.com/alvarobartt/bpe.zig/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251309841,"owners_count":21568913,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bpe","gpt-2","tokenizer","zig"],"created_at":"2025-04-28T11:48:12.959Z","updated_at":"2025-04-28T11:48:13.708Z","avatar_url":"https://github.com/alvarobartt.png","language":"Zig","funding_links":[],"categories":[],"sub_categories":[],"readme":"# bpe.zig\n\n`bpe.zig` is a minimal implementation of a Byte Pair Encoding (BPE) tokenizer in Zig.\n\n\u003e [!WARNING]\n\u003e This implementation is currently an educational project for exploring Zig and\n\u003e tokenizer internals (particularly BPE as used in models such as 
GPT-2).\n\n## Usage\n\nFirst, download the `tokenizer.json` file from the Hugging Face Hub\nat [`openai-community/gpt2`](https://huggingface.co/openai-community/gpt2).\n\n```zig\nconst std = @import(\"std\");\nconst Tokenizer = @import(\"bpe.Tokenizer\");\n\npub fn main() !void {\n    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);\n    defer arena.deinit();\n    const allocator = arena.allocator();\n\n    // https://huggingface.co/openai-community/gpt2/tree/main/tokenizer.json\n    var tokenizer = try Tokenizer.init(\"tokenizer.json\", allocator);\n    defer tokenizer.deinit();\n\n    const text = \"Hello, I'm a test string with numbers 123 and symbols @#$!\u003c|endoftext|\u003e\";\n    const encoding = try tokenizer.encode(text);\n    defer allocator.free(encoding);\n\n    std.debug.print(\"Encoded tokens: {any}\\n\", .{encoding});\n}\n```\n\n## License\n\nThis project is licensed under either of the following licenses, at your option:\n\n- [Apache License, Version 2.0](LICENSE-APACHE)\n- [MIT License](LICENSE-MIT)\n\nUnless you explicitly state otherwise, any contribution intentionally submitted\nfor inclusion in this project by you, as defined in the Apache-2.0 license, shall\nbe dual licensed as above, without any additional terms or conditions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falvarobartt%2Fbpe.zig","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falvarobartt%2Fbpe.zig","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falvarobartt%2Fbpe.zig/lists"}