{"id":45432257,"url":"https://github.com/m96-chan/0xbitnet","last_synced_at":"2026-02-26T06:03:34.998Z","repository":{"id":339410676,"uuid":"1161776081","full_name":"m96-chan/0xBitNet","owner":"m96-chan","description":"Run BitNet b1.58 ternary LLMs with WebGPU — in browsers and native apps","archived":false,"fork":false,"pushed_at":"2026-02-22T06:21:28.000Z","size":9118,"stargazers_count":5,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-22T08:08:00.453Z","etag":null,"topics":["bitnet","inference","llm","ternary","typescript","webgpu","wgsl"],"latest_commit_sha":null,"homepage":"https://0xbitnet.m96-chan.dev","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/m96-chan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-19T14:02:34.000Z","updated_at":"2026-02-22T06:21:31.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/m96-chan/0xBitNet","commit_stats":null,"previous_names":["m96-chan/0xbitnet"],"tags_count":10,"template":false,"template_full_name":null,"purl":"pkg:github/m96-chan/0xBitNet","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m96-chan%2F0xBitNet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m96-chan%2F0xBitNet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m96-chan%2F0xBitNet/releases","manifest
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m96-chan%2F0xBitNet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/m96-chan","download_url":"https://codeload.github.com/m96-chan/0xBitNet/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/m96-chan%2F0xBitNet/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29735904,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-23T02:24:00.660Z","status":"ssl_error","status_checked_at":"2026-02-23T02:22:56.087Z","response_time":90,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bitnet","inference","llm","ternary","typescript","webgpu","wgsl"],"created_at":"2026-02-22T02:09:22.639Z","updated_at":"2026-02-23T03:01:19.822Z","avatar_url":"https://github.com/m96-chan.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/m96-chan/0xBitNet/main/assets/hero.png\" alt=\"0xBitNet — 1-bit Inference on WebGPU\" width=\"720\" /\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003e0xBitNet\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/m96-chan/0xBitNet/actions/workflows/ci.yml\"\u003e\u003cimg 
src=\"https://img.shields.io/github/actions/workflow/status/m96-chan/0xBitNet/ci.yml?branch=main\u0026label=CI\" alt=\"CI\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/m96-chan/0xBitNet/actions/workflows/rust-ci.yml\"\u003e\u003cimg src=\"https://img.shields.io/github/actions/workflow/status/m96-chan/0xBitNet/rust-ci.yml?branch=main\u0026label=Rust%20CI\" alt=\"Rust CI\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://www.npmjs.com/package/0xbitnet\"\u003e\u003cimg src=\"https://img.shields.io/npm/v/0xbitnet\" alt=\"npm\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://crates.io/crates/oxbitnet\"\u003e\u003cimg src=\"https://img.shields.io/crates/v/oxbitnet\" alt=\"crates.io\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/oxbitnet/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/oxbitnet\" alt=\"PyPI\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://central.sonatype.com/artifact/io.github.m96-chan/oxbitnet\"\u003e\u003cimg src=\"https://img.shields.io/maven-central/v/io.github.m96-chan/oxbitnet\" alt=\"Maven Central\"\u003e\u003c/a\u003e\n  \u003ca href=\"LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/github/license/m96-chan/0xBitNet\" alt=\"License\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eRun \u003ca href=\"https://github.com/microsoft/BitNet\"\u003eMicrosoft BitNet b1.58\u003c/a\u003e ternary LLMs with WebGPU — in browsers and native apps.\u003c/strong\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://m96-chan.github.io/0xBitNet/chat/\"\u003eLive Chat Demo\u003c/a\u003e · \u003ca href=\"https://m96-chan.github.io/0xBitNet/tldr/\"\u003eTL;DR Widget Demo\u003c/a\u003e · \u003ca href=\"docs/getting-started.md\"\u003eGetting Started\u003c/a\u003e · \u003ca href=\"docs/api-reference.md\"\u003eAPI Reference\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\n0xBitNet runs BitNet b1.58 ternary LLMs on WebGPU. 
Custom WGSL compute kernels handle the ternary matrix operations, with bindings for TypeScript, Rust, Python, Swift, Java, and C. Works in browsers, Node.js, and native apps.\n\n## Highlights\n\n- **Pure WebGPU** — Custom WGSL kernels for ternary matrix operations (no WASM, no server)\n- **Multi-language** — TypeScript (`0xbitnet`), Rust (`oxbitnet`), Python (`oxbitnet`), Swift (`OxBitNet`), Java/Android (`oxbitnet-java`), C (`oxbitnet-ffi`)\n- **Cross-platform** — Browsers, Node.js, Deno, native apps via wgpu\n- **Chat templates** — Built-in LLaMA 3 chat message formatting\n- **Automatic caching** — IndexedDB (browser) / disk cache (native)\n- **Streaming** — Token-by-token output via async generators / streams / callbacks\n\n## Quick Start\n\n### TypeScript / JavaScript\n\n```bash\nnpm install 0xbitnet\n```\n\n```typescript\nimport { BitNet } from \"0xbitnet\";\n\nconst model = await BitNet.load(\n  \"https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf/resolve/main/ggml-model-i2_s.gguf\",\n  { onProgress: (p) =\u003e console.log(`${p.phase}: ${(p.fraction * 100).toFixed(1)}%`) }\n);\n\nfor await (const token of model.generate(\"The meaning of life is\")) {\n  process.stdout.write(token);\n}\n\nmodel.dispose();\n```\n\n### Rust\n\n```bash\ncargo add oxbitnet\n```\n\n```rust\nuse oxbitnet::BitNet;\nuse futures::StreamExt;\n\nlet mut model = BitNet::load(\"model.gguf\", Default::default()).await?;\n\nlet mut stream = model.generate(\"Hello!\", Default::default());\nwhile let Some(token) = stream.next().await {\n    print!(\"{token}\");\n}\n\nmodel.dispose();\n```\n\n### Python\n\n```bash\npip install oxbitnet\n```\n\n```python\nfrom oxbitnet import BitNet\n\nmodel = BitNet.load_sync(\"model.gguf\")\n\nmodel.chat(\n    [(\"system\", \"You are a helpful assistant.\"), (\"user\", \"Hello!\")],\n    on_token=lambda t: print(t, end=\"\", flush=True),\n    temperature=0.7,\n)\n\nmodel.dispose()\n```\n\n### Swift\n\n```swift\nimport OxBitNet\n\nlet model = try await 
BitNet.load(\"model.gguf\")\n\nfor try await token in model.chat([.user(\"Hello!\")], options: .init(temperature: 0.7)) {\n    print(token, terminator: \"\")\n}\n\nmodel.dispose()\n```\n\n### Java\n\n```java\nimport io.github.m96chan.oxbitnet.*;\nimport java.util.List;\n\ntry (BitNet model = BitNet.loadSync(\"model.gguf\")) {\n    model.chat(\n        List.of(new ChatMessage(\"user\", \"Hello!\")),\n        token -\u003e {\n            System.out.print(token);\n            return true;\n        },\n        new GenerateOptions().temperature(0.7f)\n    );\n}\n```\n\n### C / FFI\n\n```c\n#include \u003cstdint.h\u003e /* int32_t, uintptr_t */\n#include \u003cstdio.h\u003e  /* fwrite, stdout */\n\n#include \"oxbitnet.h\"\n\nstatic int32_t on_token(const char *token, uintptr_t len, void *userdata) {\n    fwrite(token, 1, len, stdout);\n    return 0; /* 0 = continue, non-zero = stop */\n}\n\nint main(void) {\n    OxBitNet *model = oxbitnet_load(\"model.gguf\", NULL);\n\n    OxBitNetChatMessage messages[] = {\n        { .role = \"user\", .content = \"Hello!\" },\n    };\n    OxBitNetGenerateOptions opts = oxbitnet_default_generate_options();\n\n    oxbitnet_chat(model, messages, 1, \u0026opts, on_token, NULL);\n    oxbitnet_free(model);\n}\n```\n\n### Chat Messages (TypeScript)\n\n```typescript\nconst messages = [\n  { role: \"system\" as const, content: \"You are a helpful assistant.\" },\n  { role: \"user\" as const, content: \"Explain quantum computing in one sentence.\" },\n];\n\nfor await (const token of model.generate(messages, { maxTokens: 128, temperature: 0.7 })) {\n  process.stdout.write(token);\n}\n```\n\n## Supported Models\n\n| Model | GGUF | Parameters | VRAM |\n|-------|------|------------|------|\n| [BitNet b1.58 2B-4T](https://huggingface.co/microsoft/BitNet-b1.58-2B-4T) | [ggml-model-i2_s.gguf](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf) | 2B | ~1.5 GB |\n| [Falcon-E 1B Instruct](https://huggingface.co/tiiuae/Falcon-E-1B-Instruct-GGUF) | [ggml-model-i2_s.gguf](https://huggingface.co/tiiuae/Falcon-E-1B-Instruct-GGUF) | 1B | ~666 MB |\n| [Falcon-E 3B 
Instruct](https://huggingface.co/tiiuae/Falcon-E-3B-Instruct-GGUF) | [ggml-model-i2_s.gguf](https://huggingface.co/tiiuae/Falcon-E-3B-Instruct-GGUF) | 3B | ~1 GB |\n\nAny I2_S GGUF model with a compatible architecture should work — see [Model Compatibility](docs/model-compatibility.md) for details.\n\n## Install\n\n| Language | Package | Install |\n|----------|---------|---------|\n| TypeScript / JS | [`0xbitnet`](https://www.npmjs.com/package/0xbitnet) | `npm install 0xbitnet` |\n| Rust | [`oxbitnet`](https://crates.io/crates/oxbitnet) | `cargo add oxbitnet` |\n| Python | [`oxbitnet`](https://pypi.org/project/oxbitnet/) | `pip install oxbitnet` |\n| Swift / iOS | `OxBitNet` | Swift Package Manager (see [oxbitnet-swift](packages/rust/crates/oxbitnet-swift/)) |\n| Java / Android | [`oxbitnet`](https://central.sonatype.com/artifact/io.github.m96-chan/oxbitnet) | `implementation(\"io.github.m96-chan:oxbitnet:0.5.2\")` |\n| C / FFI | `oxbitnet-ffi` | `cargo build -p oxbitnet-ffi --release` |\n\n## API Overview\n\n### TypeScript\n\n| Method | Description |\n|--------|-------------|\n| `BitNet.load(url, options?)` | Load a GGUF model from a URL |\n| `bitnet.generate(prompt, options?)` | Stream tokens as an `AsyncGenerator\u003cstring\u003e` |\n| `bitnet.diagnose(prompt?)` | Run GPU diagnostics on a forward pass |\n| `bitnet.dispose()` | Release all GPU resources |\n\n### Rust\n\n| Method | Description |\n|--------|-------------|\n| `BitNet::load(source, options).await` | Load a GGUF model |\n| `bitnet.generate(prompt, options)` | Stream tokens as `impl Stream\u003cItem = String\u003e` |\n| `bitnet.generate_chat(messages, options)` | Chat with template formatting |\n| `bitnet.dispose()` | Release all GPU resources |\n\n### Python\n\n| Method | Description |\n|--------|-------------|\n| `BitNet.load_sync(source)` | Load a GGUF model |\n| `model.chat(messages, on_token)` | Chat with streaming callback |\n| `model.generate(prompt, on_token)` | Generate with streaming 
callback |\n| `model.generate_sync(prompt)` | Generate, return full string |\n| `model.dispose()` | Release all GPU resources |\n\n### Swift\n\n| Method | Description |\n|--------|-------------|\n| `BitNet.load(source, options:)` | Load a GGUF model (async) |\n| `BitNet.loadSync(source, options:)` | Load a GGUF model (blocking) |\n| `model.generate(prompt, options:)` | Stream tokens as `AsyncThrowingStream\u003cString, Error\u003e` |\n| `model.chat(messages, options:)` | Chat with streaming via `AsyncThrowingStream` |\n| `model.dispose()` | Release all GPU resources (also called by `deinit`) |\n\n### Java\n\n| Method | Description |\n|--------|-------------|\n| `BitNet.loadSync(source, options?)` | Load a GGUF model |\n| `model.chat(messages, callback, options?)` | Chat with streaming callback |\n| `model.generate(prompt, callback, options?)` | Generate with streaming callback |\n| `model.dispose()` / `model.close()` | Release all GPU resources (AutoCloseable) |\n\n### C / FFI\n\n| Function | Description |\n|----------|-------------|\n| `oxbitnet_load(source, options)` | Load a GGUF model, returns opaque handle |\n| `oxbitnet_chat(model, messages, n, opts, cb, ud)` | Chat with streaming callback |\n| `oxbitnet_generate(model, prompt, opts, cb, ud)` | Generate with streaming callback |\n| `oxbitnet_free(model)` | Release all GPU resources |\n| `oxbitnet_error_message()` | Get last error (thread-local) |\n\n## Platform Support\n\n0xBitNet runs on any platform with a [WebGPU](https://www.w3.org/TR/webgpu/) implementation:\n\n**Browsers:**\n- Chrome / Edge 113+ (recommended)\n- Firefox Nightly (behind flag)\n- Safari 18+\n\n**Native (Rust / Python):**\n- Uses [wgpu](https://wgpu.rs/) — Vulkan, Metal, DX12 backends automatically\n- No browser or WebGPU runtime needed\n\n**Native (Node.js / Deno):**\n- Deno (built-in WebGPU)\n- Node.js with [`webgpu`](https://www.npmjs.com/package/webgpu) npm package (Dawn bindings) — see [Node.js CLI example](examples/node-cli/)\n- Any 
runtime exposing the WebGPU API (e.g., wgpu-native, Electron)\n\nA dedicated GPU with sufficient VRAM is required (see [Supported Models](#supported-models) for estimates).\n\n## Examples\n\n### [Web Chat](https://m96-chan.github.io/0xBitNet/chat/)\n\nA WebGPU-powered chat application. Downloads the model on first visit, then runs LLM chat completely on-device — no backend needed.\n\n### [TL;DR Widget](https://m96-chan.github.io/0xBitNet/tldr/)\n\nAn offline-ready summarization widget. Provides LLM-powered TL;DR without any network dependency.\n\n### [Node.js CLI](examples/node-cli/)\n\nRun BitNet from the command line using Node.js and the [`webgpu`](https://www.npmjs.com/package/webgpu) npm package (Dawn bindings). Interactive chat with streaming output and tok/s metrics.\n\n```bash\ncd examples/node-cli\nnpm install \u0026\u0026 npm start\n```\n\n### Rust CLI\n\nInteractive chat using native wgpu.\n\n```bash\ncd packages/rust\ncargo run --example chat --release\n```\n\n### Python CLI\n\nInteractive chat via Python bindings.\n\n```bash\npip install oxbitnet\npython packages/rust/crates/oxbitnet-python/examples/chat.py\n```\n\n### Swift CLI\n\nMinimal Swift chat example wrapping the C FFI layer.\n\n```bash\ncd packages/rust\ncargo build -p oxbitnet-ffi --release\ncd crates/oxbitnet-swift\nswift run -Xlinker -L../../../../target/release Chat model.gguf \"Hello!\"\n```\n\n### Java CLI\n\nMinimal Java chat example using JNI bindings.\n\n```bash\ncd packages/rust\ncargo build -p oxbitnet-java --release\ncd crates/oxbitnet-java/examples\njavac -cp ../java/src/main/java:. Chat.java\njava -Djava.library.path=../../../../target/release -cp ../java/src/main/java:. 
Chat model.gguf \"Hello!\"\n```\n\n### C CLI\n\nMinimal C example using the FFI bindings.\n\n```bash\ncd packages/rust\ncargo build -p oxbitnet-ffi --release\ngcc crates/oxbitnet-ffi/examples/chat.c -Icrates/oxbitnet-ffi -Ltarget/release -loxbitnet_ffi -o chat\nLD_LIBRARY_PATH=target/release ./chat model.gguf \"Hello!\"\n```\n\n## Architecture\n\n```\n0xbitnet/\n├── packages/\n│   ├── core/               # WGSL kernels + TypeScript API (npm: 0xbitnet)\n│   │   └── src/\n│   │       ├── gpu/        # WebGPU device init, buffer pool\n│   │       ├── model/      # GGUF parser, weight loader, config\n│   │       ├── nn/         # Transformer layers, attention, BitLinear\n│   │       ├── shaders/    # 12 WGSL compute shaders (shared with Rust)\n│   │       └── tokenizer/  # BPE tokenizer, chat templates\n│   └── rust/               # Rust library + Python, Swift, Java, and C bindings\n│       └── crates/\n│           ├── oxbitnet/           # Rust library (crates.io: oxbitnet)\n│           ├── oxbitnet-python/    # Python bindings via PyO3 (PyPI: oxbitnet)\n│           ├── oxbitnet-swift/     # Swift bindings via C FFI (SPM package)\n│           ├── oxbitnet-java/      # Java/JNI bindings (Android-ready)\n│           └── oxbitnet-ffi/       # C FFI bindings (cdylib + staticlib)\n├── examples/\n│   ├── web-chat/           # Chat app demo (Vite)\n│   ├── tl-dr-widget/       # Offline TL;DR widget demo (Vite)\n│   └── node-cli/           # Node.js CLI using Dawn WebGPU bindings\n└── docs/\n```\n\nSee [Architecture](docs/architecture.md) for data flow and internals.\n\n## Prerequisites\n\n- **TypeScript/JS**: Node.js 18+, a WebGPU-capable environment\n- **Rust**: Rust 1.75+, a Vulkan/Metal/DX12-capable GPU\n- **Swift**: Swift 5.9+, a Vulkan/Metal/DX12-capable GPU\n- **Java**: JDK 17+, a Vulkan/Metal/DX12-capable GPU\n- **Python**: Python 3.9+, `pip install oxbitnet`\n\n## Contributing\n\nContributions are welcome! 
Whether it's a bug report, feature request, or pull request — all input is appreciated.\n\n1. Fork the repository\n2. Create your feature branch (`git checkout -b feature/amazing-feature`)\n3. Commit your changes (`git commit -m 'Add amazing feature'`)\n4. Push to the branch (`git push origin feature/amazing-feature`)\n5. Open a Pull Request\n\nPlease see [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.\n\n## License\n\n[MIT](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fm96-chan%2F0xbitnet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fm96-chan%2F0xbitnet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fm96-chan%2F0xbitnet/lists"}