https://github.com/dimforge/wgml
Cross-platform GPU LLM inference with WebGPU and wgmath.
- Host: GitHub
- URL: https://github.com/dimforge/wgml
- Owner: dimforge
- License: apache-2.0
- Created: 2025-03-15T15:35:36.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-05-04T20:19:40.000Z (10 months ago)
- Last Synced: 2025-06-09T20:47:47.999Z (9 months ago)
- Language: Rust
- Homepage:
- Size: 269 KB
- Stars: 7
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE-APACHE.txt
README
# wgml − GPU local inference on every platform
-----
**wgml** is a set of [Rust](https://www.rust-lang.org/) libraries exposing [WebGPU](https://www.w3.org/TR/WGSL/) shaders
and kernels for local Large Language Model (LLM) inference on the GPU. It is cross-platform and runs on the web.
**wgml** can be used as a Rust library to assemble your own transformer from the provided operators (and to write your
own operators on top of them).
Aside from the library, two binary crates are provided:
- **wgml-bench** is a basic benchmarking utility for measuring matrix-multiplication times across various
  quantization formats.
- **wgml-chat** is a basic chat GUI application for loading GGUF files and chatting with the model. It can be run
  natively or in the browser. Check out its [README](./crates/wgml-chat/README.md) for details on how to run it. You
  can also try it directly from your browser with the [online demo](https://wgmath.rs/demos/wgml/index.html).
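To give a feel for what "quantization formats" means here, below is a minimal CPU-side sketch of GGML-style Q8_0 block quantization (32 values per block, each block storing one f32 scale plus 32 signed bytes), one of the formats commonly found in GGUF files. This is an illustrative standalone example, not wgml's actual API; names like `BlockQ8_0` are invented for this sketch.

```rust
// Illustrative GGML-style Q8_0 block quantization: 32 f32 values are
// compressed to one f32 scale plus 32 i8 quants (~4x smaller).
// Simplified sketch for exposition; not wgml's real types or kernels.

const BLOCK: usize = 32;

struct BlockQ8_0 {
    scale: f32,          // d = max(|x|) / 127
    quants: [i8; BLOCK], // q_i = round(x_i / d)
}

fn quantize_q8_0(xs: &[f32; BLOCK]) -> BlockQ8_0 {
    // Largest absolute value in the block determines the scale.
    let amax = xs.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    let scale = amax / 127.0;
    let inv = if scale != 0.0 { 1.0 / scale } else { 0.0 };
    let mut quants = [0i8; BLOCK];
    for (q, &x) in quants.iter_mut().zip(xs.iter()) {
        *q = (x * inv).round() as i8;
    }
    BlockQ8_0 { scale, quants }
}

fn dequantize_q8_0(b: &BlockQ8_0) -> [f32; BLOCK] {
    let mut out = [0.0f32; BLOCK];
    for (y, &q) in out.iter_mut().zip(b.quants.iter()) {
        *y = q as f32 * b.scale;
    }
    out
}

fn main() {
    // Quantize a small ramp of values and check the round-trip error,
    // which is bounded by half the quantization step.
    let mut xs = [0.0f32; BLOCK];
    for (i, x) in xs.iter_mut().enumerate() {
        *x = (i as f32 - 16.0) * 0.1;
    }
    let block = quantize_q8_0(&xs);
    let ys = dequantize_q8_0(&block);
    let max_err = xs
        .iter()
        .zip(ys.iter())
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    println!("max round-trip error: {max_err}");
    assert!(max_err <= block.scale * 0.5 + 1e-6);
}
```

A GPU kernel (such as those wgml provides in WGSL) would apply the same dequantization formula per block while computing matrix products, trading a little precision for much lower memory bandwidth.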
⚠️ **wgml** is still under heavy development and might be lacking some important features. Contributions are welcome!
----