{"id":14885190,"url":"https://github.com/zml/zml","last_synced_at":"2025-04-12T03:43:53.617Z","repository":{"id":257566474,"uuid":"858639154","full_name":"zml/zml","owner":"zml","description":"Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild","archived":false,"fork":false,"pushed_at":"2025-04-03T16:04:37.000Z","size":2103,"stargazers_count":2186,"open_issues_count":26,"forks_count":77,"subscribers_count":27,"default_branch":"master","last_synced_at":"2025-04-05T01:01:36.517Z","etag":null,"topics":["ai","bazel","hpc","inference","xla","zig"],"latest_commit_sha":null,"homepage":"https://docs.zml.ai","language":"Zig","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zml.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-17T09:13:32.000Z","updated_at":"2025-04-04T07:20:49.000Z","dependencies_parsed_at":"2024-09-17T12:55:34.737Z","dependency_job_id":"8a6079af-72a1-494a-81d1-5285188127a5","html_url":"https://github.com/zml/zml","commit_stats":{"total_commits":103,"total_committers":14,"mean_commits":7.357142857142857,"dds":0.5728155339805825,"last_synced_commit":"cf135bd4281c75110fa33501ac87f216c7c3edb2"},"previous_names":["zml/zml"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zml%2Fzml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zml%2Fzml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zml%2Fzml/releases","manifests_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zml%2Fzml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zml","download_url":"https://codeload.github.com/zml/zml/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248404444,"owners_count":21097743,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","bazel","hpc","inference","xla","zig"],"created_at":"2024-09-21T16:00:52.448Z","updated_at":"2025-04-12T03:43:53.597Z","avatar_url":"https://github.com/zml.png","language":"Zig","readme":"\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://raw.githubusercontent.com/zml/zml.github.io/refs/heads/main/docs-assets/zml-banner.png\" style=\"width:100%; height:120px;\"\u003e\n  \u003ca href=\"https://zml.ai\"\u003eWebsite\u003c/a\u003e\n  | \u003ca href=\"#getting-started\"\u003eGetting Started\u003c/a\u003e\n  | \u003ca href=\"https://docs.zml.ai\"\u003eDocumentation\u003c/a\u003e\n  | \u003ca href=\"https://discord.gg/6y72SN2E7H\"\u003eDiscord\u003c/a\u003e\n  | \u003ca href=\"./CONTRIBUTING.md\"\u003eContributing\u003c/a\u003e\n\u003c/div\u003e\n\n[ZML]: https://zml.ai/\n[Getting Started]: #getting-started\n[Documentation]: https://docs.zml.ai\n[Contributing]: ./CONTRIBUTING.md\n[Discord]: https://discord.gg/6y72SN2E7H\n\n# Bonjour 👋\n\nAt ZML, we are creating exciting AI products on top of our high-performance\nAI inference stack. 
Our stack is built for production, using the amazing\n[Zig](https://ziglang.org) language, [MLIR](https://mlir.llvm.org), and the\npower of [Bazel](https://bazel.build).\n\n\u003cdiv align=\"center\"\u003e\n  \u003cdiv\u003eTake me straight to \u003ca href=\"#getting-started\"\u003egetting started\u003c/a\u003e or \u003ca href=\"#a-taste-of-zml\"\u003egive me a taste\u003c/a\u003e 🥐!\u003c/div\u003e\n\u003c/div\u003e\n\n---\n\n\u0026nbsp;\n\n# We're happy to share!\nWe're very happy to share our inference stack with the World and hope it allows\nyou, too, to build cool and exciting AI projects.\n\nTo give you a glimpse of what you can do with ZML, here is an early demo:\n\n\u003cdiv align=\"center\"\u003e\u003cimg src=\"https://zml.ai/docs-assets/ZML.gif\" style=\"width:75%\"\u003e\u003c/div\u003e\n\nIt shows a prototype running a LLaMA2 model sharded on 1 NVIDIA RTX 4090, 1 AMD\n6800XT, and 1 Google Cloud TPU v2.  All accelerators were hosted in different\nlocations, with activations being passed over a VPN.\n\nAll processes used the same model code, cross-compiled on a Mac, and copied onto\nthe servers.\n\nFor more inspiration, see also the examples below or check out the\n[examples](./examples) folder.\n\n\n\n# Getting started\n\n\n\n## Prerequisites\n\nWe use `bazel` to build ZML and its dependencies. 
The only prerequisite is\n`bazel`, which we recommend downloading via `bazelisk`, a version manager\nfor `bazel`.\n\n**Please note: If you do not wish to install `bazel`** system-wide, we provide\n[examples/bazel.sh](examples/bazel.sh) which downloads it to your home folder\nand runs it.\n\n**Install Bazel** (recommended):\n\n\u003cdetails\u003e\u003csummary\u003e\n\n### macOS\n\u003c/summary\u003e\n\n```\nbrew install bazelisk\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\u003csummary\u003e\n\n### Linux\n\u003c/summary\u003e\n\n```\ncurl -L -o /usr/local/bin/bazel 'https://github.com/bazelbuild/bazelisk/releases/download/v1.25.0/bazelisk-linux-amd64'\nchmod +x /usr/local/bin/bazel\n```\n\u003c/details\u003e\n\n\n## Run a pre-packaged model\n\nWe have implemented a variety of example models in ZML. See our reference\nimplementations in the\n[examples](https://github.com/zml/zml/tree/master/examples/) folder.\n\n### MNIST\n\nThe [classic](https://en.wikipedia.org/wiki/MNIST_database) handwritten digits\nrecognition task. The model is tasked to recognize a handwritten digit, which\nhas been converted to a 28x28 pixel monochrome image. `Bazel` will download a\npre-trained model and the test dataset. The program will load the model,\ncompile it, and classify a randomly picked example from the test dataset.\n\nOn the command line:\n\n```\ncd examples\nbazel run -c opt //mnist\n\n# or\n./bazel.sh run -c opt //mnist\n```\n\n\n\n### Meta Llama 3.1 8B\n\nThis model has restrictions; see\n[here](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct). 
It **requires\napproval from Meta on Huggingface**, which can take a few hours to get granted.\n\nWhile waiting, you can already generate an access token to log into HuggingFace\nfrom `bazel`; see [here](./docs/huggingface-access-token.md).\n\nOnce you've been granted access, you're ready to download a gated model like\n`Meta-Llama-3.1-8B-Instruct`!\n\n```\n# requires token in $HOME/.cache/huggingface/token, as created by the\n# `huggingface-cli login` command, or the `HUGGINGFACE_TOKEN` environment variable.\ncd examples\nbazel run -c opt //llama:Llama-3.1-8B-Instruct\nbazel run -c opt //llama:Llama-3.1-8B-Instruct -- --prompt=\"What is the capital of France?\"\n```\n\nYou can also try `Llama-3.1-70B-Instruct` if you have enough memory.\n\n### Meta Llama 3.2 1B\n\nLike the 8B model above, this model also requires approval. See\n[here](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) for access requirements.\n\n```\ncd examples\nbazel run -c opt //llama:Llama-3.2-1B-Instruct\nbazel run -c opt //llama:Llama-3.2-1B-Instruct -- --prompt=\"What is the capital of France?\"\n```\n\nFor a larger 3.2 model, you can also try `Llama-3.2-3B-Instruct`.\n\n## Running Models on GPU / TPU\n\nYou can compile models for accelerator runtimes by appending one or more of the\nfollowing arguments to the command line when compiling / running a model:\n\n- NVIDIA CUDA: `--@zml//runtimes:cuda=true`\n- AMD ROCm: `--@zml//runtimes:rocm=true`\n- Google TPU: `--@zml//runtimes:tpu=true`\n- AWS Trainium/Inferentia 2: `--@zml//runtimes:neuron=true`\n- **AVOID CPU:** `--@zml//runtimes:cpu=false`\n\nThe latter, avoiding compilation for CPU, cuts down compilation time.\n\nSo, to run the Llama 3.2 model from above on a host with an NVIDIA GPU,\nrun the following:\n\n```\ncd examples\nbazel run -c opt //llama:Llama-3.2-1B-Instruct             \\\n          --@zml//runtimes:cuda=true                       \\\n          -- --prompt=\"What is the capital of France?\"\n```\n\n\n## Run 
Tests\n\n```\nbazel test //zml:test\n```\n\n\n# A taste of ZML\n\n\n\n## MNIST\n\n\n```zig\nconst std = @import(\"std\");\nconst zml = @import(\"zml\");\n\n/// Model definition\nconst Mnist = struct {\n    fc1: Layer,\n    fc2: Layer,\n\n    const Layer = struct {\n        weight: zml.Tensor,\n        bias: zml.Tensor,\n\n        pub fn forward(self: Layer, input: zml.Tensor) zml.Tensor {\n            return self.weight.matmul(input).add(self.bias).relu();\n        }\n    };\n\n    /// just two linear layers + relu activation\n    pub fn forward(self: Mnist, input: zml.Tensor) zml.Tensor {\n        std.log.info(\"Compiling for target: {s}\", .{@tagName(input.getContext().target())});\n        var x = input.flattenAll().convert(.f32);\n        const layers: []const Layer = \u0026.{ self.fc1, self.fc2 };\n        for (layers) |layer| {\n            x = zml.call(layer, .forward, .{x});\n        }\n        return x.argMax(0, .u8).indices;\n    }\n};\n```\n\n\n\n## Tagged Tensors\n\n```zig\nconst Sdpa = struct {\n    pub fn forward(_: Sdpa, ctx: *zml.Context, q_: zml.Tensor, k_: zml.Tensor, v_: zml.Tensor) zml.Tensor {\n        const q = q_.withTags(.{ .b, .h, .q, .hd });\n        const k = k_.withTags(.{ .b, .h, .k, .hd });\n        const v = v_.withTags(.{ .b, .h, .k, .hd });\n        const attn_mask = zml.nn.causalAttnMask(ctx, .{ .q = q.dim(.q), .k = k.dim(.k) }, q.dtype(), null);\n        return zml.nn.sdpa(ctx, q, k, v, .{ .attn_mask = attn_mask });\n    }\n};\n```\n\n\n\n\n# Where to go next:\n\nYou might want to check out more [examples](./examples), read through the\n[documentation directly on GitHub](./docs/README.md), or, for the full rendering\nexperience, browse the\n[online documentation with included API reference](https://docs.zml.ai).\n\n\n\n# Contributing\n\nSee [here][Contributing].\n\n\n\n# License\n\nZML is licensed under the [Apache 2.0 license](./LICENSE).\n\n\n\n# Thanks to our contributors\n\n\u003ca 
href=\"https://github.com/zml/zml/graphs/contributors\"\u003e\n  \u003cimg src=\"https://contrib.rocks/image?repo=zml/zml\" /\u003e\n\u003c/a\u003e\n","funding_links":[],"categories":["Zig","Other_Machine Learning \u0026 Deep Learning","Inference","Libraries","Data \u0026 Science"],"sub_categories":["Inference Engine","Machine Learning Framework"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzml%2Fzml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzml%2Fzml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzml%2Fzml/lists"}