{"id":29873699,"url":"https://github.com/trymirai/uzu","last_synced_at":"2026-06-08T04:01:16.654Z","repository":{"id":303683692,"uuid":"1007360921","full_name":"trymirai/uzu","owner":"trymirai","description":"A high-performance inference engine for AI models","archived":false,"fork":false,"pushed_at":"2026-06-03T22:44:18.000Z","size":9596,"stargazers_count":1608,"open_issues_count":7,"forks_count":55,"subscribers_count":7,"default_branch":"main","last_synced_at":"2026-06-04T00:10:32.002Z","etag":null,"topics":["ai","high-performance","inference","llm","metal","rust","tts"],"latest_commit_sha":null,"homepage":"https://trymirai.com","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/trymirai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-06-23T21:55:11.000Z","updated_at":"2026-06-03T17:27:20.000Z","dependencies_parsed_at":"2025-07-08T22:24:25.844Z","dependency_job_id":"d7b727fb-0091-44f7-88b9-9d1e4c14fa2d","html_url":"https://github.com/trymirai/uzu","commit_stats":null,"previous_names":["trymirai/uzu"],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/trymirai/uzu","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trymirai%2Fuzu","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trymirai%2Fuzu/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trymirai%2Fuzu/releases","manifests_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/trymirai%2Fuzu/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/trymirai","download_url":"https://codeload.github.com/trymirai/uzu/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/trymirai%2Fuzu/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34047266,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-08T02:00:07.615Z","response_time":111,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","high-performance","inference","llm","metal","rust","tts"],"created_at":"2025-07-30T23:02:36.398Z","updated_at":"2026-06-08T04:01:16.643Z","avatar_url":"https://github.com/trymirai.png","language":"Rust","funding_links":[],"categories":["Rust","LLM Inference \u0026 Serving Tools"],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cpicture\u003e\n    \u003cimg alt=\"Mirai\" src=\"https://artifacts.trymirai.com/social/github/uzu-header.jpg\" style=\"max-width: 100%;\"\u003e\n  \u003c/picture\u003e\n\u003c/p\u003e\n\n\u003ca href=\"https://discord.com/invite/trymirai\"\u003e\u003cimg src=\"https://img.shields.io/discord/1377764166764462120?label=Discord\u0026color=brightgreen\" alt=\"Discord\"\u003e\u003c/a\u003e \u003ca 
href=\"mailto:contact@getmirai.co?subject=Interested%20in%20Mirai\"\u003e\u003cimg src=\"https://img.shields.io/badge/Send-Email-brightgreen\" alt=\"Contact us\"\u003e\u003c/a\u003e \u003ca href=\"https://docs.trymirai.com\"\u003e\u003cimg src=\"https://img.shields.io/badge/Read-Docs-brightgreen\" alt=\"Read docs\"\u003e\u003c/a\u003e [![License](https://img.shields.io/badge/License-MIT-brightgreen)](LICENSE) [![Build](https://github.com/trymirai/uzu/actions/workflows/tests.yml/badge.svg)](https://github.com/trymirai/uzu/actions) [![Python](https://img.shields.io/badge/Python-orange)](bindings/python) [![Package](https://img.shields.io/pypi/v/uzu?color=orange\u0026label=Package\u0026v=0.5.12)](https://pypi.org/project/uzu/) [![Python](https://img.shields.io/pypi/pyversions/uzu?color=orange\u0026label=Python\u0026v=0.5.12)](https://pypi.org/project/uzu/) [![TypeScript](https://img.shields.io/badge/TypeScript-yellow)](bindings/typescript) [![Package](https://img.shields.io/npm/v/@trymirai/uzu?color=yellow\u0026label=Package\u0026v=0.5.12)](https://www.npmjs.com/package/@trymirai/uzu) [![Downloads](https://img.shields.io/npm/dm/@trymirai/uzu?color=yellow\u0026label=Downloads\u0026v=0.5.12)](https://www.npmjs.com/package/@trymirai/uzu) [![Swift](https://img.shields.io/badge/Swift-blue)](bindings/swift) [![SPM](https://img.shields.io/badge/SPM-compatible-blue)](Package.swift) [![Platforms](https://img.shields.io/badge/Platforms-iOS%20%7C%20macOS-blue)](Package.swift) [![Swift](https://img.shields.io/badge/Swift-5.9-blue)](https://swift.org) \n\n# uzu\n\nA high-performance inference engine for AI models. It allows you to deploy AI directly in your app with **zero latency**, **full data privacy**, and **no inference costs**. 
Key features:\n\n- Simple, high-level API\n- Unified model configurations, making it easy to add support for new models\n- Traceable computations to ensure correctness against the source-of-truth implementation\n- Utilizes unified memory on Apple devices\n- [Broad model support](https://trymirai.com/models)\n\n## Quick Start\n\n\n\n\u003cdetails\u003e\n\u003csummary\u003eRust\u003c/summary\u003e\n\u003cbr\u003e\n\nAdd the dependency:\n\n```toml\n[dependencies]\nuzu = { git = \"https://github.com/trymirai/uzu\", branch = \"main\", package = \"uzu\" }\n```\n\nRun the code below:\n\n```rust\nuse uzu::{\n    engine::{Engine, EngineConfig},\n    types::session::chat::{ChatConfig, ChatMessage, ChatReplyConfig},\n};\n\n#[tokio::main]\nasync fn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    let engine_config = EngineConfig::default();\n    let engine = Engine::new(engine_config).await?;\n\n    let model = engine.model(\"Qwen/Qwen3-0.6B\".to_string()).await?.ok_or(\"Model not found\")?;\n    let downloader = engine.download(\u0026model).await?;\n    while let Some(update) = downloader.next().await {\n        println!(\"Download progress: {}\", update.progress());\n    }\n\n    let session = engine.chat(model, ChatConfig::default()).await?;\n\n    let messages = vec![\n        ChatMessage::system().with_text(\"You are a helpful assistant\".to_string()),\n        ChatMessage::user().with_text(\"Tell me a short, funny story about a robot\".to_string()),\n    ];\n\n    let replies = session.reply(messages, ChatReplyConfig::default()).await?;\n    if let Some(reply) = replies.last() {\n        println!(\"Reasoning: {}\", reply.message.reasoning().unwrap_or_default());\n        println!(\"Text: {}\", reply.message.text().unwrap_or_default());\n    }\n\n    Ok(())\n}\n```\n\n\u003c/details\u003e\n\n\n\n\u003cdetails\u003e\n\u003csummary\u003ePython\u003c/summary\u003e\n\u003cbr\u003e\n\nAdd the dependency:\n\n```bash\nuv add 
uzu==0.5.12\n```\n\nRun the code below:\n\n```python\nimport asyncio\n\nfrom uzu import ChatConfig, ChatMessage, ChatReplyConfig, Engine, EngineConfig\n\n\nasync def main() -\u003e None:\n    engine_config = EngineConfig.create()\n    engine = await Engine.create(engine_config)\n\n    model = await engine.model(\"Qwen/Qwen3-0.6B\")\n    if model is None:\n        return\n\n    async for update in (await engine.download(model)).iterator():\n        print(f\"Download progress: {update.progress}\")\n\n    session = await engine.chat(model, ChatConfig.create())\n\n    messages = [\n        ChatMessage.system().with_text(\"You are a helpful assistant\"),\n        ChatMessage.user().with_text(\"Tell me a short, funny story about a robot\"),\n    ]\n\n    replies = await session.reply(messages, ChatReplyConfig.create())\n    if not replies:\n        return\n\n    message = replies[-1].message\n    print(f\"Reasoning: {message.reasoning}\")\n    print(f\"Text: {message.text}\")\n\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\n\u003c/details\u003e\n\n\n\n\u003cdetails\u003e\n\u003csummary\u003eSwift\u003c/summary\u003e\n\u003cbr\u003e\n\nAdd the dependency:\n\n```swift\ndependencies: [\n    .package(url: \"https://github.com/trymirai/uzu.git\", from: \"0.5.12\")\n]\n```\n\nRun the code below:\n\n```swift\nimport Uzu\n\npublic func runQuickStart() async throws {\n    let engineConfig = EngineConfig.create()\n    let engine = try await Engine.create(config: engineConfig)\n    \n    guard let model = try await engine.model(identifier: \"Qwen/Qwen3-0.6B\") else {\n        return\n    }\n    \n    for try await update in try await engine.download(model: model).iterator() {\n        print(\"Download progress: \\(update.progress())\")\n    }\n    \n    let session = try await engine.chat(model: model, config: .create())\n    \n    let messages = [\n        ChatMessage.system().withText(text: \"You are a helpful assistant\"),\n        
ChatMessage.user().withText(text: \"Tell me a short, funny story about a robot\")\n    ]\n    \n    let reply = try await session.reply(input: messages, config: .create())\n    guard let message = reply.last?.message else {\n        return\n    }\n    \n    print(\"Reasoning: \\(message.reasoning() ?? \"empty\")\")\n    print(\"Text: \\(message.text() ?? \"empty\")\")\n}\n```\n\n\u003c/details\u003e\n\n\n\n\u003cdetails\u003e\n\u003csummary\u003eTypeScript\u003c/summary\u003e\n\u003cbr\u003e\n\nAdd the dependency:\n\n```bash\npnpm add @trymirai/uzu@0.5.12\n```\n\nRun the code below:\n\n```ts\nimport { ChatConfig, ChatMessage, ChatReplyConfig, Engine, EngineConfig } from '@trymirai/uzu';\n\nasync function main() {\n    let engineConfig = EngineConfig.create();\n    let engine = await Engine.create(engineConfig);\n\n    let model = await engine.model('Qwen/Qwen3-0.6B');\n    if (!model) {\n        throw new Error('Model not found');\n    }\n\n    for await (const update of await engine.download(model)) {\n        console.log('Download progress:', update.progress);\n    }\n\n    let session = await engine.chat(model, ChatConfig.create());\n\n    let messages = [\n        ChatMessage.system().withText('You are a helpful assistant'),\n        ChatMessage.user().withText('Tell me a short, funny story about a robot')\n    ];\n\n    let reply = await session.reply(messages, ChatReplyConfig.create());\n    let message = reply[0]?.message;\n\n    if (message) {\n        console.log('Reasoning: ', message.reasoning);\n        console.log('Text: ', message.text);\n    }\n}\n\nmain().catch((error) =\u003e {\n    console.error(error);\n});\n```\n\n\u003c/details\u003e\n\n\n\u003cbr\u003e\n\nEverything from model downloading to inference configuration is handled automatically. 
Refer to the [documentation](https://docs.trymirai.com) for details on how to customize each step of the process.\n\n## Examples\n\nYou can run any example via `cargo tools example` \\\u003c**rust** | **python** | **swift** | **typescript**\\\u003e \\\u003c**chat** | **chat-cloud** | **chat-speculation-classification** | **chat-speculation-summarization** | **chat-structured-output** | **classification** | **quick-start** | **text-to-speech**\\\u003e:\n\n### Chat\n\nIn this example, we will download a model and get a reply to a specific list of messages:\n\n\u003cdetails\u003e\n\u003csummary\u003eRust\u003c/summary\u003e\n\n```rust\nuse uzu::{\n    engine::{Engine, EngineConfig},\n    session::chat::ChatSessionStreamChunk,\n    types::session::chat::{ChatConfig, ChatMessage, ChatReplyConfig},\n};\n\n#[tokio::main]\nasync fn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    let engine_config = EngineConfig::default();\n    let engine = Engine::new(engine_config).await?;\n\n    let model = engine.model(\"Qwen/Qwen3-0.6B\".to_string()).await?.ok_or(\"Model not found\")?;\n    let downloader = engine.download(\u0026model).await?;\n    while let Some(update) = downloader.next().await {\n        println!(\"Download progress: {}\", update.progress());\n    }\n\n    let messages = vec![\n        ChatMessage::system().with_text(\"You are a helpful assistant\".to_string()),\n        ChatMessage::user().with_text(\"Tell me a short, funny story about a robot\".to_string()),\n    ];\n    let session = engine.chat(model, ChatConfig::default()).await?;\n    let stream = session.reply_with_stream(messages, ChatReplyConfig::default()).await;\n    let mut last_message: Option\u003cChatMessage\u003e = None;\n    while let Some(chunk) = stream.next().await {\n        match chunk {\n            ChatSessionStreamChunk::Replies {\n                replies,\n            } =\u003e {\n                if let Some(reply) = replies.first() {\n                    
last_message = Some(reply.message.clone());\n                    println!(\"Generated tokens: {}\", reply.stats.tokens_count_output.unwrap_or_default());\n                }\n            },\n            ChatSessionStreamChunk::Error {\n                error,\n            } =\u003e {\n                println!(\"Error: {error}\");\n            },\n        }\n    }\n    if let Some(message) = last_message {\n        println!(\"Reasoning: {}\", message.reasoning().unwrap_or_default());\n        println!(\"Text: {}\", message.text().unwrap_or_default());\n    }\n\n    Ok(())\n}\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003ePython\u003c/summary\u003e\n\n```python\nimport asyncio\n\nfrom uzu import (\n    ChatConfig,\n    ChatMessage,\n    ChatReplyConfig,\n    ChatSessionStreamChunk,\n    Engine,\n    EngineConfig,\n)\n\n\nasync def main() -\u003e None:\n    engine_config = EngineConfig.create()\n    engine = await Engine.create(engine_config)\n\n    model = await engine.model(\"Qwen/Qwen3-0.6B\")\n    if model is None:\n        raise RuntimeError(\"Model not found\")\n    async for update in (await engine.download(model)).iterator():\n        print(f\"Download progress: {update.progress}\")\n\n    messages = [\n        ChatMessage.system().with_text(\"You are a helpful assistant\"),\n        ChatMessage.user().with_text(\"Tell me a short, funny story about a robot\"),\n    ]\n    session = await engine.chat(model, ChatConfig.create())\n    stream = await session.reply_with_stream(messages, ChatReplyConfig.create())\n    message: ChatMessage | None = None\n    async for chunk in stream.iterator():\n        if isinstance(chunk, ChatSessionStreamChunk.Replies):\n            replies = chunk.replies\n            if replies:\n                reply = replies[0]\n                message = reply.message\n                print(f\"Generated tokens: {reply.stats.tokens_count_output}\")\n        elif isinstance(chunk, ChatSessionStreamChunk.Error):\n         
   print(f\"Error: {chunk.error}\")\n    if message is not None:\n        print(f\"Reasoning: {message.reasoning}\")\n        print(f\"Text: {message.text}\")\n\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eSwift\u003c/summary\u003e\n\n```swift\nimport Uzu\n\npublic func runChat() async throws {\n    let engineConfig = EngineConfig.create()\n    let engine = try await Engine.create(config: engineConfig)\n    \n    guard let model = try await engine.model(identifier: \"Qwen/Qwen3-0.6B\") else {\n        return\n    }\n    for try await update in try await engine.download(model: model).iterator() {\n        print(\"Download progress: \\(update.progress())\")\n    }\n    \n    let messages = [\n        ChatMessage.system().withText(text: \"You are a helpful assistant\"),\n        ChatMessage.user().withText(text: \"Tell me a short, funny story about a robot\")\n    ]\n    let session = try await engine.chat(model: model, config: .create())\n    let stream = await session.replyWithStream(input: messages, config: .create())\n    var message: ChatMessage? = nil\n    for try await update in stream.iterator() {\n        switch update {\n        case .replies(let replies):\n            let reply = replies.last\n            message = reply?.message\n            print(\"Generated tokens: \\(reply?.stats.tokensCountOutput ?? 0)\")\n        case .error(let error):\n            print(\"Error: \\(error)\")\n        }\n    }\n    print(\"Reasoning: \\(message?.reasoning() ?? \"empty\")\")\n    print(\"Text: \\(message?.text() ?? 
\"empty\")\")\n}\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eTypeScript\u003c/summary\u003e\n\n```ts\nimport { ChatConfig, ChatMessage, ChatReplyConfig, ChatSessionStreamChunkError, ChatSessionStreamChunkReplies, Engine, EngineConfig } from '@trymirai/uzu';\n\nasync function main() {\n    let engineConfig = EngineConfig.create();\n    let engine = await Engine.create(engineConfig);\n\n    let model = await engine.model('Qwen/Qwen3-0.6B');\n    if (!model) {\n        throw new Error('Model not found');\n    }\n    for await (const update of await engine.download(model)) {\n        console.log('Download progress:', update.progress);\n    }\n\n    let messages = [\n        ChatMessage.system().withText('You are a helpful assistant'),\n        ChatMessage.user().withText('Tell me a short, funny story about a robot')\n    ];\n    let session = await engine.chat(model, ChatConfig.create());\n    let stream = await session.replyWithStream(messages, ChatReplyConfig.create());\n    let message: ChatMessage | undefined;\n    for await (const chunk of stream) {\n        if (chunk instanceof ChatSessionStreamChunkReplies) {\n            message = chunk.replies[0]?.message;\n            console.log('Generated tokens: ', chunk.replies[0]?.stats.tokensCountOutput);\n        } else if (chunk instanceof ChatSessionStreamChunkError) {\n            console.error('Error: ', chunk.error);\n        }\n    }\n    console.log('Reasoning: ', message?.reasoning);\n    console.log('Text: ', message?.text);\n}\n\nmain().catch((error) =\u003e {\n    console.error(error);\n});\n```\n\n\u003c/details\u003e\n\n\n\u003cbr\u003eOnce loaded, the same `ChatSession` can be reused for multiple requests until you drop it. Each model may consume a significant amount of RAM, so it's important to keep only one session loaded at a time. 
For iOS apps, we recommend adding the [Increased Memory Capability](https://developer.apple.com/documentation/bundleresources/entitlements/com.apple.developer.kernel.increased-memory-limit) entitlement to ensure your app can allocate the required memory.\n\n### Chat with the cloud model\n\nIn this example, we will get a reply to a specific list of messages from a cloud model:\n\n\u003cdetails\u003e\n\u003csummary\u003eRust\u003c/summary\u003e\n\n```rust\nuse uzu::{\n    engine::{Engine, EngineConfig},\n    types::{\n        basic::ReasoningEffort,\n        session::chat::{ChatConfig, ChatMessage, ChatReplyConfig},\n    },\n};\n\n#[tokio::main]\nasync fn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    let engine_config = EngineConfig::default().with_openai_api_key(\"OPENAI_API_KEY\".to_string());\n    let engine = Engine::new(engine_config).await?;\n\n    let model = engine.model(\"gpt-5\".to_string()).await?.ok_or(\"Model not found\")?;\n\n    let messages = vec![\n        ChatMessage::system().with_reasoning_effort(ReasoningEffort::Low),\n        ChatMessage::user().with_text(\"How LLMs work\".to_string()),\n    ];\n\n    let session = engine.chat(model, ChatConfig::default()).await?;\n    let replies = session.reply(messages, ChatReplyConfig::default()).await?;\n    if let Some(reply) = replies.first() {\n        println!(\"Reasoning: {}\", reply.message.reasoning().unwrap_or_default());\n        println!(\"Text: {}\", reply.message.text().unwrap_or_default());\n    }\n\n    Ok(())\n}\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003ePython\u003c/summary\u003e\n\n```python\nimport asyncio\n\nfrom uzu import ChatConfig, ChatMessage, ChatReplyConfig, Engine, EngineConfig, ReasoningEffort\n\n\nasync def main() -\u003e None:\n    engine_config = EngineConfig.create().with_openai_api_key(\"OPENAI_API_KEY\")\n    engine = await Engine.create(engine_config)\n\n    model = await engine.model(\"gpt-5\")\n    if model 
is None:\n        raise RuntimeError(\"Model not found\")\n\n    messages = [\n        ChatMessage.system().with_reasoning_effort(ReasoningEffort.Low),\n        ChatMessage.user().with_text(\"How LLMs work\"),\n    ]\n\n    session = await engine.chat(model, ChatConfig.create())\n    replies = await session.reply(messages, ChatReplyConfig.create())\n    if replies:\n        message = replies[0].message\n        print(f\"Reasoning: {message.reasoning}\")\n        print(f\"Text: {message.text}\")\n\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eSwift\u003c/summary\u003e\n\n```swift\nimport Uzu\n\npublic func runChatCloud() async throws {\n    let engineConfig = EngineConfig.create().withOpenaiApiKey(openaiApiKey: \"OPENAI_API_KEY\")\n    let engine = try await Engine.create(config: engineConfig)\n    \n    guard let model = try await engine.model(identifier: \"gpt-5\") else {\n        return\n    }\n    \n    let messages = [\n        ChatMessage.system().withReasoningEffort(reasoningEffort: .low),\n        ChatMessage.user().withText(text: \"How LLMs work\")\n    ]\n    \n    let session = try await engine.chat(model: model, config: .create())\n    let reply = try await session.reply(input: messages, config: .create())\n    guard let message = reply.last?.message else {\n        return\n    }\n    \n    print(\"Reasoning: \\(message.reasoning() ?? \"empty\")\")\n    print(\"Text: \\(message.text() ?? 
\"empty\")\")\n}\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eTypeScript\u003c/summary\u003e\n\n```ts\nimport { ChatConfig, ChatMessage, ChatReplyConfig, Engine, EngineConfig, ReasoningEffort } from '@trymirai/uzu';\n\nasync function main() {\n    let engineConfig = EngineConfig.create().withOpenaiApiKey('OPENAI_API_KEY');\n    let engine = await Engine.create(engineConfig);\n\n    let model = await engine.model('gpt-5');\n    if (!model) {\n        throw new Error('Model not found');\n    }\n\n    let messages = [\n        ChatMessage.system().withReasoningEffort(\"Low\" as ReasoningEffort),\n        ChatMessage.user().withText('How LLMs work')\n    ];\n\n    let session = await engine.chat(model, ChatConfig.create());\n    let reply = await session.reply(messages, ChatReplyConfig.create());\n    let message = reply[0]?.message;\n    if (message) {\n        console.log('Reasoning: ', message.reasoning);\n        console.log('Text: ', message.text);\n    }\n}\n\nmain().catch((error) =\u003e {\n    console.error(error);\n});\n```\n\n\u003c/details\u003e\n\n\n### Chat using speculation preset for classification\n\nIn this example, we will use the `classification` speculation preset to determine the sentiment of the user's input:\n\n\u003cdetails\u003e\n\u003csummary\u003eRust\u003c/summary\u003e\n\n```rust\nuse uzu::{\n    engine::{Engine, EngineConfig},\n    types::{\n        basic::{Feature, ReasoningEffort, SamplingMethod},\n        session::chat::{ChatConfig, ChatMessage, ChatReplyConfig, ChatSpeculationPreset},\n    },\n};\n\n#[tokio::main]\nasync fn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    let engine_config = EngineConfig::default();\n    let engine = Engine::new(engine_config).await?;\n\n    let model = engine.model(\"Qwen/Qwen3-0.6B\".to_string()).await?.ok_or(\"Model not found\")?;\n    let downloader = engine.download(\u0026model).await?;\n    while let Some(update) = downloader.next().await 
{\n        println!(\"Download progress: {}\", update.progress());\n    }\n\n    let feature = Feature {\n        name: \"sentiment\".to_string(),\n        values: vec![\n            \"Happy\".to_string(),\n            \"Sad\".to_string(),\n            \"Angry\".to_string(),\n            \"Fearful\".to_string(),\n            \"Surprised\".to_string(),\n            \"Disgusted\".to_string(),\n        ],\n    };\n    let chat_config = ChatConfig::default().with_speculation_preset(Some(ChatSpeculationPreset::Classification {\n        feature: feature.clone(),\n    }));\n    let session = engine.chat(model, chat_config).await?;\n\n    let text_to_detect_feature = \"Today's been awesome! Everything just feels right, and I can't stop smiling.\";\n    let prompt = format!(\n        \"Text is: \\\"{text_to_detect_feature}\\\". Choose {} from the list: {}. Answer with one word. Don't add a dot at the end.\",\n        feature.name,\n        feature.values.join(\", \")\n    );\n    let messages = vec![\n        ChatMessage::system().with_reasoning_effort(ReasoningEffort::Disabled),\n        ChatMessage::user().with_text(prompt),\n    ];\n\n    let chat_reply_config =\n        ChatReplyConfig::default().with_token_limit(Some(32)).with_sampling_method(SamplingMethod::Greedy {});\n    let replies = session.reply(messages, chat_reply_config).await?;\n    if let Some(reply) = replies.first() {\n        println!(\"Prediction: {}\", reply.message.text().unwrap_or_default());\n        println!(\"Generated tokens: {}\", reply.stats.tokens_count_output.unwrap_or_default());\n    }\n\n    Ok(())\n}\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003ePython\u003c/summary\u003e\n\n```python\nimport asyncio\n\nfrom uzu import (\n    ChatConfig,\n    ChatMessage,\n    ChatReplyConfig,\n    ChatSpeculationPreset,\n    Engine,\n    EngineConfig,\n    Feature,\n    ReasoningEffort,\n    SamplingMethod,\n)\n\n\nasync def main() -\u003e None:\n    engine_config = 
EngineConfig.create()\n    engine = await Engine.create(engine_config)\n\n    model = await engine.model(\"Qwen/Qwen3-0.6B\")\n    if model is None:\n        raise RuntimeError(\"Model not found\")\n    async for update in (await engine.download(model)).iterator():\n        print(f\"Download progress: {update.progress}\")\n\n    feature = Feature(\n        \"sentiment\",\n        [\"Happy\", \"Sad\", \"Angry\", \"Fearful\", \"Surprised\", \"Disgusted\"],\n    )\n    chat_config = ChatConfig.create().with_speculation_preset(ChatSpeculationPreset.Classification(feature))\n    session = await engine.chat(model, chat_config)\n\n    text_to_detect_feature = \"Today's been awesome! Everything just feels right, and I can't stop smiling.\"\n    prompt = (\n        f'Text is: \"{text_to_detect_feature}\". '\n        f\"Choose {feature.name} from the list: {', '.join(feature.values)}. \"\n        \"Answer with one word. Don't add a dot at the end.\"\n    )\n    messages = [\n        ChatMessage.system().with_reasoning_effort(ReasoningEffort.Disabled),\n        ChatMessage.user().with_text(prompt),\n    ]\n\n    chat_reply_config = ChatReplyConfig.create().with_token_limit(32).with_sampling_method(SamplingMethod.Greedy())\n    replies = await session.reply(messages, chat_reply_config)\n    if replies:\n        reply = replies[0]\n        print(f\"Prediction: {reply.message.text}\")\n        print(f\"Generated tokens: {reply.stats.tokens_count_output}\")\n\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eSwift\u003c/summary\u003e\n\n```swift\nimport Uzu\n\npublic func runChatSpeculationClassification() async throws {\n    let engineConfig = EngineConfig.create()\n    let engine = try await Engine.create(config: engineConfig)\n    \n    guard let model = try await engine.model(identifier: \"Qwen/Qwen3-0.6B\") else {\n        return\n    }\n    for try await update in try await engine.download(model: 
model).iterator() {\n        print(\"Download progress: \\(update.progress())\")\n    }\n    \n    let feature = Feature(name: \"sentiment\", values: [\n        \"Happy\",\n        \"Sad\",\n        \"Angry\",\n        \"Fearful\",\n        \"Surprised\",\n        \"Disgusted\",\n    ])\n    let chatConfig = ChatConfig.create().withSpeculationPreset(speculationPreset: .classification(feature: feature))\n    let session = try await engine.chat(model: model, config: chatConfig)\n    \n    let textToDetectFeature =\n            \"Today's been awesome! Everything just feels right, and I can't stop smiling.\"\n    let prompt = \"Text is: \\\"\\(textToDetectFeature)\\\". Choose \\(feature.name) from the list: \\(feature.values.joined(separator: \", \")). Answer with one word. Don't add a dot at the end.\"\n    let messages = [\n        ChatMessage.system().withReasoningEffort(reasoningEffort: .disabled),\n        ChatMessage.user().withText(text: prompt)\n    ]\n    \n    let chatReplyConfig = ChatReplyConfig.create().withTokenLimit(tokenLimit: 32).withSamplingMethod(samplingMethod: .greedy)\n    let replies = try await session.reply(input: messages, config: chatReplyConfig)\n    guard let reply = replies.last else {\n        return\n    }\n    \n    print(\"Prediction: \\(reply.message.text() ?? \"empty\")\")\n    print(\"Generated tokens: \\(reply.stats.tokensCountOutput ?? 
0)\")\n}\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eTypeScript\u003c/summary\u003e\n\n```ts\nimport { ChatConfig, ChatMessage, ChatReplyConfig, ChatSpeculationPresetClassification, Engine, EngineConfig, Feature, ReasoningEffort, SamplingMethodGreedy } from '@trymirai/uzu';\n\nasync function main() {\n    let engineConfig = EngineConfig.create();\n    let engine = await Engine.create(engineConfig);\n\n    let model = await engine.model('Qwen/Qwen3-0.6B');\n    if (!model) {\n        throw new Error('Model not found');\n    }\n    for await (const update of await engine.download(model)) {\n        console.log('Download progress:', update.progress);\n    }\n\n    const feature = new Feature('sentiment', [\n        'Happy',\n        'Sad',\n        'Angry',\n        'Fearful',\n        'Surprised',\n        'Disgusted',\n    ]);\n    let chatConfig = ChatConfig.create().withSpeculationPreset(new ChatSpeculationPresetClassification(feature));\n    let session = await engine.chat(model, chatConfig);\n\n    const textToDetectFeature =\n        \"Today's been awesome! Everything just feels right, and I can't stop smiling.\";\n    const prompt =\n        `Text is: \"${textToDetectFeature}\". Choose ${feature.name} from the list: ${feature.values.join(', ')}. ` +\n        \"Answer with one word. 
Don't add a dot at the end.\";\n    let messages = [\n        ChatMessage.system().withReasoningEffort(\"Disabled\" as ReasoningEffort),\n        ChatMessage.user().withText(prompt)\n    ];\n\n    let chatReplyConfig = ChatReplyConfig.create().withTokenLimit(32).withSamplingMethod(new SamplingMethodGreedy());\n    let reply = (await session.reply(messages, chatReplyConfig))[0];\n\n    if (reply) {\n        console.log('Prediction: ', reply.message.text);\n        console.log('Generated tokens: ', reply.stats.tokensCountOutput);\n    }\n}\n\nmain().catch((error) =\u003e {\n    console.error(error);\n});\n```\n\n\u003c/details\u003e\n\n\n\u003cbr\u003eThe stats show that the answer is ready immediately after the prefill step: thanks to speculative decoding, actual generation does not even start, which significantly improves generation speed.\n\n### Chat using speculation preset for summarization\n\nIn this example, we will use the `summarization` speculation preset to generate a summary of the input text:\n\n\u003cdetails\u003e\n\u003csummary\u003eRust\u003c/summary\u003e\n\n```rust\nuse uzu::{\n    engine::{Engine, EngineConfig},\n    types::{\n        basic::{ReasoningEffort, SamplingMethod},\n        session::chat::{ChatConfig, ChatMessage, ChatReplyConfig, ChatSpeculationPreset},\n    },\n};\n\n#[tokio::main]\nasync fn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    let engine_config = EngineConfig::default();\n    let engine = Engine::new(engine_config).await?;\n\n    let model = engine.model(\"Qwen/Qwen3-0.6B\".to_string()).await?.ok_or(\"Model not found\")?;\n    let downloader = engine.download(\u0026model).await?;\n    while let Some(update) = downloader.next().await {\n        println!(\"Download progress: {}\", update.progress());\n    }\n\n    let text_to_summarize = \"A Large Language Model (LLM) is a type of artificial intelligence that processes and generates human-like text. 
\\\n        It is trained on vast datasets containing books, articles, and web content, allowing it to understand and predict language patterns. \\\n        LLMs use deep learning, particularly transformer-based architectures, to analyze text, recognize context, and generate coherent responses. \\\n        These models have a wide range of applications, including chatbots, content creation, translation, and code generation. \\\n        One of the key strengths of LLMs is their ability to generate contextually relevant text based on prompts. \\\n        They utilize self-attention mechanisms to weigh the importance of words within a sentence, improving accuracy and fluency. \\\n        Examples of popular LLMs include OpenAI's GPT series, Google's BERT, and Meta's LLaMA. \\\n        As these models grow in size and sophistication, they continue to enhance human-computer interactions, \\\n        making AI-powered communication more natural and effective.\";\n    let prompt = format!(\"Text is: \\\"{text_to_summarize}\\\". 
Write only summary itself.\");\n    let messages = vec![\n        ChatMessage::system().with_reasoning_effort(ReasoningEffort::Disabled),\n        ChatMessage::user().with_text(prompt),\n    ];\n\n    let chat_config = ChatConfig::default().with_speculation_preset(Some(ChatSpeculationPreset::Summarization {}));\n    let session = engine.chat(model, chat_config).await?;\n\n    let chat_reply_config =\n        ChatReplyConfig::default().with_token_limit(Some(256)).with_sampling_method(SamplingMethod::Greedy {});\n    let replies = session.reply(messages, chat_reply_config).await?;\n    if let Some(reply) = replies.first() {\n        println!(\"Summary: {}\", reply.message.text().unwrap_or_default());\n        println!(\"Generation t/s: {}\", reply.stats.generate_tokens_per_second.unwrap_or_default());\n    }\n\n    Ok(())\n}\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003ePython\u003c/summary\u003e\n\n```python\nimport asyncio\n\nfrom uzu import (\n    ChatConfig,\n    ChatMessage,\n    ChatReplyConfig,\n    ChatSpeculationPreset,\n    Engine,\n    EngineConfig,\n    ReasoningEffort,\n    SamplingMethod,\n)\n\n\nasync def main() -\u003e None:\n    engine_config = EngineConfig.create()\n    engine = await Engine.create(engine_config)\n\n    model = await engine.model(\"Qwen/Qwen3-0.6B\")\n    if model is None:\n        raise RuntimeError(\"Model not found\")\n    async for update in (await engine.download(model)).iterator():\n        print(f\"Download progress: {update.progress}\")\n\n    text_to_summarize = (\n        \"A Large Language Model (LLM) is a type of artificial intelligence that processes and generates human-like text. \"\n        \"It is trained on vast datasets containing books, articles, and web content, allowing it to understand and predict language patterns. \"\n        \"LLMs use deep learning, particularly transformer-based architectures, to analyze text, recognize context, and generate coherent responses. 
\"\n        \"These models have a wide range of applications, including chatbots, content creation, translation, and code generation. \"\n        \"One of the key strengths of LLMs is their ability to generate contextually relevant text based on prompts. \"\n        \"They utilize self-attention mechanisms to weigh the importance of words within a sentence, improving accuracy and fluency. \"\n        \"Examples of popular LLMs include OpenAI's GPT series, Google's BERT, and Meta's LLaMA. \"\n        \"As these models grow in size and sophistication, they continue to enhance human-computer interactions, \"\n        \"making AI-powered communication more natural and effective.\"\n    )\n    prompt = f'Text is: \"{text_to_summarize}\". Write only summary itself.'\n    messages = [\n        ChatMessage.system().with_reasoning_effort(ReasoningEffort.Disabled),\n        ChatMessage.user().with_text(prompt),\n    ]\n\n    chat_config = ChatConfig.create().with_speculation_preset(ChatSpeculationPreset.Summarization())\n    session = await engine.chat(model, chat_config)\n\n    chat_reply_config = ChatReplyConfig.create().with_token_limit(256).with_sampling_method(SamplingMethod.Greedy())\n    replies = await session.reply(messages, chat_reply_config)\n    if replies:\n        reply = replies[0]\n        print(f\"Summary: {reply.message.text}\")\n        print(f\"Generation t/s: {reply.stats.generate_tokens_per_second}\")\n\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eSwift\u003c/summary\u003e\n\n```swift\nimport Uzu\n\npublic func runChatSpeculationSummarization() async throws {\n    let engineConfig = EngineConfig.create()\n    let engine = try await Engine.create(config: engineConfig)\n    \n    guard let model = try await engine.model(identifier: \"Qwen/Qwen3-0.6B\") else {\n        return\n    }\n    for try await update in try await engine.download(model: model).iterator() {\n        
print(\"Download progress: \\(update.progress())\")\n    }\n    \n    let textToSummarize = \"A Large Language Model (LLM) is a type of artificial intelligence that processes and generates human-like text. It is trained on vast datasets containing books, articles, and web content, allowing it to understand and predict language patterns. LLMs use deep learning, particularly transformer-based architectures, to analyze text, recognize context, and generate coherent responses. These models have a wide range of applications, including chatbots, content creation, translation, and code generation. One of the key strengths of LLMs is their ability to generate contextually relevant text based on prompts. They utilize self-attention mechanisms to weigh the importance of words within a sentence, improving accuracy and fluency. Examples of popular LLMs include OpenAI's GPT series, Google's BERT, and Meta's LLaMA. As these models grow in size and sophistication, they continue to enhance human-computer interactions, making AI-powered communication more natural and effective.\";\n    let prompt = \"Text is: \\\"\\(textToSummarize)\\\". Write only summary itself.\"\n    let messages = [\n        ChatMessage.system().withReasoningEffort(reasoningEffort: .disabled),\n        ChatMessage.user().withText(text: prompt)\n    ]\n    \n    let chatConfig = ChatConfig.create().withSpeculationPreset(speculationPreset: .summarization)\n    let session = try await engine.chat(model: model, config: chatConfig)\n    \n    let chatReplyConfig = ChatReplyConfig.create().withTokenLimit(tokenLimit: 256).withSamplingMethod(samplingMethod: .greedy)\n    let replies = try await session.reply(input: messages, config: chatReplyConfig)\n    guard let reply = replies.last else {\n        return\n    }\n    \n    print(\"Summary: \\(reply.message.text() ?? \"empty\")\")\n    print(\"Generation t/s: \\(reply.stats.generateTokensPerSecond ?? 
0.0)\")\n}\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eTypeScript\u003c/summary\u003e\n\n```ts\nimport { ChatConfig, ChatMessage, ChatReplyConfig, ChatSpeculationPresetSummarization, Engine, EngineConfig, ReasoningEffort, SamplingMethodGreedy } from '@trymirai/uzu';\n\nasync function main() {\n    let engineConfig = EngineConfig.create();\n    let engine = await Engine.create(engineConfig);\n\n    let model = await engine.model('Qwen/Qwen3-0.6B');\n    if (!model) {\n        throw new Error('Model not found');\n    }\n    for await (const update of await engine.download(model)) {\n        console.log('Download progress:', update.progress);\n    }\n\n    const textToSummarize =\n        \"A Large Language Model (LLM) is a type of artificial intelligence that processes and generates human-like text. It is trained on vast datasets containing books, articles, and web content, allowing it to understand and predict language patterns. LLMs use deep learning, particularly transformer-based architectures, to analyze text, recognize context, and generate coherent responses. These models have a wide range of applications, including chatbots, content creation, translation, and code generation. One of the key strengths of LLMs is their ability to generate contextually relevant text based on prompts. They utilize self-attention mechanisms to weigh the importance of words within a sentence, improving accuracy and fluency. Examples of popular LLMs include OpenAI's GPT series, Google's BERT, and Meta's LLaMA. As these models grow in size and sophistication, they continue to enhance human-computer interactions, making AI-powered communication more natural and effective.\";\n    const prompt = `Text is: \"${textToSummarize}\". 
Write only summary itself.`;\n    let messages = [\n        ChatMessage.system().withReasoningEffort(\"Disabled\" as ReasoningEffort),\n        ChatMessage.user().withText(prompt)\n    ];\n\n    let chatConfig = ChatConfig.create().withSpeculationPreset(new ChatSpeculationPresetSummarization());\n    let session = await engine.chat(model, chatConfig);\n\n    let chatReplyConfig = ChatReplyConfig.create().withTokenLimit(256).withSamplingMethod(new SamplingMethodGreedy());\n    let reply = (await session.reply(messages, chatReplyConfig))[0];\n\n    if (reply) {\n        console.log('Summary: ', reply.message.text);\n        console.log('Generation t/s: ', reply.stats.generateTokensPerSecond);\n    }\n}\n\nmain().catch((error) =\u003e {\n    console.error(error);\n});\n```\n\n\u003c/details\u003e\n\n\n\u003cbr\u003eBecause of speculative decoding, the model’s run count is lower than the number of generated tokens, which significantly improves generation speed.\n\n### Chat with structured output\n\nSometimes you want the generated output to be valid JSON with predefined fields. 
You can use `Grammar` to manually specify a JSON schema for the response you want to receive:\n\n\u003cdetails\u003e\n\u003csummary\u003eRust\u003c/summary\u003e\n\n```rust\nuse schemars::{JsonSchema, schema_for};\nuse serde::{Deserialize, Serialize};\nuse uzu::{\n    engine::{Engine, EngineConfig},\n    types::{\n        basic::{Grammar, ReasoningEffort},\n        session::chat::{ChatConfig, ChatMessage, ChatReplyConfig},\n    },\n};\n\n#[derive(Debug, Serialize, Deserialize, JsonSchema)]\nstruct Country {\n    name: String,\n    capital: String,\n}\n\n#[derive(Debug, Serialize, Deserialize, JsonSchema)]\nstruct CountryList {\n    countries: Vec\u003cCountry\u003e,\n}\n\n#[tokio::main]\nasync fn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    let engine_config = EngineConfig::default();\n    let engine = Engine::new(engine_config).await?;\n\n    let model = engine.model(\"Qwen/Qwen3-0.6B\".to_string()).await?.ok_or(\"Model not found\")?;\n    let downloader = engine.download(\u0026model).await?;\n    while let Some(update) = downloader.next().await {\n        println!(\"Download progress: {}\", update.progress());\n    }\n\n    let schema_string = serde_json::to_string(\u0026schema_for!(CountryList))?;\n    let messages = vec![\n        ChatMessage::system().with_reasoning_effort(ReasoningEffort::Disabled),\n        ChatMessage::user().with_text(\n            \"Give me a JSON object containing a list of 3 countries, where each country has name and capital fields\"\n                .to_string(),\n        ),\n    ];\n\n    let session = engine.chat(model, ChatConfig::default()).await?;\n    let chat_reply_config = ChatReplyConfig::default().with_grammar(Some(Grammar::JsonSchema {\n        schema: schema_string,\n    }));\n    let replies = session.reply(messages, chat_reply_config).await?;\n    if let Some(reply) = replies.first()\n        \u0026\u0026 let Some(text) = reply.message.text()\n    {\n        let parsed: CountryList = 
serde_json::from_str(\u0026text)?;\n        println!(\"{parsed:#?}\");\n    }\n\n    Ok(())\n}\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003ePython\u003c/summary\u003e\n\n```python\nimport asyncio\nimport json\n\nfrom pydantic import BaseModel\n\nfrom uzu import (\n    ChatConfig,\n    ChatMessage,\n    ChatReplyConfig,\n    Engine,\n    EngineConfig,\n    Grammar,\n    ReasoningEffort,\n)\n\n\nclass Country(BaseModel):\n    name: str\n    capital: str\n\n\nclass CountryList(BaseModel):\n    countries: list[Country]\n\n\ndef structured_response(response: str | None, model_type: type[BaseModel]) -\u003e BaseModel | None:\n    if not response:\n        return None\n    return model_type.model_validate_json(response)\n\n\nasync def main() -\u003e None:\n    engine_config = EngineConfig.create()\n    engine = await Engine.create(engine_config)\n\n    model = await engine.model(\"Qwen/Qwen3-0.6B\")\n    if model is None:\n        raise RuntimeError(\"Model not found\")\n    async for update in (await engine.download(model)).iterator():\n        print(f\"Download progress: {update.progress}\")\n\n    schema_string = json.dumps(CountryList.model_json_schema())\n    messages = [\n        ChatMessage.system().with_reasoning_effort(ReasoningEffort.Disabled),\n        ChatMessage.user().with_text(\n            \"Give me a JSON object containing a list of 3 countries, where each country has name and capital fields\"\n        ),\n    ]\n\n    session = await engine.chat(model, ChatConfig.create())\n    replies = await session.reply(\n        messages,\n        ChatReplyConfig.create().with_grammar(Grammar.JsonSchema(schema_string)),\n    )\n    if replies:\n        countries = structured_response(replies[0].message.text, CountryList)\n        print(countries)\n\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eSwift\u003c/summary\u003e\n\n```swift\nimport 
FoundationModels\nimport Uzu\n\n@Generable()\nstruct Country: Codable {\n    let name: String\n    let capital: String\n}\n\npublic func runChatStructuredOutput() async throws {\n    let engineConfig = EngineConfig.create()\n    let engine = try await Engine.create(config: engineConfig)\n    \n    guard let model = try await engine.model(identifier: \"Qwen/Qwen3-0.6B\") else {\n        return\n    }\n    for try await update in try await engine.download(model: model).iterator() {\n        print(\"Download progress: \\(update.progress())\")\n    }\n    \n    let messages = [\n        ChatMessage.system().withReasoningEffort(reasoningEffort: .disabled),\n        ChatMessage.user().withText(text: \"Give me a JSON object containing a list of 3 countries, where each country has name and capital fields\")\n    ]\n    \n    let session = try await engine.chat(model: model, config: .create())\n    let reply = try await session.reply(input: messages, config: .create().withGrammar(grammar: .fromType([Country].self)))\n    guard let message = reply.last?.message else {\n        return\n    }\n    guard let countries: [Country] = message.textDecoded() else {\n        return\n    }\n    print(countries)\n}\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eTypeScript\u003c/summary\u003e\n\n```ts\nimport { ChatConfig, ChatMessage, ChatReplyConfig, Engine, EngineConfig, GrammarJsonSchema, ReasoningEffort } from '@trymirai/uzu';\nimport * as z from \"zod\";\n\nconst CountryType = z.object({\n    name: z.string(),\n    capital: z.string(),\n});\nconst CountryListType = z.array(CountryType);\n\nfunction structuredResponse\u003cT extends z.ZodType\u003e(response: string | null | undefined, type: T): z.infer\u003cT\u003e | undefined {\n    if (!response) {\n        return undefined;\n    }\n    const data = JSON.parse(response);\n    const result = type.parse(data);\n    return result;\n}\n\nasync function main() {\n    let engineConfig = EngineConfig.create();\n  
  let engine = await Engine.create(engineConfig);\n\n    let model = await engine.model('Qwen/Qwen3-0.6B');\n    if (!model) {\n        throw new Error('Model not found');\n    }\n    for await (const update of await engine.download(model)) {\n        console.log('Download progress:', update.progress);\n    }\n\n    let schema = z.toJSONSchema(CountryListType);\n    let schemaString = JSON.stringify(schema);\n    let messages = [\n        ChatMessage.system().withReasoningEffort(\"Disabled\" as ReasoningEffort),\n        ChatMessage.user().withText('Give me a JSON object containing a list of 3 countries, where each country has name and capital fields')\n    ];\n\n    let session = await engine.chat(model, ChatConfig.create());\n    let reply = await session.reply(messages, ChatReplyConfig.create().withGrammar(new GrammarJsonSchema(schemaString)));\n    let message = reply[0]?.message;\n    let countries = structuredResponse(message?.text, CountryListType);\n    console.log(countries);\n}\n\nmain().catch((error) =\u003e {\n    console.error(error);\n});\n```\n\n\u003c/details\u003e\n\n\n### Classification\n\nIn this example, we will use a classification model to determine whether the user's input is safe from a moderation perspective:\n\n\u003cdetails\u003e\n\u003csummary\u003eRust\u003c/summary\u003e\n\n```rust\nuse uzu::{\n    engine::{Engine, EngineConfig},\n    types::session::classification::ClassificationMessage,\n};\n\n#[tokio::main]\nasync fn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    let engine_config = EngineConfig::default();\n    let engine = Engine::new(engine_config).await?;\n\n    let model = engine.model(\"trymirai/chat-moderation-router\".to_string()).await?.ok_or(\"Model not found\")?;\n    let downloader = engine.download(\u0026model).await?;\n    while let Some(update) = downloader.next().await {\n        println!(\"Download progress: {}\", update.progress());\n    }\n\n    let messages = 
vec![ClassificationMessage::user(\"Hi\".to_string())];\n\n    let session = engine.classification(model).await?;\n    let output = session.classify(messages).await?;\n    println!(\"Output: {:?}\", output.probabilities.values);\n\n    Ok(())\n}\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003ePython\u003c/summary\u003e\n\n```python\nimport asyncio\n\nfrom uzu import ClassificationMessage, Engine, EngineConfig\n\n\nasync def main() -\u003e None:\n    engine_config = EngineConfig.create()\n    engine = await Engine.create(engine_config)\n\n    model = await engine.model(\"trymirai/chat-moderation-router\")\n    if model is None:\n        raise RuntimeError(\"Model not found\")\n    async for update in (await engine.download(model)).iterator():\n        print(f\"Download progress: {update.progress}\")\n\n    messages = [ClassificationMessage.user(\"Hi\")]\n\n    session = await engine.classification(model)\n    output = await session.classify(messages)\n    print(f\"Output: {output.probabilities.values}\")\n\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eSwift\u003c/summary\u003e\n\n```swift\nimport Uzu\n\npublic func runClassification() async throws {\n    let engine = try await Engine.create(config: .create())\n    \n    guard let model = try await engine.model(identifier: \"trymirai/chat-moderation-router\") else {\n        return\n    }\n    for try await update in try await engine.download(model: model).iterator() {\n        print(\"Download progress: \\(update.progress())\")\n    }\n    \n    let messages = [\n        ClassificationMessage.user(content: \"Hi\")\n    ]\n    \n    let session = try await engine.classification(model: model)\n    let output = try await session.classify(input: messages)\n    print(\"Output: 
\\(output.probabilities.values)\")\n}\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eTypeScript\u003c/summary\u003e\n\n```ts\nimport { ClassificationMessage, Engine, EngineConfig } from '@trymirai/uzu';\n\nasync function main() {\n    let engineConfig = EngineConfig.create();\n    let engine = await Engine.create(engineConfig);\n\n    let model = await engine.model('trymirai/chat-moderation-router');\n    if (!model) {\n        throw new Error('Model not found');\n    }\n    for await (const update of await engine.download(model)) {\n        console.log('Download progress:', update.progress);\n    }\n\n    let messages = [\n        ClassificationMessage.user('Hi')\n    ];\n\n    let session = await engine.classification(model);\n    let output = await session.classify(messages);\n    console.log('Output: ', output.probabilities.values);\n}\n\nmain().catch((error) =\u003e {\n    console.error(error);\n});\n```\n\n\u003c/details\u003e\n\n\n### Text to Speech\n\nIn this example, we will generate audio from text:\n\n\u003cdetails\u003e\n\u003csummary\u003eRust\u003c/summary\u003e\n\n```rust\nuse uzu::{\n    engine::{Engine, EngineConfig},\n    session::text_to_speech::TextToSpeechSessionStreamChunk,\n    types::basic::PcmBatch,\n};\n\n#[tokio::main]\nasync fn main() -\u003e Result\u003c(), Box\u003cdyn std::error::Error\u003e\u003e {\n    let engine_config = EngineConfig::default();\n    let engine = Engine::new(engine_config).await?;\n\n    let model = engine.model(\"fishaudio/s1-mini\".to_string()).await?.ok_or(\"Model not found\")?;\n    let downloader = engine.download(\u0026model).await?;\n    while let Some(update) = downloader.next().await {\n        println!(\"Download progress: {}\", update.progress());\n    }\n\n    let text = \"London is the capital of United Kingdom and one of the world's most influential cities, \\\n        known for its rich history, cultural diversity, and global significance in finance, politics, and the arts. 
\\\n        Situated along the River Thames, the city blends historic landmarks like Tower of London and Buckingham Palace \\\n        with modern architecture such as The Shard. London is also home to renowned institutions including the British Museum \\\n        and vibrant areas like Covent Garden, offering a mix of history, entertainment, and innovation that attracts millions of visitors each year.\";\n    let output_path = dirs::home_dir().ok_or(\"Home not found\")?.join(\"Desktop\").join(\"output.wav\");\n\n    let session = engine.text_to_speech(model).await?;\n    let stream = session.synthesize_stream(text.to_string()).await;\n    let mut pcm_batches: Vec\u003cPcmBatch\u003e = Vec::new();\n    while let Some(event) = stream.next().await {\n        match event {\n            TextToSpeechSessionStreamChunk::Output {\n                output,\n            } =\u003e {\n                pcm_batches.push(output.pcm_batch);\n            },\n            TextToSpeechSessionStreamChunk::Error {\n                error,\n            } =\u003e {\n                println!(\"Error: {error}\");\n            },\n        }\n    }\n\n    let pcm_batch_first = pcm_batches.first().ok_or(\"No batches\")?;\n    let pcm_batch_full = PcmBatch {\n        samples: pcm_batches.iter().flat_map(|batch| batch.samples.iter().copied()).collect(),\n        sample_rate: pcm_batch_first.sample_rate,\n        channels: pcm_batch_first.channels,\n        lengths: vec![pcm_batches.iter().flat_map(|batch| batch.lengths.iter().copied()).sum()],\n    };\n    pcm_batch_full.save_as_wav(output_path.to_string_lossy().to_string())?;\n    println!(\"Output saved to: {}\", output_path.display());\n\n    Ok(())\n}\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003ePython\u003c/summary\u003e\n\n```python\nimport asyncio\nfrom pathlib import Path\n\nfrom uzu import Engine, EngineConfig\n\n\nasync def main() -\u003e None:\n    engine_config = EngineConfig.create()\n    engine = await 
Engine.create(engine_config)\n\n    model = await engine.model(\"fishaudio/s1-mini\")\n    if model is None:\n        raise RuntimeError(\"Model not found\")\n    async for update in (await engine.download(model)).iterator():\n        print(f\"Download progress: {update.progress}\")\n\n    text = (\n        \"London is the capital of United Kingdom and one of the world's most influential cities, \"\n        \"known for its rich history, cultural diversity, and global significance in finance, politics, and the arts. \"\n        \"Situated along the River Thames, the city blends historic landmarks like Tower of London and Buckingham Palace \"\n        \"with modern architecture such as The Shard. London is also home to renowned institutions including the British Museum \"\n        \"and vibrant areas like Covent Garden, offering a mix of history, entertainment, and innovation that attracts millions of visitors each year.\"\n    )\n    output_path = Path.home() / \"Desktop\" / \"output.wav\"\n    session = await engine.text_to_speech(model)\n    output = await session.synthesize(text)\n    output.pcm_batch.save_as_wav(str(output_path))\n    print(f\"Output saved to: {output_path}\")\n\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eSwift\u003c/summary\u003e\n\n```swift\nimport Foundation\nimport Uzu\n\npublic func runTextToSpeech() async throws {\n    let engineConfig = EngineConfig.create()\n    let engine = try await Engine.create(config: engineConfig)\n    \n    guard let model = try await engine.model(identifier: \"fishaudio/s1-mini\") else {\n        return\n    }\n    for try await update in try await engine.download(model: model).iterator() {\n        print(\"Download progress: \\(update.progress())\")\n    }\n    \n    let text = \"London is the capital of United Kingdom and one of the world’s most influential cities, known for its rich history, cultural diversity, and global 
significance in finance, politics, and the arts. Situated along the River Thames, the city blends historic landmarks like Tower of London and Buckingham Palace with modern architecture such as The Shard. London is also home to renowned institutions including the British Museum and vibrant areas like Covent Garden, offering a mix of history, entertainment, and innovation that attracts millions of visitors each year.\"\n    let outputPath = FileManager.default.homeDirectoryForCurrentUser\n        .appendingPathComponent(\"Desktop\")\n        .appendingPathComponent(\"output.wav\")\n    let session = try await engine.textToSpeech(model: model)\n    let output = try await session.synthesize(input: text)\n    try output.pcmBatch.saveAsWav(path: outputPath.path())\n    print(\"Output saved to: \\(outputPath.path())\")\n}\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eTypeScript\u003c/summary\u003e\n\n```ts\nimport { Engine, EngineConfig } from '@trymirai/uzu';\nimport { homedir } from \"os\";\nimport { join } from \"path\";\n\nasync function main() {\n    let engineConfig = EngineConfig.create();\n    let engine = await Engine.create(engineConfig);\n\n    let model = await engine.model('fishaudio/s1-mini');\n    if (!model) {\n        throw new Error('Model not found');\n    }\n    for await (const update of await engine.download(model)) {\n        console.log('Download progress:', update.progress);\n    }\n\n    const text = \"London is the capital of United Kingdom and one of the world’s most influential cities, known for its rich history, cultural diversity, and global significance in finance, politics, and the arts. Situated along the River Thames, the city blends historic landmarks like Tower of London and Buckingham Palace with modern architecture such as The Shard. 
London is also home to renowned institutions including the British Museum and vibrant areas like Covent Garden, offering a mix of history, entertainment, and innovation that attracts millions of visitors each year.\";\n    const outputPath = join(homedir(), \"Desktop\", \"output.wav\");\n    let session = await engine.textToSpeech(model);\n    let output = await session.synthesize(text);\n    output.pcmBatch.saveAsWav(outputPath);\n    console.log('Output saved to: ', outputPath);\n}\n\nmain().catch((error) =\u003e {\n    console.error(error);\n});\n```\n\n\u003c/details\u003e\n\n\n\n## Development\n\n`uzu` is a native Rust crate with bindings available for:\n\n- `Swift` via [uniffi-rs](https://github.com/mozilla/uniffi-rs)\n- `Python` via [pyo3](https://github.com/PyO3/pyo3)\n- `TypeScript` via [napi-rs](https://github.com/napi-rs/napi-rs)\n\nIt supports:\n\n- Backends:\n    - `metal`\n    - `cpu`\n- Targets:\n    - `aarch64-apple-darwin`\n    - `aarch64-apple-ios`\n    - `aarch64-apple-ios-sim`\n    - `aarch64-pc-windows-msvc` _(in progress)_\n    - `aarch64-unknown-linux-gnu` _(in progress)_\n    - `wasm32-wasip1-threads` _(in progress)_\n    - `x86_64-apple-darwin`\n    - `x86_64-pc-windows-msvc` _(in progress)_\n    - `x86_64-unknown-linux-gnu` _(in progress)_\n\n\u003cbr\u003e\nFor initial setup we recommend running \u003ccode\u003ecargo tools setup\u003c/code\u003e, which installs all necessary dependencies (\u003ccode\u003erustup\u003c/code\u003e, \u003ccode\u003euv\u003c/code\u003e, \u003ccode\u003epnpm\u003c/code\u003e, \u003ccode\u003eRust targets\u003c/code\u003e, \u003ccode\u003eMetal toolchain\u003c/code\u003e, ...) 
if not already present.\n\n\u003cbr\u003e\nTo unify cross-language development we introduce \u003ccode\u003ecargo tools\u003c/code\u003e:\n\n- Install language-specific dependencies: `cargo tools install typescript`\n- Build: `cargo tools build rust --targets apple`\n- Test: `cargo tools test python`\n- Run example: `cargo tools example swift chat`\n\n## Model Format\n\n`uzu` uses its own model format. You can download a test model:\n\n```bash\n./scripts/download_test_model.sh\n```\n\nOr download any supported model that has already been converted:\n\n```bash\ncd ./tools/\nuv run downloader list             # show the list of supported models\nuv run downloader download {REPO}  # download a specific model\n```\n\nModels downloaded for development are stored at `./workspace/models/0.5.12/`.\n\nYou can also export a model yourself with [lalamo](https://github.com/trymirai/lalamo):\n\n```bash\ngit clone https://github.com/trymirai/lalamo.git\ncd lalamo\nuv run lalamo list-models\nuv run lalamo convert meta-llama/Llama-3.2-1B-Instruct\n```\n\n## CLI\n\nYou can run `uzu` in CLI mode:\n\n```bash\ncargo run --release -p cli\n```\n\nThis launches an interactive app where you can browse, download, and interact with models.\n\nYou can also preselect a model with `--model`, passing its identifier or repository id:\n\n```bash\ncargo run --release -p cli -- --model trymirai/Qwen3.5-4B-M\n```\n\nIf the model is not downloaded yet, the CLI starts downloading it automatically.\n\n## Benchmarks\n\nTo run benchmarks:\n\n```bash\ncargo run --release -p cli -- bench ./workspace/models/0.5.12/{MODEL_NAME} ./workspace/models/0.5.12/{MODEL_NAME}/benchmark_task.json ./workspace/models/0.5.12/{MODEL_NAME}/benchmark_result.json\n```\n\n`benchmark_task.json` is automatically generated after the model is downloaded via `./tools/`.\n\n## Server\n\nYou can also run `uzu` as an OpenAI-compatible HTTP server:\n\n```bash\ncargo run --release -p cli -- server --model 
trymirai/Qwen3.5-4B-M\n```\n\nThe model is loaded on startup (and downloaded first if needed). By default the server listens on `127.0.0.1:8000`; override the address with `--host` and `--port`:\n\n```bash\ncargo run --release -p cli -- server --model trymirai/Qwen3.5-4B-M --host 0.0.0.0 --port 8080\n```\n\nIt exposes the following endpoints, available both at the root and under `/v1`:\n\n- `POST /v1/chat/completions` — chat completions, with streaming when `\"stream\": true`. Honors `temperature`, `top_p`, `top_k`, and `max_tokens`.\n- `GET /v1/models` — lists the loaded model.\n\n```bash\ncurl http://127.0.0.1:8000/v1/chat/completions \\\n    -H \"Content-Type: application/json\" \\\n    -d '{\n        \"model\": \"trymirai/Qwen3.5-4B-M\",\n        \"messages\": [{\"role\": \"user\", \"content\": \"Hello!\"}],\n        \"stream\": true\n    }'\n```\n\n\n\n## Troubleshooting\n\nIf you experience any problems, please contact us via [Discord](https://discord.com/invite/trymirai) or [email](mailto:contact@getmirai.co).\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrymirai%2Fuzu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftrymirai%2Fuzu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftrymirai%2Fuzu/lists"}