{"id":50483304,"url":"https://github.com/wtdcode/llmy","last_synced_at":"2026-06-01T19:30:55.134Z","repository":{"id":347971274,"uuid":"1195887356","full_name":"wtdcode/llmy","owner":"wtdcode","description":"LLM utilities.","archived":false,"fork":false,"pushed_at":"2026-05-31T18:07:11.000Z","size":1249,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-05-31T18:26:41.985Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wtdcode.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-30T06:52:38.000Z","updated_at":"2026-05-31T18:07:09.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/wtdcode/llmy","commit_stats":null,"previous_names":["wtdcode/llmy"],"tags_count":42,"template":false,"template_full_name":null,"purl":"pkg:github/wtdcode/llmy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wtdcode%2Fllmy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wtdcode%2Fllmy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wtdcode%2Fllmy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wtdcode%2Fllmy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wtdcode","download_url":"https://codeload.github.com/wt
dcode/llmy/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wtdcode%2Fllmy/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33790684,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-01T02:00:06.963Z","response_time":115,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-01T19:30:53.640Z","updated_at":"2026-06-01T19:30:55.128Z","avatar_url":"https://github.com/wtdcode.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LLMY\n\nAll-in-one LLM utilities for Rust — plug OpenAI / Azure settings straight into [clap](https://crates.io/crates/clap), track spend with built-in billing, replay every request when things go wrong, and bridge tools between your agent and the [Model Context Protocol](https://modelcontextprotocol.io/) (MCP).\n\n## Harnessing An Agent\n\nThe harness layer gives you a concrete in-memory `Agent` that can hold conversation state, expose tools to the model, and run a full user turn through any tool-call loop. 
A minimal coding agent only needs a system prompt, an `LLM`, and a `ToolBox` with the tools you want to expose.\n\nThe example below builds a basic agent that can read files, list directories, and search for files by glob pattern:\n\n```toml\n[dependencies]\nclap = { version = \"4\", features = [\"derive\"] }\nllmy = \"0.13\"\ntokio = { version = \"1\", features = [\"macros\", \"rt-multi-thread\"] }\n```\n\n```rust\nuse std::path::PathBuf;\n\nuse clap::Parser;\nuse llmy::agent::tool::ToolBox;\nuse llmy::agent::tools::files::{FindFileTool, ListDirectoryTool, ReadFileTool};\nuse llmy::clap::OpenAISetup;\nuse llmy::harness::Agent;\n\n#[derive(Parser)]\nstruct Cli {\n    #[command(flatten)]\n    llm: OpenAISetup,\n\n    #[arg(long, default_value = \".\")]\n    root: PathBuf,\n}\n\n#[tokio::main]\nasync fn main() -\u003e Result\u003c(), llmy::LLMYError\u003e {\n    let cli = Cli::parse();\n    let settings = cli.llm.settings();\n    let llm = cli.llm.to_llm().await;\n\n    let mut tools = ToolBox::new();\n    tools.add_tool(ReadFileTool::new(cli.root.clone()));\n    tools.add_tool(ListDirectoryTool::new_root(cli.root.clone()));\n    tools.add_tool(FindFileTool::new(cli.root.clone()));\n\n    let mut agent = Agent::new(\n        \"You are a coding assistant. Use the available file tools whenever you need to inspect the workspace.\".to_string(),\n        tools,\n        \"readme-basic-agent\".to_string(),\n    );\n\n    let result = agent\n        .loop_step_user(\n            \"List the root directory, find Rust files under src, and then read Cargo.toml.\"\n                .to_string(),\n            \u0026llm,\n            Some(\"readme-basic-agent\"),\n            Some(settings),\n        )\n        .await?;\n\n    if let Some(message) = result.assistant_message() {\n        println!(\"{message}\");\n    }\n\n    Ok(())\n}\n```\n\nRun it with your OpenAI settings:\n\n```bash\nOPENAI_API_KEY=sk-... 
cargo run -- --model gpt-4o --root .\n```\n\n## MCP Support\n\n`llmy` has first-class support for the [Model Context Protocol](https://modelcontextprotocol.io/) (MCP), both as a server and as a client.\n\n### MCP Server — expose a `ToolBox` as an MCP server\n\nAny `ToolBox` can be served as an MCP server over stdio or Streamable HTTP:\n\n```rust\nuse llmy::agent::tool::ToolBox;\nuse llmy::agent::mcp::McpToolBox;\nuse llmy::agent::tools::files::{FindFileTool, ListDirectoryTool, ReadFileTool};\nuse rmcp::model::{Implementation, ServerCapabilities, ServerInfo};\nuse rmcp::transport::StreamableHttpServerConfig;\n\nlet mut tools = ToolBox::new();\ntools.add_tool(ReadFileTool::new(\".\".into()));\ntools.add_tool(ListDirectoryTool::new_root(\".\".into()));\ntools.add_tool(FindFileTool::new(\".\".into()));\n\nlet server_info = ServerInfo::new(\n    ServerCapabilities::builder().enable_tools().build(),\n).with_server_info(Implementation::new(\"my-server\", \"0.1.0\"));\n\nlet server = McpToolBox::new(tools, server_info);\n\n// Serve over stdio (for use with MCP clients like Claude Desktop):\n// server.serve_stdio().await?;\n\n// Or serve over HTTP:\n// server.serve_http(\"127.0.0.1:3000\", StreamableHttpServerConfig::default()).await?;\n```\n\n### MCP Client — connect to remote MCP servers\n\nConnect to any MCP server and import its tools into a `ToolBox`:\n\n```rust\nuse llmy::agent::tools::mcp::McpClient;\n\n// Auto-detects HTTP vs stdio based on the URL scheme:\nlet client = McpClient::connect(\"http://127.0.0.1:3000\").await?;\n// Or for stdio: McpClient::connect(\"npx some-mcp-server\").await?;\n\nlet remote_tools = client.to_toolbox().await?;\n\n// Merge into your agent's toolbox:\ntools.extend(remote_tools);\n```\n\nThe client wraps each remote MCP tool as a `ToolDyn`, so they integrate seamlessly with the agent loop. 
MCP resources are also exposed as read-only tools.\n\n## CLI\n\nInstall the command-line tool:\n\n```bash\ncargo install llmy-cli\n```\n\n### `llmy chat` — interactive chat\n\n```bash\nOPENAI_API_KEY=sk-... llmy chat --model gpt-4o\n```\n\n```\nYou: Explain async Rust in one sentence.\nAssistant: Async Rust uses futures and an executor to let you write non-blocking,\nconcurrent code with zero-cost abstractions at compile time.\n```\n\nSupports `--system` for a custom system prompt. Reads from stdin when not a TTY.\n\n#### Connecting to MCP servers\n\nUse `--mcp-server` (repeatable) to connect to MCP servers and make their tools available to the agent:\n\n```bash\n# HTTP server\nllmy chat --model gpt-4o --mcp-server http://127.0.0.1:3000\n\n# Stdio server (command is split on whitespace)\nllmy chat --model gpt-4o --mcp-server \"npx some-mcp-server\"\n\n# Multiple servers\nllmy chat --model gpt-4o \\\n    --mcp-server http://localhost:3000 \\\n    --mcp-server \"npx another-server\"\n```\n\n### `llmy mcp-server` — serve file tools over MCP\n\nRun a built-in MCP server that exposes `read_file`, `list_dir`, and `find_file` tools:\n\n```bash\n# Over stdio (for Claude Desktop, etc.)\nllmy mcp-server --root /path/to/project\n\n# Over HTTP\nllmy mcp-server --root . 
--listen 127.0.0.1:3005\n```\n\n### `llmy tokenizer` — count tokens offline\n\n```bash\necho \"Hello, world!\" | llmy tokenizer --model openai/gpt-4o\n# 4\n\nllmy tokenizer --encoding cl100k_base --input my_prompt.txt --verbose\n# 0  9906   \"Hello\"\n# 1  11    \",\"\n# 2  1917  \" world\"\n# 3  0     \"!\"\n# 4\n```\n\n### `llmy list-req` / `llmy dump-req` — inspect a SQLite `LLM_DEBUG` run\n\nBrowse the requests stored when `LLM_DEBUG` points at a SQLite database (see [Detailed debug logging](#2-detailed-debug-logging-llm_debug)).\n\n```bash\n# All requests of the latest client, with header\nLLM_DEBUG=./run.sqlite3 llmy list-req\n\n# Filter by client id and/or cache key\nllmy list-req --db ./run.sqlite3 --client-id 2 --cache-key chat\n\n# Show one request (human view: metadata + conversation)\nllmy dump-req --db ./run.sqlite3 --req-id 17\n\n# Same row as a single JSON object (LLMDebugRow)\nllmy dump-req --req-id 17 --json\n```\n\n### `llmy models` — list supported models\n\n```\nModel                           Input (per 1M)  Output (per 1M) Max Input  Max Output  Encoding\nanthropic/claude-sonnet-4       $3.00           $15.00          136000     64000       claude\ngoogle/gemini-2.5-flash         $0.30           $2.50           936000     64000       o200k_base\ngoogle/gemini-2.5-pro           $1.25           $10.00          983040     65536       o200k_base\nopenai/gpt-4.1                  $2.00           $8.00           1014808    32768       o200k_base\nopenai/gpt-4o                   $2.50           $10.00          111616     16384       o200k_base\nopenai/gpt-4o-mini              $0.15           $0.60           111616     16384       o200k_base\nopenai/o1                       $15.00          $60.00          100000     100000      o200k_base\nopenai/o3                       $2.00           $8.00           100000     100000      o200k_base\nopenai/o4-mini                  $1.10           $4.40           100000     100000      o200k_base\n…                  
             (112 models total)\n```\n\n## Library\n\nAdd the dependency (the root crate re-exports everything):\n\n```toml\n[dependencies]\nllmy = \"0.13\"\n```\n\n### 1. Clap integration — up to 3 LLM slots\n\n`llmy-clap` provides three generated arg structs (`OpenAISetup`, `OptOpenAISetup`, `OptOptOpenAISetup`) so you can wire one, two, or three LLMs into any clap-based CLI with zero boilerplate. Each slot is controlled by its own set of env-vars / flags, and can be converted to the core `LLM` client in one call.\n\n```rust\nuse clap::Parser;\nuse llmy::clap::OpenAISetup;      // primary\nuse llmy::clap::OptOpenAISetup;   // optional secondary\n\n#[derive(Parser)]\nstruct Cli {\n    #[command(flatten)]\n    llm: OpenAISetup,\n\n    #[command(flatten)]\n    fallback_llm: OptOpenAISetup,\n}\n\n#[tokio::main]\nasync fn main() {\n    let cli = Cli::parse();\n\n    // One-liner: clap args -\u003e ready-to-use async LLM client\n    let llm = cli.llm.to_llm().await;\n\n    let resp = llm\n        .prompt_once_with_retry(\n            \"You are a helpful assistant.\",\n            \"Explain async Rust in one sentence.\",\n            None,\n            None,\n            None,\n        )\n        .await\n        .unwrap();\n\n    println!(\"{}\", resp.choices[0].message.content.as_deref().unwrap_or(\"\"));\n}\n```\n\nRun it:\n\n```bash\n# OpenAI\nOPENAI_API_KEY=sk-... cargo run -- --model gpt-4o\n\n# Azure\nOPENAI_API_KEY=... cargo run -- \\\n    --azure-openai-endpoint https://my.openai.azure.com \\\n    --azure-deployment gpt-4o \\\n    --model gpt-4o\n```\n\nEvery setting (temperature, timeout, retries, max tokens, reasoning effort, tool choice, ...) 
is exposed as a flag **and** an env-var:\n\n| Flag | Env var | Default |\n|------|---------|---------|\n| `--model` | `OPENAI_API_MODEL` | `o1` |\n| `--llm-temperature` | `LLM_TEMPERATURE` | — |\n| `--llm-presence-penalty` | `LLM_PRESENCE_PENALTY` | — |\n| `--llm-max-completion-tokens` | `LLM_MAX_COMPLETION_TOKENS` | — |\n| `--top-p` | `LLM_TOP_P` | — |\n| `--llm-retry` | `LLM_RETRY` | `5` |\n| `--llm-prompt-timeout` | `LLM_PROMPT_TIMEOUT` | `1200` (s) |\n| `--llm-stream` | `LLM_STREAM` | `false` |\n| `--reasoning-effort` | `LLM_REASONING_EFFORT` | — |\n\nThe second and third slots use the prefixes `OPT_` and `OPT_OPT_` for their env-vars (e.g. `OPT_OPENAI_API_KEY`, `OPT_OPT_OPENAI_API_MODEL`).\n\n---\n\n### 2. Detailed debug logging (`LLM_DEBUG`)\n\n`LLM_DEBUG` selects one of two backends based on the shape of its value:\n\n- **Folder** (any value that does not match the SQLite form below): one `.xml` + `.json` pair per round-trip, written under a per-process subfolder.\n- **SQLite** (value starting with `sqlite3://` or ending in `sqlite3`): every request is a row in a long-lived database, queryable via the bundled CLI (`llmy list-req` / `llmy dump-req`) or any SQLite client.\n\n#### Folder backend\n\n```bash\nLLM_DEBUG=./debug_logs OPENAI_API_KEY=sk-... 
cargo run\n```\n\nThis creates a per-process subfolder with numbered files:\n\n```\ndebug_logs/\n└── 48291-0-main/\n    ├── llm-000000000001.xml\n    ├── llm-000000000001.json\n    ├── llm-000000000002.xml\n    └── llm-000000000002.json\n```\n\nThe `.xml` file looks like:\n\n```xml\n=====================\n\u003cRequest\u003e\n\u003cSYSTEM\u003e\nYou are a helpful assistant.\n\u003c/SYSTEM\u003e\n\u003cUSER\u003e\nExplain async Rust in one sentence.\n\u003c/USER\u003e\n\u003ctool name=\"search\", description=\"Search the web\", strict=false\u003e\n{\n  \"type\": \"object\",\n  \"properties\": { \"query\": { \"type\": \"string\" } }\n}\n\u003c/tool\u003e\n\u003c/Request\u003e\n=====================\n=====================\n\u003cResponse\u003e\n\u003cASSISTANT\u003e\nAsync Rust lets you write concurrent code ...\n\u003c/ASSISTANT\u003e\n\u003c/Response\u003e\n=====================\n```\n\nThe `.json` companion contains the full serialised `CreateChatCompletionRequest` / `CreateChatCompletionResponse` objects for programmatic analysis.\n\n#### SQLite backend\n\n```bash\nLLM_DEBUG=./run.sqlite3 OPENAI_API_KEY=sk-... cargo run\n# or\nLLM_DEBUG=sqlite3:///abs/path/to/run.sqlite3 ...\n```\n\nOn startup a row is inserted into a `client` table and every request lands in `llm_debug` (model, endpoint URL, azure deployment, cache key, raw request/response JSON, per-request token counts, running USD spend, full conversation, timestamp). 
Inspect a run with the CLI:\n\n```bash\n# Most recent client only, optionally filtered by cache key\nllmy list-req --cache-key chat\n# id  client  ts                    model    endpoint                    deployment  cache_key  input  cached  output  reasoning  usage_usd  resp\n# 1   3       2026-05-08 19:39:23   gpt-4o   https://api.openai.com/v1/  -           chat       142    0       38      0          0.001020   ok\n\n# Drill into a single row (--json dumps the full LLMDebugRow struct)\nllmy dump-req --req-id 1\nllmy dump-req --req-id 1 --json\n```\n\nYou can also embed the read API directly: `Sqlite3DebugDB::open_existing(url)` + `list_filtered` / `get_row` return strongly-typed `LLMDebugRow` values.\n\n---\n\n### 3. Built-in billing with automatic budget enforcement\n\n`llmy` ships with up-to-date per-token pricing for 110+ models (GPT-4o, o1, o3, GPT-5 family, Claude, Gemini, ...). Token usage is tracked in real-time including **cached-input** and **reasoning** token discounts. When spend exceeds the budget cap the client returns `LLMYError::Billing` immediately — no more surprise bills.\n\n```rust\nuse llmy::client::{LLM, SupportedConfig};\nuse llmy::client::settings::LLMSettings;\n\nlet settings = LLMSettings::default();\nlet model = \"gpt-4o\".parse().unwrap();\n\nlet llm = LLM::new(\n    SupportedConfig::new(\"https://api.openai.com/v1\", \"sk-...\"),\n    model,\n    5.0, // budget cap in USD\n    settings,\n    None, // Option\u003cDebugBackend\u003e; see LLM::new_async for LLM_DEBUG-style strings\n);\n\nmatch llm.prompt_once(\"system\", \"user\", None, None, None).await {\n    Ok(resp) =\u003e { /* ... 
*/ }\n    Err(llmy::LLMYError::Billing(cap, current)) =\u003e {\n        eprintln!(\"Budget exceeded: ${:.4} / ${:.2}\", current, cap);\n    }\n    Err(e) =\u003e { eprintln!(\"Error: {e}\"); }\n}\n```\n\nVia clap the cap defaults to **$10** and can be overridden:\n\n```bash\ncargo run -- --billing-cap 2.5 --model gpt-4o-mini\n```\n\nFor models not in the built-in list, pass pricing inline:\n\n```bash\ncargo run -- --model \"my-custom-model,1.0,4.0,0.5\"\n#                      name,         in, out, cached\n```\n\n---\n\n### 4. Offline token estimation\n\n`llmy` includes a built-in tokenizer with fast, offline BPE token estimation for 110+ models across OpenAI, Anthropic, Google, and more. Encodings and model metadata are baked into the binary at compile time — no network calls, no data files to ship.\n\nFour encodings are supported: **cl100k_base**, **o200k_base**, **p50k_base** (OpenAI / tiktoken) and **claude** (Anthropic).\n\n```rust\nuse llmy::tokenizer::{encode, count_tokens, count_tokens_for_model, Encoding};\n\n// Encode text into token IDs\nlet tokens: Vec\u003cu32\u003e = encode(\"Hello, world!\", Encoding::O200kBase);\n\n// Count tokens directly\nlet n = count_tokens(\"Hello, world!\", Encoding::Cl100kBase);\n\n// Or let the library resolve the encoding from a model ID\nlet n = count_tokens_for_model(\"Hello, world!\", \"openai/gpt-4o\"); // Some(4)\nlet n = count_tokens_for_model(\"Hello, world!\", \"anthropic/claude-sonnet-4\"); // Some(4)\n```\n\nThe model registry is generated from the same source-of-truth JSON used by the billing system, so model look-ups, pricing, and token counts always stay in sync.\n\n---\n\n### 5. Defining tools with `Tool` and `#[tool(...)]`\n\n`llmy-agent` models callable tools as a Rust trait. 
A tool has a strongly typed argument struct, a stable tool name, an optional description, and an async `invoke` method that returns `Result\u003cString, LLMYError\u003e`.\n\nYou can depend on either the focused crate pair:\n\n```toml\n[dependencies]\nllmy-agent = \"0.13\"\nllmy-agent-derive = \"0.13\"\n```\n\nor the root crate plus the derive crate:\n\n```toml\n[dependencies]\nllmy = \"0.13\"\nllmy-agent-derive = \"0.13\"\n```\n\nThe `Tool` trait defines the typed interface that users implement:\n\n```rust\npub trait Tool: Send + Sync + std::fmt::Debug {\n    type ARGUMENTS: DeserializeOwned + JsonSchema + Send;\n    const NAME: \u0026str;\n    const DESCRIPTION: Option\u003c\u0026str\u003e;\n\n    fn invoke(\n        \u0026self,\n        arguments: Self::ARGUMENTS,\n    ) -\u003e impl Future\u003cOutput = Result\u003cString, LLMYError\u003e\u003e + Send;\n}\n```\n\nThe companion `ToolDyn` trait provides the object-safe, type-erased interface used by `ToolBox` and the agent loop. A blanket `impl ToolDyn for T: Tool` bridges the two automatically — users only implement `Tool`.\n\nIn practice you usually write the typed arguments and the async method, then let `llmy-agent-derive` generate the `impl Tool` for you:\n\n```rust\nuse std::path::PathBuf;\n\nuse llmy::agent::LLMYError;\nuse llmy_agent_derive::tool;\nuse schemars::JsonSchema;\nuse serde::Deserialize;\n\n#[derive(Deserialize, JsonSchema, Default)]\npub struct ReadFileArgs {\n    /// The path of the file to read\n    pub file_path: PathBuf,\n}\n\n#[derive(Debug, Clone)]\n#[tool(\n    arguments = ReadFileArgs,\n    invoke = read_file,\n    description = \"Read file contents from `file_path`.\",\n    name = \"read_file\",\n)]\npub struct ReadFileTool {\n    pub cwd: PathBuf,\n}\n\nimpl ReadFileTool {\n    pub async fn read_file(\u0026self, args: ReadFileArgs) -\u003e Result\u003cString, LLMYError\u003e {\n        let path = self.cwd.join(args.file_path);\n        Ok(tokio::fs::read_to_string(path).await?)\n    
}\n}\n```\n\nNotes:\n\n- `arguments` and `invoke` are required in `#[tool(...)]`.\n- `description` is optional.\n- `name` is optional; if omitted, the struct name is converted to `snake_case`, for example `ReadFileTool -\u003e read_file_tool`.\n- The generated impl works with either `llmy_agent::Tool` or `llmy::agent::Tool`.\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwtdcode%2Fllmy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwtdcode%2Fllmy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwtdcode%2Fllmy/lists"}