{"id":27811519,"url":"https://github.com/tattn/localllmclient","last_synced_at":"2025-10-26T07:11:28.894Z","repository":{"id":290581153,"uuid":"974827014","full_name":"tattn/LocalLLMClient","owner":"tattn","description":"Swift package to run local LLMs on iOS, macOS, Linux","archived":false,"fork":false,"pushed_at":"2025-09-29T16:18:26.000Z","size":602,"stargazers_count":102,"open_issues_count":5,"forks_count":22,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-09-29T18:28:30.129Z","etag":null,"topics":["foundation-models","gemma","gguf","ios","linux","llama","llm","macos","mlx","qwen","swift"],"latest_commit_sha":null,"homepage":"","language":"Swift","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tattn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"tattn","patreon":"tattn","open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"lfx_crowdfunding":null,"polar":null,"buy_me_a_coffee":null,"thanks_dev":null,"custom":null}},"created_at":"2025-04-29T11:16:11.000Z","updated_at":"2025-09-29T16:18:29.000Z","dependencies_parsed_at":"2025-06-02T16:53:38.087Z","dependency_job_id":"ab563e40-a1a2-4103-bf46-7f8fbaa7d430","html_url":"https://github.com/tattn/LocalLLMClient","commit_stats":null,"previous_names":["tattn/localllmclient"],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/tattn/LocalLLMClient","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tattn%2FLocalLLMClient","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tattn%2FLocalLLMClient/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tattn%2FLocalLLMClient/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tattn%2FLocalLLMClient/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tattn","download_url":"https://codeload.github.com/tattn/LocalLLMClient/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tattn%2FLocalLLMClient/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279006867,"owners_count":26084208,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["foundation-models","gemma","gguf","ios","linux","llama","llm","macos","mlx","qwen","swift"],"created_at":"2025-05-01T12:00:46.066Z","updated_at":"2025-10-11T10:47:55.117Z","avatar_url":"https://github.com/tattn.png","language":"Swift","funding_links":["https://github.com/sponsors/tattn","https://patreon.com/tattn"],"categories":[],"sub_categories":[],"readme":"# LocalLLMClient\n\n[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n[![CI](https://github.com/tattn/LocalLLMClient/actions/workflows/test.yml/badge.svg)](https://github.com/tattn/LocalLLMClient/actions/workflows/test.yml)\n[![](https://img.shields.io/endpoint?url=https%3A%2F%2Fswiftpackageindex.com%2Fapi%2Fpackages%2Ftattn%2FLocalLLMClient%2Fbadge%3Ftype%3Dswift-versions)](https://swiftpackageindex.com/tattn/LocalLLMClient)\n[![](https://img.shields.io/endpoint?url=https%3A%2F%2Fswiftpackageindex.com%2Fapi%2Fpackages%2Ftattn%2FLocalLLMClient%2Fbadge%3Ftype%3Dplatforms)](https://swiftpackageindex.com/tattn/LocalLLMClient)\n\n\nA Swift package to interact with local Large Language Models (LLMs) on Apple platforms.\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e\u003cimg src=\"https://github.com/user-attachments/assets/f949ba1d-f063-463c-a6fa-dcdf14c01e8b\" width=\"100%\" alt=\"example on iOS\" /\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003cimg src=\"https://github.com/user-attachments/assets/3ac6aef5-df1a-45e9-8989-e4dbce223ceb\" width=\"100%\" alt=\"example on macOS\" /\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eDemo / Multimodal\u003c/summary\u003e\n\n| MobileVLM-3B (llama.cpp) | Qwen2.5 VL 3B (MLX) |\n|:-:|:-:|\n|\u003cvideo src=\"https://github.com/user-attachments/assets/7704b05c-2a8c-40ef-838c-f9485ad0cfe0\"\u003e|\u003cvideo src=\"https://github.com/user-attachments/assets/475609a4-aaef-4043-aadc-db44c28296ee\"\u003e|\n\n*iPhone 16 Pro*\n\n\u003c/details\u003e\n\n[Example app](https://github.com/tattn/LocalLLMClient/tree/main/Example)\n\n\u003e [!IMPORTANT]\n\u003e This project is still experimental. The API is subject to change.\n\n\u003e [!TIP]\n\u003e To run larger models more reliably, consider adding `com.apple.developer.kernel.increased-memory-limit` entitlement to your app.\n\n## Features\n\n- Support for [GGUF](https://github.com/ggml-org/ggml/blob/master/docs/gguf.md) / [MLX models](https://opensource.apple.com/projects/mlx/) / [FoundationModels framework](https://developer.apple.com/documentation/foundationmodels)\n- Support for iOS, macOS and Linux\n- Streaming API\n- Multimodal (experimental)\n- Tool calling (experimental)\n\n## Installation\n\nAdd the following dependency to your `Package.swift` file:\n\n```swift\ndependencies: [\n    .package(url: \"https://github.com/tattn/LocalLLMClient.git\", branch: \"main\")\n]\n```\n\n## Usage\n\nThe API documentation is available [here](https://tattn.github.io/LocalLLMClient/documentation/).\n\n### Quick Start\n\n```swift\nimport LocalLLMClient\nimport LocalLLMClientLlama\n\nlet session = LLMSession(model: .llama(\n    id: \"lmstudio-community/gemma-3-4B-it-qat-GGUF\",\n    model: \"gemma-3-4B-it-QAT-Q4_0.gguf\"\n))\n\nprint(try await session.respond(to: \"Tell me a joke.\"))\n\nfor try await text in session.streamResponse(to: \"Write a story about cats.\") {\n    print(text, terminator: \"\")\n}\n```\n\n### Using with Each Backend\n\n\u003cdetails open\u003e\n\u003csummary\u003eUsing llama.cpp\u003c/summary\u003e\n\n```swift\nimport LocalLLMClient\nimport LocalLLMClientLlama\n\n// Create a model\nlet model = LLMSession.DownloadModel.llama(\n    id: \"lmstudio-community/gemma-3-4B-it-qat-GGUF\",\n    model: \"gemma-3-4B-it-QAT-Q4_0.gguf\",\n    parameter: .init(\n        temperature: 0.7,   // Randomness (0.0〜1.0)\n        topK: 40,           // Top-K sampling\n        topP: 0.9,          // Top-P (nucleus) sampling\n        options: .init(responseFormat: .json) // Response format\n    )\n)\n\n// You can track download progress\ntry await model.downloadModel { progress in \n    print(\"Download progress: \\(progress)\")\n}\n\n// Create a session with the downloaded model\nlet session = LLMSession(model: model)\n\n// Generate a response with a specific prompt\nlet response = try await session.respond(to: \"\"\"\nCreate the beginning of a synopsis for an epic story with a cat as the main character.\nFormat it in JSON, as shown below.\n{\n    \"title\": \"\u003ctitle\u003e\",\n    \"content\": \"\u003ccontent\u003e\",\n}\n\"\"\")\nprint(response)\n\n// You can also add system messages before asking questions\nsession.messages = [.system(\"You are a helpful assistant.\")]\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eUsing Apple MLX\u003c/summary\u003e\n\n```swift\nimport LocalLLMClient\nimport LocalLLMClientMLX\n\n// Create a model\nlet model = LLMSession.DownloadModel.mlx(\n    id: \"mlx-community/Qwen3-1.7B-4bit\",\n    parameter: .init(\n        temperature: 0.7,    // Randomness (0.0 to 1.0)\n        topP: 0.9            // Top-P (nucleus) sampling\n    )\n)\n\n// You can track download progress\ntry await model.downloadModel { progress in \n    print(\"Download progress: \\(progress)\")\n}\n\n// Create a session with the downloaded model\nlet session = LLMSession(model: model)\n\n// Generate text with system and user messages\nsession.messages = [.system(\"You are a helpful assistant.\")]\nlet response = try await session.respond(to: \"Tell me a story about a cat.\")\nprint(response)\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eUsing Apple FoundationModels\u003c/summary\u003e\n\n```swift\nimport LocalLLMClient\nimport LocalLLMClientFoundationModels\n\n// Available on iOS 26.0+ / macOS 26.0+ and requires Apple Intelligence \nlet session = LLMSession(model: .foundationModels(\n    // Use system's default model\n    model: .default,\n    // Configure generation options\n    parameter: .init(\n        temperature: 0.7,\n    )\n))\n\n// Generate a response with a specific prompt\nlet response = try await session.respond(to: \"Tell me a short story about a clever fox.\")\nprint(response)\n```\n\u003c/details\u003e\n\n### Tool Calling\n\nLocalLLMClient supports tool calling for integrations with external systems.\n\n\u003e [!IMPORTANT]\n\u003e Tool calling is only available with models that support this feature. Each backend has different model compatibility.\n\u003e \n\u003e Make sure your chosen model explicitly supports tool calling before using this feature.\n\n\u003cdetails open\u003e\n\u003csummary\u003eUsing tool calling\u003c/summary\u003e\n\n```swift\nimport LocalLLMClient\nimport LocalLLMClientLlama\n\n@Tool(\"get_weather\")\nstruct GetWeatherTool {\n    let description = \"Get the current weather in a given location\"\n    \n    @ToolArguments\n    struct Arguments {\n        @ToolArgument(\"The city and state, e.g. San Francisco, CA\")\n        var location: String\n        \n        @ToolArgument(\"Temperature unit\")\n        var unit: Unit?\n        \n        @ToolArgumentEnum\n        enum Unit: String {\n            case celsius\n            case fahrenheit\n        }\n    }\n    \n    func call(arguments: Arguments) async throws -\u003e ToolOutput {\n        // In a real implementation, this would call a weather API\n        let temp = arguments.unit == .celsius ? \"22°C\" : \"72°F\"\n        return ToolOutput([\n            \"location\": arguments.location,\n            \"temperature\": temp,\n            \"condition\": \"sunny\"\n        ])\n    }\n}\n\n// Create the tool\nlet weatherTool = GetWeatherTool()\n\n// Create a session with a model that supports tool calling and register tools\nlet session = LLMSession(\n    model: .llama(\n        id: \"Qwen/Qwen2.5-1.5B-Instruct-GGUF\",\n        model: \"qwen2.5-1.5b-instruct-q4_k_m.gguf\"\n    ),\n    tools: [weatherTool]\n)\n\n// Ask a question that requires tool use\nlet response = try await session.respond(to: \"What's the weather like in Tokyo?\")\nprint(response)\n\n// The model will automatically call the weather tool and include the result in its response\n```\n\u003c/details\u003e\n\n### Multimodal for Image Processing\n\nLocalLLMClient also supports multimodal models for processing images.\n\n\u003cdetails open\u003e\n\u003csummary\u003eUsing with llama.cpp\u003c/summary\u003e\n\n```swift\nimport LocalLLMClient\nimport LocalLLMClientLlama\n\n// Create a session with a multimodal model\nlet session = LLMSession(model: .llama(\n    id: \"ggml-org/gemma-3-4b-it-GGUF\",\n    model: \"gemma-3-4b-it-Q8_0.gguf\",\n    mmproj: \"mmproj-model-f16.gguf\"\n))\n\n// Ask a question about an image\nlet response = try await session.respond(\n    to: \"What's in this image?\", \n    attachments: [.image(.init(resource: .yourImage))]\n)\nprint(response)\n\n// You can also stream the response\nfor try await text in session.streamResponse(\n    to: \"Describe this image in detail\", \n    attachments: [.image(.init(resource: .yourImage))]\n) {\n    print(text, terminator: \"\")\n}\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eUsing with Apple MLX\u003c/summary\u003e\n\n```swift\nimport LocalLLMClient\nimport LocalLLMClientMLX\n\n// Create a session with a multimodal model\nlet session = LLMSession(model: .mlx(\n    id: \"mlx-community/Qwen2.5-VL-3B-Instruct-abliterated-4bit\"\n))\n\n// Ask a question about an image\nlet response = try await session.respond(\n    to: \"What's in this image?\", \n    attachments: [.image(.init(resource: .yourImage))]\n)\nprint(response)\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ch3\u003eAdvanced Usage: Low Level API\u003c/h3\u003e\u003c/summary\u003e\n\nFor more advanced control over model loading and inference, you can use the `LocalLLMClient` APIs directly.\n\n\u003cdetails\u003e\n\u003csummary\u003eUsing with llama.cpp\u003c/summary\u003e\n\n```swift\nimport LocalLLMClient\nimport LocalLLMClientLlama\nimport LocalLLMClientUtility\n\n// Download model from Hugging Face (Gemma 3)\nlet ggufName = \"gemma-3-4B-it-QAT-Q4_0.gguf\"\nlet downloader = FileDownloader(source: .huggingFace(\n    id: \"lmstudio-community/gemma-3-4B-it-qat-GGUF\",\n    globs: [ggufName]\n))\n\ntry await downloader.download { print(\"Progress: \\($0)\") }\n\n// Initialize a client with the downloaded model\nlet modelURL = downloader.destination.appending(component: ggufName)\nlet client = try await LocalLLMClient.llama(url: modelURL, parameter: .init(\n    context: 4096,      // Context size\n    temperature: 0.7,   // Randomness (0.0〜1.0)\n    topK: 40,           // Top-K sampling\n    topP: 0.9,          // Top-P (nucleus) sampling\n    options: .init(responseFormat: .json) // Response format\n))\n\nlet prompt = \"\"\"\nCreate the beginning of a synopsis for an epic story with a cat as the main character.\nFormat it in JSON, as shown below.\n{\n    \"title\": \"\u003ctitle\u003e\",\n    \"content\": \"\u003ccontent\u003e\",\n}\n\"\"\"\n\n// Generate text\nlet input = LLMInput.chat([\n    .system(\"You are a helpful assistant.\"),\n    .user(prompt)\n])\n\nfor try await text in try await client.textStream(from: input) {\n    print(text, terminator: \"\")\n}\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eUsing with Apple MLX\u003c/summary\u003e\n\n```swift\nimport LocalLLMClient\nimport LocalLLMClientMLX\nimport LocalLLMClientUtility\n\n// Download model from Hugging Face\nlet downloader = FileDownloader(\n    source: .huggingFace(id: \"mlx-community/Qwen3-1.7B-4bit\", globs: .mlx)\n)\ntry await downloader.download { print(\"Progress: \\($0)\") }\n\n// Initialize a client with the downloaded model\nlet client = try await LocalLLMClient.mlx(url: downloader.destination, parameter: .init(\n    temperature: 0.7,    // Randomness (0.0 to 1.0)\n    topP: 0.9            // Top-P (nucleus) sampling\n))\n\n// Generate text\nlet input = LLMInput.chat([\n    .system(\"You are a helpful assistant.\"),\n    .user(\"Tell me a story about a cat.\")\n])\n\nfor try await text in try await client.textStream(from: input) {\n    print(text, terminator: \"\")\n}\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eUsing with Apple FoundationModels\u003c/summary\u003e\n\n```swift\nimport LocalLLMClient\nimport LocalLLMClientFoundationModels\n\n// Available on iOS 26.0+ / macOS 26.0+ and requires Apple Intelligence \nlet client = try await LocalLLMClient.foundationModels(\n    // Use system's default model\n    model: .default,\n    // Configure generation options\n    parameter: .init(\n        temperature: 0.7,\n    )\n)\n\n// Generate text\nlet input = LLMInput.chat([\n    .system(\"You are a helpful assistant.\"),\n    .user(\"Tell me a short story about a clever fox.\")\n])\n\nfor try await text in try await client.textStream(from: input) {\n    print(text, terminator: \"\")\n}\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eAdvanced Multimodal with llama.cpp\u003c/summary\u003e\n\n```swift\nimport LocalLLMClient\nimport LocalLLMClientLlama\nimport LocalLLMClientUtility\n\n// Download model from Hugging Face (Gemma 3)\nlet model = \"gemma-3-4b-it-Q8_0.gguf\"\nlet mmproj = \"mmproj-model-f16.gguf\"\n\nlet downloader = FileDownloader(\n    source: .huggingFace(id: \"ggml-org/gemma-3-4b-it-GGUF\", globs: [model, mmproj]),\n)\ntry await downloader.download { print(\"Download: \\($0)\") }\n\n// Initialize a client with the downloaded model\nlet client = try await LocalLLMClient.llama(\n    url: downloader.destination.appending(component: model),\n    mmprojURL: downloader.destination.appending(component: mmproj)\n)\n\nlet input = LLMInput.chat([\n    .user(\"What's in this image?\", attachments: [.image(.init(resource: .yourImage))]),\n])\n\n// Generate text without streaming\nprint(try await client.generateText(from: input))\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eAdvanced Multimodal with Apple MLX\u003c/summary\u003e\n\n```swift\nimport LocalLLMClient\nimport LocalLLMClientMLX\nimport LocalLLMClientUtility\n\n// Download model from Hugging Face (Qwen2.5 VL)\nlet downloader = FileDownloader(source: .huggingFace(\n    id: \"mlx-community/Qwen2.5-VL-3B-Instruct-abliterated-4bit\",\n    globs: .mlx\n))\ntry await downloader.download { print(\"Progress: \\($0)\") }\n\nlet client = try await LocalLLMClient.mlx(url: downloader.destination)\n\nlet input = LLMInput.chat([\n    .user(\"What's in this image?\", attachments: [.image(.init(resource: .yourImage))]),\n])\n\n// Generate text without streaming\nprint(try await client.generateText(from: input))\n```\n\u003c/details\u003e\n\u003c/details\u003e\n\n### CLI Tool\n\nYou can use LocalLLMClient directly from the terminal using the command line tool:\n\n```bash\n# Run using llama.cpp\nswift run LocalLLMCLI --model /path/to/your/model.gguf \"Your prompt here\"\n\n# Run using MLX\n./scripts/run_mlx.sh --model https://huggingface.co/mlx-community/Qwen3-1.7B-4bit \"Your prompt here\"\n```\n\n## Tested Models\n\n- LLaMA 3\n- Gemma 3 / 2\n- Qwen 3 / 2\n- Phi 4\n\n\n\u003e [Models compatible with llama.cpp backend](https://github.com/ggml-org/llama.cpp?tab=readme-ov-file#text-only)  \n\u003e [Models compatible with MLX backend](https://github.com/ml-explore/mlx-swift-examples/blob/main/Libraries/MLXLLM/Documentation.docc/Documentation.md)  \n\n*If you have a model that works, please open an issue or PR to add it to the list.*\n\n## Requirements\n\n- iOS 16.0+ / macOS 14.0+\n- Xcode 16.0+\n\n## Acknowledgements\n\nThis package uses [llama.cpp](https://github.com/ggml-org/llama.cpp), [Apple's MLX](https://opensource.apple.com/projects/mlx/) and [Foundation Models framework](https://developer.apple.com/documentation/foundationmodels) for model inference.\n\n---\n\n[Support this project :heart:](https://github.com/sponsors/tattn)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftattn%2Flocalllmclient","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftattn%2Flocalllmclient","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftattn%2Flocalllmclient/lists"}