https://github.com/tattn/localllmclient

Swift package to run local LLMs on iOS, macOS, Linux
https://github.com/tattn/localllmclient
foundation-models gemma gguf ios linux llama llm macos mlx qwen swift
Last synced: 7 months ago
JSON representation
Swift package to run local LLMs on iOS, macOS, Linux
Host: GitHub
URL: https://github.com/tattn/localllmclient
Owner: tattn
License: mit
Created: 2025-04-29T11:16:11.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-09-29T16:18:26.000Z (8 months ago)
Last Synced: 2025-09-29T18:28:30.129Z (8 months ago)
Topics: foundation-models, gemma, gguf, ios, linux, llama, llm, macos, mlx, qwen, swift
Language: Swift
Homepage:
Size: 588 KB
Stars: 102
Watchers: 5
Forks: 22
Open Issues: 5
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
Awesome Lists containing this project

README

          # LocalLLMClient

[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)

[![CI](https://github.com/tattn/LocalLLMClient/actions/workflows/test.yml/badge.svg)](https://github.com/tattn/LocalLLMClient/actions/workflows/test.yml)

[![](https://img.shields.io/endpoint?url=https%3A%2F%2Fswiftpackageindex.com%2Fapi%2Fpackages%2Ftattn%2FLocalLLMClient%2Fbadge%3Ftype%3Dswift-versions)](https://swiftpackageindex.com/tattn/LocalLLMClient)

[![](https://img.shields.io/endpoint?url=https%3A%2F%2Fswiftpackageindex.com%2Fapi%2Fpackages%2Ftattn%2FLocalLLMClient%2Fbadge%3Ftype%3Dplatforms)](https://swiftpackageindex.com/tattn/LocalLLMClient)

A Swift package to interact with local Large Language Models (LLMs) on Apple platforms.

  

    

    

  

Demo / Multimodal

| MobileVLM-3B (llama.cpp) | Qwen2.5 VL 3B (MLX) |

|:-:|:-:|

|||

*iPhone 16 Pro*

[Example app](https://github.com/tattn/LocalLLMClient/tree/main/Example)

> [!IMPORTANT]

> This project is still experimental. The API is subject to change.

> [!TIP]

> To run larger models more reliably, consider adding `com.apple.developer.kernel.increased-memory-limit` entitlement to your app.

## Features

- Support for [GGUF](https://github.com/ggml-org/ggml/blob/master/docs/gguf.md) / [MLX models](https://opensource.apple.com/projects/mlx/) / [FoundationModels framework](https://developer.apple.com/documentation/foundationmodels)

- Support for iOS, macOS and Linux

- Streaming API

- Multimodal (experimental)

- Tool calling (experimental)

## Installation

Add the following dependency to your `Package.swift` file:

```swift

dependencies: [

    .package(url: "https://github.com/tattn/LocalLLMClient.git", branch: "main")

]

```

## Usage

The API documentation is available [here](https://tattn.github.io/LocalLLMClient/documentation/).

### Quick Start

```swift

import LocalLLMClient

import LocalLLMClientLlama

let session = LLMSession(model: .llama(

    id: "lmstudio-community/gemma-3-4B-it-qat-GGUF",

    model: "gemma-3-4B-it-QAT-Q4_0.gguf"

))

print(try await session.respond(to: "Tell me a joke."))

for try await text in session.streamResponse(to: "Write a story about cats.") {

    print(text, terminator: "")

}

```

### Using with Each Backend

Using llama.cpp

```swift

import LocalLLMClient

import LocalLLMClientLlama

// Create a model

let model = LLMSession.DownloadModel.llama(

    id: "lmstudio-community/gemma-3-4B-it-qat-GGUF",

    model: "gemma-3-4B-it-QAT-Q4_0.gguf",

    parameter: .init(

        temperature: 0.7,   // Randomness (0.0〜1.0)

        topK: 40,           // Top-K sampling

        topP: 0.9,          // Top-P (nucleus) sampling

        options: .init(responseFormat: .json) // Response format

    )

)

// You can track download progress

try await model.downloadModel { progress in 

    print("Download progress: \(progress)")

}

// Create a session with the downloaded model

let session = LLMSession(model: model)

// Generate a response with a specific prompt

let response = try await session.respond(to: """

Create the beginning of a synopsis for an epic story with a cat as the main character.

Format it in JSON, as shown below.

{

    "title": "",

    "content": "",

}

""")

print(response)

// You can also add system messages before asking questions

session.messages = [.system("You are a helpful assistant.")]

```

Using Apple MLX

```swift

import LocalLLMClient

import LocalLLMClientMLX

// Create a model

let model = LLMSession.DownloadModel.mlx(

    id: "mlx-community/Qwen3-1.7B-4bit",

    parameter: .init(

        temperature: 0.7,    // Randomness (0.0 to 1.0)

        topP: 0.9            // Top-P (nucleus) sampling

    )

)

// You can track download progress

try await model.downloadModel { progress in 

    print("Download progress: \(progress)")

}

// Create a session with the downloaded model

let session = LLMSession(model: model)

// Generate text with system and user messages

session.messages = [.system("You are a helpful assistant.")]

let response = try await session.respond(to: "Tell me a story about a cat.")

print(response)

```

Using Apple FoundationModels

```swift

import LocalLLMClient

import LocalLLMClientFoundationModels

// Available on iOS 26.0+ / macOS 26.0+ and requires Apple Intelligence 

let session = LLMSession(model: .foundationModels(

    // Use system's default model

    model: .default,

    // Configure generation options

    parameter: .init(

        temperature: 0.7,

    )

))

// Generate a response with a specific prompt

let response = try await session.respond(to: "Tell me a short story about a clever fox.")

print(response)

```

### Tool Calling

LocalLLMClient supports tool calling for integrations with external systems.

> [!IMPORTANT]

> Tool calling is only available with models that support this feature. Each backend has different model compatibility.

> 

> Make sure your chosen model explicitly supports tool calling before using this feature.

Using tool calling

```swift

import LocalLLMClient

import LocalLLMClientLlama

@Tool("get_weather")

struct GetWeatherTool {

    let description = "Get the current weather in a given location"

    

    @ToolArguments

    struct Arguments {

        @ToolArgument("The city and state, e.g. San Francisco, CA")

        var location: String

        

        @ToolArgument("Temperature unit")

        var unit: Unit?

        

        @ToolArgumentEnum

        enum Unit: String {

            case celsius

            case fahrenheit

        }

    }

    

    func call(arguments: Arguments) async throws -> ToolOutput {

        // In a real implementation, this would call a weather API

        let temp = arguments.unit == .celsius ? "22°C" : "72°F"

        return ToolOutput([

            "location": arguments.location,

            "temperature": temp,

            "condition": "sunny"

        ])

    }

}

// Create the tool

let weatherTool = GetWeatherTool()

// Create a session with a model that supports tool calling and register tools

let session = LLMSession(

    model: .llama(

        id: "Qwen/Qwen2.5-1.5B-Instruct-GGUF",

        model: "qwen2.5-1.5b-instruct-q4_k_m.gguf"

    ),

    tools: [weatherTool]

)

// Ask a question that requires tool use

let response = try await session.respond(to: "What's the weather like in Tokyo?")

print(response)

// The model will automatically call the weather tool and include the result in its response

```

### Multimodal for Image Processing

LocalLLMClient also supports multimodal models for processing images.

Using with llama.cpp

```swift

import LocalLLMClient

import LocalLLMClientLlama

// Create a session with a multimodal model

let session = LLMSession(model: .llama(

    id: "ggml-org/gemma-3-4b-it-GGUF",

    model: "gemma-3-4b-it-Q8_0.gguf",

    mmproj: "mmproj-model-f16.gguf"

))

// Ask a question about an image

let response = try await session.respond(

    to: "What's in this image?", 

    attachments: [.image(.init(resource: .yourImage))]

)

print(response)

// You can also stream the response

for try await text in session.streamResponse(

    to: "Describe this image in detail", 

    attachments: [.image(.init(resource: .yourImage))]

) {

    print(text, terminator: "")

}

```

Using with Apple MLX

```swift

import LocalLLMClient

import LocalLLMClientMLX

// Create a session with a multimodal model

let session = LLMSession(model: .mlx(

    id: "mlx-community/Qwen2.5-VL-3B-Instruct-abliterated-4bit"

))

// Ask a question about an image

let response = try await session.respond(

    to: "What's in this image?", 

    attachments: [.image(.init(resource: .yourImage))]

)

print(response)

```

Advanced Usage: Low Level API


For more advanced control over model loading and inference, you can use the `LocalLLMClient` APIs directly.

Using with llama.cpp

```swift

import LocalLLMClient

import LocalLLMClientLlama

import LocalLLMClientUtility

// Download model from Hugging Face (Gemma 3)

let ggufName = "gemma-3-4B-it-QAT-Q4_0.gguf"

let downloader = FileDownloader(source: .huggingFace(

    id: "lmstudio-community/gemma-3-4B-it-qat-GGUF",

    globs: [ggufName]

))

try await downloader.download { print("Progress: \($0)") }

// Initialize a client with the downloaded model

let modelURL = downloader.destination.appending(component: ggufName)

let client = try await LocalLLMClient.llama(url: modelURL, parameter: .init(

    context: 4096,      // Context size

    temperature: 0.7,   // Randomness (0.0〜1.0)

    topK: 40,           // Top-K sampling

    topP: 0.9,          // Top-P (nucleus) sampling

    options: .init(responseFormat: .json) // Response format

))

let prompt = """

Create the beginning of a synopsis for an epic story with a cat as the main character.

Format it in JSON, as shown below.

{

    "title": "",

    "content": "",

}

"""

// Generate text

let input = LLMInput.chat([

    .system("You are a helpful assistant."),

    .user(prompt)

])

for try await text in try await client.textStream(from: input) {

    print(text, terminator: "")

}

```

Using with Apple MLX

```swift

import LocalLLMClient

import LocalLLMClientMLX

import LocalLLMClientUtility

// Download model from Hugging Face

let downloader = FileDownloader(

    source: .huggingFace(id: "mlx-community/Qwen3-1.7B-4bit", globs: .mlx)

)

try await downloader.download { print("Progress: \($0)") }

// Initialize a client with the downloaded model

let client = try await LocalLLMClient.mlx(url: downloader.destination, parameter: .init(

    temperature: 0.7,    // Randomness (0.0 to 1.0)

    topP: 0.9            // Top-P (nucleus) sampling

))

// Generate text

let input = LLMInput.chat([

    .system("You are a helpful assistant."),

    .user("Tell me a story about a cat.")

])

for try await text in try await client.textStream(from: input) {

    print(text, terminator: "")

}

```

Using with Apple FoundationModels

```swift

import LocalLLMClient

import LocalLLMClientFoundationModels

// Available on iOS 26.0+ / macOS 26.0+ and requires Apple Intelligence 

let client = try await LocalLLMClient.foundationModels(

    // Use system's default model

    model: .default,

    // Configure generation options

    parameter: .init(

        temperature: 0.7,

    )

)

// Generate text

let input = LLMInput.chat([

    .system("You are a helpful assistant."),

    .user("Tell me a short story about a clever fox.")

])

for try await text in try await client.textStream(from: input) {

    print(text, terminator: "")

}

```

Advanced Multimodal with llama.cpp

```swift

import LocalLLMClient

import LocalLLMClientLlama

import LocalLLMClientUtility

// Download model from Hugging Face (Gemma 3)

let model = "gemma-3-4b-it-Q8_0.gguf"

let mmproj = "mmproj-model-f16.gguf"

let downloader = FileDownloader(

    source: .huggingFace(id: "ggml-org/gemma-3-4b-it-GGUF", globs: [model, mmproj]),

)

try await downloader.download { print("Download: \($0)") }

// Initialize a client with the downloaded model

let client = try await LocalLLMClient.llama(

    url: downloader.destination.appending(component: model),

    mmprojURL: downloader.destination.appending(component: mmproj)

)

let input = LLMInput.chat([

    .user("What's in this image?", attachments: [.image(.init(resource: .yourImage))]),

])

// Generate text without streaming

print(try await client.generateText(from: input))

```

Advanced Multimodal with Apple MLX

```swift

import LocalLLMClient

import LocalLLMClientMLX

import LocalLLMClientUtility

// Download model from Hugging Face (Qwen2.5 VL)

let downloader = FileDownloader(source: .huggingFace(

    id: "mlx-community/Qwen2.5-VL-3B-Instruct-abliterated-4bit",

    globs: .mlx

))

try await downloader.download { print("Progress: \($0)") }

let client = try await LocalLLMClient.mlx(url: downloader.destination)

let input = LLMInput.chat([

    .user("What's in this image?", attachments: [.image(.init(resource: .yourImage))]),

])

// Generate text without streaming

print(try await client.generateText(from: input))

```

### CLI Tool

You can use LocalLLMClient directly from the terminal using the command line tool:

```bash

# Run using llama.cpp

swift run LocalLLMCLI --model /path/to/your/model.gguf "Your prompt here"

# Run using MLX

./scripts/run_mlx.sh --model https://huggingface.co/mlx-community/Qwen3-1.7B-4bit "Your prompt here"

```

## Tested Models

- LLaMA 3

- Gemma 3 / 2

- Qwen 3 / 2

- Phi 4

> [Models compatible with llama.cpp backend](https://github.com/ggml-org/llama.cpp?tab=readme-ov-file#text-only)  

> [Models compatible with MLX backend](https://github.com/ml-explore/mlx-swift-examples/blob/main/Libraries/MLXLLM/Documentation.docc/Documentation.md)  

*If you have a model that works, please open an issue or PR to add it to the list.*

## Requirements

- iOS 16.0+ / macOS 14.0+

- Xcode 16.0+

## Acknowledgements

This package uses [llama.cpp](https://github.com/ggml-org/llama.cpp), [Apple's MLX](https://opensource.apple.com/projects/mlx/) and [Foundation Models framework](https://developer.apple.com/documentation/foundationmodels) for model inference.

---

[Support this project :heart:](https://github.com/sponsors/tattn)
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/tattn/localllmclient

Awesome Lists containing this project

README

Advanced Usage: Low Level API