https://github.com/ferranpons/llamatik

True on-device AI for Kotlin Multiplatform (Android, iOS, Desktop, JVM, WASM). LLM, Speech-to-Text and Image Generation — powered by llama.cpp, whisper.cpp and stable-diffusion.cpp.


Host: GitHub
URL: https://github.com/ferranpons/llamatik
Owner: ferranpons
License: mit
Created: 2025-07-17T09:27:56.000Z (10 months ago)
Default Branch: main
Last Pushed: 2026-04-16T18:00:59.000Z (about 1 month ago)
Last Synced: 2026-04-16T19:25:30.175Z (about 1 month ago)
Topics: ai, android, desktop, edge-ai, ggml, inference, ios, kmp, kmp-library, kotlin, ktor, llama, llama-cpp, llm, mobile-ai, multiplatform, offline-ai, on-device-ai, privacy, rag
Language: HTML
Homepage: https://www.llamatik.com
Size: 192 MB
Stars: 98
Watchers: 6
Forks: 18
Open Issues: 16
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Security: SECURITY.md


Llamatik

Run AI locally on Android, iOS, Desktop and WASM — using a single Kotlin API.

Offline-first · Privacy-preserving · True Kotlin Multiplatform

---

## ✨ What is Llamatik?

**Llamatik** is a true Kotlin Multiplatform AI library that lets you run:

- 🧠 **Large Language Models (LLMs)** via `llama.cpp`
- 🎙 **Speech-to-Text (STT)** via `whisper.cpp`
- 🎨 **Image Generation** via `stable-diffusion.cpp`

Fully **on-device**, optionally remote — all behind a **unified Kotlin API**.

No Python.
No required servers.
Your models, your data, your device.

Designed for **privacy-first**, **offline-capable**, and **cross-platform** AI applications.

---

## 🚀 Features

### 🔐 On-device & Private
- ✅ Fully offline inference via **llama.cpp**
- ✅ On-device speech recognition via **whisper.cpp**
- ✅ No network required
- ✅ No data exfiltration
- ✅ Works with **GGUF** (LLMs) and **BIN** (Whisper) models

### 🧠 LLM (llama.cpp)
- ✅ Text generation (non-streaming & streaming)
- ✅ Context-aware generation (system + history)
- ✅ **Schema-constrained JSON generation**
- ✅ Embeddings for vector search & RAG
- ✅ Configurable context length, threads, mmap, Flash Attention
- ✅ KV cache session save / load / continue
- ✅ Fine-grained sampling controls (temperature, top-k, top-p, repeat penalty, max tokens)

### 🎙 Speech-to-Text (whisper.cpp)
- ✅ On-device transcription
- ✅ Works fully offline
- ✅ 16kHz mono WAV support
- ✅ Selectable Whisper models
- ✅ Integrated model download + management

### 🎨 Image Generation (stable-diffusion.cpp)

- ✅ On-device Stable Diffusion inference
- ✅ Text-to-image generation
- ✅ Fully offline
- ✅ Works with optimized SD models
- ✅ Native C++ integration

### 🧩 Kotlin Multiplatform
- ✅ Shared API across **Android, iOS, Desktop**
- ✅ Native C++ integration via Kotlin/Native
- ✅ Static frameworks for iOS
- ✅ JNI for Desktop

### 🌐 Hybrid & Remote
- ✅ Optional HTTP client for remote inference
- ✅ Drop-in backend server (`llamatik-backend`)
- ✅ Seamlessly switch between local and remote inference

---

## 📱 Try it now (No setup required)

Want to see Llamatik in action before integrating it?

The **Llamatik App** showcases:
- On-device inference
- Streaming generation
- Speech-to-text (Whisper)
- Privacy-first AI (no cloud required)
- Downloadable models

---

## 🔧 Use Cases

- 🧠 On-device chatbots & assistants
- 📚 Local RAG systems
- 🛰️ Hybrid AI apps (offline-first, online fallback)
- 🎮 Game AI & procedural dialogue

---

## 🧱 Architecture (WIP)

```
Your App
│
▼
LlamaBridge (shared Kotlin API)
│
├─ llamatik-core → Native llama.cpp, whisper.cpp and stable-diffusion.cpp (on-device)
├─ llamatik-client → Remote HTTP inference
└─ llamatik-backend → llama.cpp-compatible server
```

Switching between **local and remote inference requires no API changes** —
only configuration.
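
How Llamatik wires the local and remote modules together internally is not shown here. As a rough app-side sketch of the same idea (offline-first with an online fallback), the following hides the backend behind one interface and picks it from configuration. `TextGenerator`, `InferenceMode`, `FallbackGenerator` and `generatorFor` are hypothetical names introduced for illustration; they are not part of the Llamatik API.

```kotlin
// Hypothetical abstraction; none of these names are part of Llamatik.
fun interface TextGenerator {
    fun generate(prompt: String): String
}

enum class InferenceMode { LOCAL, REMOTE }

/** Tries [primary] first and falls back to [fallback] if it throws. */
class FallbackGenerator(
    private val primary: TextGenerator,
    private val fallback: TextGenerator,
) : TextGenerator {
    override fun generate(prompt: String): String =
        try {
            primary.generate(prompt)
        } catch (e: Exception) {
            fallback.generate(prompt)
        }
}

/** Picks the preferred backend from configuration; call sites only see TextGenerator. */
fun generatorFor(
    mode: InferenceMode,
    local: TextGenerator,
    remote: TextGenerator,
): TextGenerator = when (mode) {
    InferenceMode.LOCAL -> FallbackGenerator(primary = local, fallback = remote)
    InferenceMode.REMOTE -> FallbackGenerator(primary = remote, fallback = local)
}
```

In an app, the `local` generator would presumably wrap a call such as `LlamaBridge.generate(prompt)`, and `remote` would wrap whatever HTTP client is in use.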

---

## 🔧 Requirements

- iOS Deployment Target: **16.6+**
- Android MinSDK API: **26**
- Desktop: JVM 21+
- WASM: Modern browser with WebAssembly support

## 📦 Current Versions

- llama.cpp version: [b8816](https://github.com/ggml-org/llama.cpp/releases/tag/b8816)
- whisper.cpp version: [v1.8.4](https://github.com/ggml-org/whisper.cpp/releases/tag/v1.8.4)
- stable-diffusion.cpp version: [master-572-1b4e9be](https://github.com/leejet/stable-diffusion.cpp/releases/tag/master-572-1b4e9be)

---

## 📦 Installation

Llamatik is published on **Maven Central** and follows **semantic versioning**.

- No custom Gradle plugins
- No manual native toolchain setup
- Works with standard Kotlin Multiplatform projects

### Repository setup

```kotlin
// settings.gradle.kts
dependencyResolutionManagement {
    repositories {
        google()
        mavenCentral()
    }
}

// build.gradle.kts of the shared module
kotlin {
    sourceSets {
        commonMain.dependencies {
            implementation("com.llamatik:library:1.0.0")
        }
    }
}
```

---

## ⚡ Quick Start

```kotlin
// Resolve model path (place GGUF in assets / bundle)
val modelPath = LlamaBridge.getModelPath("phi-2.Q4_0.gguf")

// (Optional) tune parameters before loading. Parameters marked "reload required"
// in the Generation Parameters table take effect at model init time; the others
// can be changed at any time.
LlamaBridge.updateGenerateParams(
    temperature = 0.7f,
    maxTokens = 512,
    topP = 0.95f,
    topK = 40,
    repeatPenalty = 1.1f,
    contextLength = 4096,
    numThreads = 4,
    useMmap = true,
    flashAttention = false,
    batchSize = 512,
)

// Load model
LlamaBridge.initGenerateModel(modelPath)

// Generate text
val output = LlamaBridge.generate(
    "Explain Kotlin Multiplatform in one sentence."
)
```

---

## 🧑‍💻 Library Usage

The public Kotlin API is defined in `LlamaBridge` (an `expect object` with platform-specific `actual` implementations).

### API surface (LlamaBridge)

```kotlin
@Suppress("EXPECT_ACTUAL_CLASSIFIERS_ARE_IN_BETA_WARNING")
expect object LlamaBridge {
    // Utilities
    fun getModelPath(modelFileName: String): String // copy asset/bundle model to app files dir and return absolute path
    fun shutdown() // free native resources

    // Embeddings
    fun initEmbedModel(modelPath: String): Boolean // load embeddings model
    fun embed(input: String): FloatArray // return embedding vector

    // Text generation (non-streaming)
    fun initGenerateModel(modelPath: String): Boolean // load generation model
    fun generate(prompt: String): String
    fun generateWithContext(
        systemPrompt: String,
        contextBlock: String,
        userPrompt: String
    ): String

    // Text generation (streaming)
    fun generateStream(prompt: String, callback: GenStream)
    fun generateStreamWithContext(
        systemPrompt: String,
        contextBlock: String,
        userPrompt: String,
        callback: GenStream
    )

    // Convenience streaming overload (lambda callbacks)
    fun generateWithContextStream(
        system: String,
        context: String,
        user: String,
        onDelta: (String) -> Unit,
        onDone: () -> Unit,
        onError: (String) -> Unit
    )

    // Text generation with JSON schema (non-streaming)
    fun generateJson(prompt: String, jsonSchema: String? = null): String
    fun generateJsonWithContext(
        systemPrompt: String,
        contextBlock: String,
        userPrompt: String,
        jsonSchema: String? = null
    ): String

    // Text generation with JSON schema (streaming)
    fun generateJsonStream(prompt: String, jsonSchema: String? = null, callback: GenStream)
    fun generateJsonStreamWithContext(
        systemPrompt: String,
        contextBlock: String,
        userPrompt: String,
        jsonSchema: String? = null,
        callback: GenStream
    )

    // KV cache session support
    fun sessionReset(): Boolean // clear KV state, keep model loaded
    fun sessionSave(path: String): Boolean // persist KV state to file
    fun sessionLoad(path: String): Boolean // restore KV state from file
    fun generateContinue(prompt: String): String // generate using existing KV cache

    // Generation parameters (applied on next generate call)
    fun updateGenerateParams(
        temperature: Float, // randomness (0.0–2.0)
        maxTokens: Int, // max output tokens
        topP: Float, // nucleus sampling threshold
        topK: Int, // top-k sampling
        repeatPenalty: Float, // penalty for repeated tokens
        contextLength: Int, // KV context window size (requires model reload)
        numThreads: Int, // CPU threads for inference
        useMmap: Boolean, // memory-map model weights (requires model reload)
        flashAttention: Boolean, // enable Flash Attention (requires model reload)
        batchSize: Int, // token batch size for prompt processing (requires model reload)
    )

    fun nativeCancelGenerate() // cancel ongoing generation
}

interface GenStream {
    fun onDelta(text: String)
    fun onComplete()
    fun onError(message: String)
}
```
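
`embed` returns a plain `FloatArray`, so ranking documents for vector search or RAG is left to the caller. A minimal sketch of one common approach, cosine similarity followed by a top-k selection, might look like the following. `cosineSimilarity` and `topK` are illustrative helpers, not Llamatik functions, and the sketch assumes all vectors come from the same embedding model and therefore share a dimension.

```kotlin
import kotlin.math.sqrt

/** Cosine similarity between two embedding vectors of equal length. */
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "Vectors must have the same dimension" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    // A zero vector has no direction; treat it as dissimilar to everything.
    if (normA == 0.0 || normB == 0.0) return 0f
    return (dot / (sqrt(normA) * sqrt(normB))).toFloat()
}

/** Returns the indices of the [k] documents most similar to [query], best first. */
fun topK(query: FloatArray, documents: List<FloatArray>, k: Int): List<Int> =
    documents.indices
        .sortedByDescending { cosineSimilarity(query, documents[it]) }
        .take(k)
```

The text chunks behind the returned indices could then be joined and passed as the `contextBlock` argument of `generateWithContext`.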

### Generation Parameters

All sampling and hardware parameters are set via `updateGenerateParams`. Parameters marked *(reload required)* in the table below (`contextLength`, `numThreads`, `useMmap`, `flashAttention`, `batchSize`) must be set **before** calling `initGenerateModel` to take effect; the others can be updated at any time.

| Parameter | Default | Description |
|---|---|---|
| `temperature` | `0.7` | Randomness of outputs (0 = deterministic, 2 = very random) |
| `maxTokens` | `256` | Maximum number of tokens to generate |
| `topP` | `0.95` | Nucleus sampling: keep tokens covering this probability mass |
| `topK` | `40` | Only sample from the top-K most likely tokens |
| `repeatPenalty` | `1.1` | Penalty multiplier for recently generated tokens |
| `contextLength` | `4096` | KV cache window size in tokens *(reload required)* |
| `numThreads` | `4` | CPU threads used for inference *(reload required)* |
| `useMmap` | `true` | Memory-map model weights instead of loading into RAM *(reload required)* |
| `flashAttention` | `false` | Enable Flash Attention for faster, more memory-efficient attention *(reload required)* |
| `batchSize` | `512` | Token batch size for prompt processing — larger = faster prefill, more RAM *(reload required)* |
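
Because some parameters only take effect on model load, an app that exposes these settings in its UI may want to know whether a change calls for a reload. The following is a small sketch under that assumption: `GenerateParams` and `needsReload` are hypothetical app-side helpers (not Llamatik types) whose defaults and reload flags are copied from the table above.

```kotlin
/** Hypothetical holder mirroring the table above; not a Llamatik type. */
data class GenerateParams(
    val temperature: Float = 0.7f,
    val maxTokens: Int = 256,
    val topP: Float = 0.95f,
    val topK: Int = 40,
    val repeatPenalty: Float = 1.1f,
    val contextLength: Int = 4096,
    val numThreads: Int = 4,
    val useMmap: Boolean = true,
    val flashAttention: Boolean = false,
    val batchSize: Int = 512,
)

/** True when any field the table marks "reload required" differs. */
fun needsReload(current: GenerateParams, updated: GenerateParams): Boolean =
    current.contextLength != updated.contextLength ||
        current.numThreads != updated.numThreads ||
        current.useMmap != updated.useMmap ||
        current.flashAttention != updated.flashAttention ||
        current.batchSize != updated.batchSize
```

A caller would pass the updated values to `updateGenerateParams` and, when `needsReload` returns true, call `initGenerateModel` again so the new values are picked up.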

### KV Cache Sessions

Use the session API to persist and resume conversation state across calls without re-feeding the full prompt:

```kotlin
// Generate and keep the KV state in memory
LlamaBridge.generate("Tell me about Kotlin.")

// Save the KV state to disk
LlamaBridge.sessionSave("/path/to/session.bin")

// ... later or in a new process ...

// Restore state and continue from where you left off
LlamaBridge.sessionLoad("/path/to/session.bin")
val continuation = LlamaBridge.generateContinue("What about multiplatform support?")

// Reset state without unloading the model
LlamaBridge.sessionReset()
```

### Speech-to-Text (WhisperBridge)

WhisperBridge exposes a small, platform-friendly wrapper around whisper.cpp for on-device speech-to-text.

The workflow is:
1. Download a Whisper ggml model (e.g. `ggml-tiny-q8_0.bin`) to local storage (the Llamatik app does this for you).
2. Initialize Whisper once with the local model path.
3. Record audio to a WAV file and transcribe it.

### Whisper API surface

```kotlin
object WhisperBridge {
    /** Returns a platform-specific absolute path for the model filename. */
    fun getModelPath(modelFileName: String): String

    /** Loads the model at [modelPath]. Returns true if loaded. */
    fun initModel(modelPath: String): Boolean

    /**
     * Transcribes a WAV file and returns text.
     * Tip: record WAV as 16 kHz, mono, 16-bit PCM for best compatibility.
     *
     * @param initialPrompt Optional text prepended to the decoder input (up to 224 tokens).
     * Use it to bias transcription toward domain-specific vocabulary (e.g. medical terms).
     */
    fun transcribeWav(wavPath: String, language: String? = null, initialPrompt: String? = null): String

    /** Frees native resources. */
    fun release()
}
```

#### Example

```kotlin
import com.llamatik.library.platform.WhisperBridge

val modelPath = WhisperBridge.getModelPath("ggml-tiny-q8_0.bin")

// 1) Init once (e.g. app start)
WhisperBridge.initModel(modelPath)

// 2) Record to a WAV file (16kHz mono PCM16) using your own recorder
val wavPath: String = "/path/to/recording.wav"

// 3) Transcribe
val text = WhisperBridge.transcribeWav(wavPath, language = null).trim()
println(text)

// 4) Optional: release on app shutdown
WhisperBridge.release()
```

**Note**: `WhisperBridge` expects a WAV file path. Llamatik’s app uses `AudioRecorder` + `AudioPaths.tempWavPath()` to generate the WAV before calling `transcribeWav(...)`.
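
If your own recorder yields raw 16-bit PCM samples rather than a WAV file, the samples need a RIFF/WAVE header before they can be written to disk and passed to `transcribeWav`. The sketch below builds the canonical 44-byte PCM header in pure Kotlin. `pcm16ToWav`, `intLE` and `shortLE` are hypothetical helpers, not part of Llamatik, and writing the resulting bytes to a file is left to platform code.

```kotlin
/** Little-endian encodings used by the RIFF header fields. */
fun intLE(v: Int): ByteArray = byteArrayOf(
    v.toByte(), (v shr 8).toByte(), (v shr 16).toByte(), (v shr 24).toByte()
)

fun shortLE(v: Int): ByteArray = byteArrayOf(v.toByte(), (v shr 8).toByte())

/** Wraps raw 16-bit PCM samples in a canonical 44-byte WAV header. */
fun pcm16ToWav(samples: ShortArray, sampleRate: Int = 16_000, channels: Int = 1): ByteArray {
    val bytesPerSample = 2
    val dataSize = samples.size * bytesPerSample
    val byteRate = sampleRate * channels * bytesPerSample
    val header = "RIFF".encodeToByteArray() + intLE(36 + dataSize) +
        "WAVE".encodeToByteArray() +
        "fmt ".encodeToByteArray() + intLE(16) + // fmt chunk size for PCM
        shortLE(1) +                             // audio format 1 = PCM
        shortLE(channels) + intLE(sampleRate) + intLE(byteRate) +
        shortLE(channels * bytesPerSample) +     // block align
        shortLE(16) +                            // bits per sample
        "data".encodeToByteArray() + intLE(dataSize)
    // Samples are stored little-endian, low byte first.
    val body = ByteArray(dataSize)
    for (i in samples.indices) {
        val s = samples[i].toInt()
        body[2 * i] = s.toByte()
        body[2 * i + 1] = (s shr 8).toByte()
    }
    return header + body
}
```

The defaults match the 16 kHz mono format recommended above.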

### 🎨 Image Generation (StableDiffusionBridge)

Llamatik exposes Stable Diffusion through `StableDiffusionBridge`.

The workflow is:
1. Download or bundle a Stable Diffusion model.
2. Initialize once.
3. Generate images from text prompts.

### Stable-Diffusion API surface

```kotlin
object StableDiffusionBridge {

    /** Returns absolute model path (copied from assets/bundle if needed). */
    fun getModelPath(modelFileName: String): String

    /** Loads the Stable Diffusion model. */
    fun initModel(modelPath: String): Boolean

    /**
     * Generates an image from a prompt.
     *
     * @param prompt Text prompt
     * @param width Output width
     * @param height Output height
     * @param steps Inference steps
     * @param cfgScale Guidance scale
     * @return PNG image as ByteArray
     */
    fun generateImage(
        prompt: String,
        width: Int = 512,
        height: Int = 512,
        steps: Int = 20,
        cfgScale: Float = 7.5f
    ): ByteArray

    /** Releases native resources */
    fun release()
}
```

#### Example

```kotlin
import com.llamatik.library.platform.StableDiffusionBridge

val modelPath = StableDiffusionBridge.getModelPath("sd-model.bin")

StableDiffusionBridge.initModel(modelPath)

val imageBytes = StableDiffusionBridge.generateImage(
    prompt = "A cyberpunk llama in neon Tokyo",
    width = 512,
    height = 512
)

// Save imageBytes as PNG file
```
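
Since `generateImage` is documented to return PNG bytes, a cheap sanity check before saving is to look at the PNG signature and the dimensions stored in the leading IHDR chunk. The sketch below does that in pure Kotlin; `looksLikePng` and `pngDimensions` are hypothetical helpers, not part of Llamatik, and they rely only on the fixed PNG layout (8-byte signature, then the IHDR chunk with big-endian width and height at byte offsets 16 and 20).

```kotlin
/** The fixed 8-byte signature that starts every PNG file. */
val PNG_SIGNATURE: ByteArray = byteArrayOf(
    0x89.toByte(), 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A
)

/** True when [bytes] begins with the PNG signature. */
fun looksLikePng(bytes: ByteArray): Boolean =
    bytes.size >= PNG_SIGNATURE.size &&
        PNG_SIGNATURE.indices.all { bytes[it] == PNG_SIGNATURE[it] }

/** Reads (width, height) from the IHDR chunk, or null if [bytes] is not a PNG. */
fun pngDimensions(bytes: ByteArray): Pair<Int, Int>? {
    if (!looksLikePng(bytes) || bytes.size < 24) return null
    // PNG stores multi-byte integers in big-endian order.
    fun be32(offset: Int): Int =
        ((bytes[offset].toInt() and 0xFF) shl 24) or
            ((bytes[offset + 1].toInt() and 0xFF) shl 16) or
            ((bytes[offset + 2].toInt() and 0xFF) shl 8) or
            (bytes[offset + 3].toInt() and 0xFF)
    return be32(16) to be32(20)
}
```

Comparing `pngDimensions(imageBytes)` with the requested `width` and `height` can catch an empty or truncated result early.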

### 👁️ Vision / Multimodal (MultimodalBridge)

MultimodalBridge wraps llama.cpp's multimodal (VLM) support for on-device image analysis using vision-language models such as SmolVLM.

The workflow is:
1. Download a VLM GGUF model and its matching mmproj GGUF file to local storage.
2. Initialize the bridge once with both file paths.
3. Pass image bytes (JPEG/PNG/BMP) and a text prompt to receive a streamed response.

### MultimodalBridge API surface

```kotlin
object MultimodalBridge {
    /**
     * Load the vision model and its multimodal projector (mmproj) side-by-side.
     * Both files must be available on disk before calling this.
     *
     * @param modelPath Absolute path to the GGUF vision model.
     * @param mmprojPath Absolute path to the GGUF mmproj file.
     * @return true on success.
     */
    fun initModel(modelPath: String, mmprojPath: String): Boolean

    /**
     * Analyze an image given as raw bytes (JPEG/PNG/BMP), streaming the response
     * token by token via [callback].
     *
     * Must be called from a background thread/coroutine; blocks until generation completes.
     */
    fun analyzeImageBytesStream(imageBytes: ByteArray, prompt: String, callback: GenStream)

    /** Cancel an in-progress analyzeImageBytesStream call. */
    fun cancelAnalysis()

    /** Free all native resources (model, mmproj context, llama context). */
    fun release()
}
```

#### Example

```kotlin
import com.llamatik.library.platform.MultimodalBridge
import java.io.File // JVM/Android only; load the bytes differently on other targets

// 1) Init once; both model and mmproj must be downloaded first
val loaded = MultimodalBridge.initModel(
    modelPath = "/path/to/SmolVLM-256M-Instruct-Q8_0.gguf",
    mmprojPath = "/path/to/mmproj-SmolVLM-256M-Instruct-f16.gguf"
)
check(loaded) { "Failed to load the vision model or its mmproj file" }

// 2) Analyze an image (e.g. loaded from disk or camera).
//    This call blocks until generation completes, so run it off the main thread.
val imageBytes: ByteArray = File("/path/to/photo.jpg").readBytes()

MultimodalBridge.analyzeImageBytesStream(
    imageBytes = imageBytes,
    prompt = "Describe what you see in this image.",
    callback = object : GenStream {
        override fun onDelta(text: String) { print(text) }
        override fun onComplete() { println("\n[done]") }
        override fun onError(message: String) { println("Error: $message") }
    }
)

// 3) Optional: cancel mid-stream (call from another thread while step 2 is running)
MultimodalBridge.cancelAnalysis()

// 4) Optional: release on app shutdown
MultimodalBridge.release()
```

**Note**: MultimodalBridge requires both a vision model GGUF **and** a matching mmproj GGUF. Llamatik's app downloads both automatically when you select a VLM model.
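
When only the final text matters, the streamed deltas can be accumulated by a small `GenStream` implementation. The sketch below is one way to do that. `CollectingStream` is a hypothetical helper, not part of Llamatik, and the `GenStream` interface is redeclared locally only so the snippet compiles on its own; in a real project you would use the library's interface instead.

```kotlin
// Local copy of the GenStream interface from the API surface,
// declared here only so the sketch is self-contained.
interface GenStream {
    fun onDelta(text: String)
    fun onComplete()
    fun onError(message: String)
}

/** Hypothetical helper: accumulates streamed deltas and reports one final result. */
class CollectingStream(
    private val onFinished: (Result<String>) -> Unit,
) : GenStream {
    private val buffer = StringBuilder()

    override fun onDelta(text: String) {
        buffer.append(text)
    }

    override fun onComplete() = onFinished(Result.success(buffer.toString()))

    override fun onError(message: String) =
        onFinished(Result.failure(IllegalStateException(message)))
}
```

An instance can be passed as the `callback` argument of `analyzeImageBytesStream` or of the `LlamaBridge` streaming functions. Note that the sketch is not synchronized, so it assumes the callbacks arrive on a single thread.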

---

## 🧑‍💻 Backend Usage

The Llamatik backend server is now maintained in a dedicated repository.

👉 Llamatik Server Repository
[https://github.com/ferranpons/Llamatik-Server](https://github.com/ferranpons/Llamatik-Server)

Visit the repository for full setup instructions, configuration options, and usage details.

---

## 🔍 Why Llamatik?

- ✅ Built directly on llama.cpp, whisper.cpp and stable-diffusion.cpp
- ✅ Offline-first & privacy-preserving
- ✅ No runtime dependencies
- ✅ Open-source (MIT)
- ✅ Used by real Android & iOS apps
- ✅ Designed for long-term Kotlin Multiplatform support

---

## 📦 Apps using Llamatik

Llamatik is already used in production apps on Google Play and the App Store.

Want to showcase your app here?
Open a PR and add it to the list 🚀

---

## 🤝 Contributing

Llamatik is 100% open-source and actively developed. Ways to contribute:
- Bug reports
- Feature requests
- Documentation improvements
- Platform extensions

All contributions are welcome!

---

## 📜 License

This project is licensed under the MIT License.

See [LICENSE](./LICENSE) for details.

---

Built with ❤️ for the Kotlin community.
