https://github.com/ferranpons/llamatik
True on-device AI for Kotlin Multiplatform (Android, iOS, Desktop, JVM, WASM). LLM, Speech-to-Text and Image Generation — powered by llama.cpp, whisper.cpp and stable-diffusion.cpp.
ai android desktop edge-ai ggml inference ios kmp kmp-library kotlin ktor llama llama-cpp llm mobile-ai multiplatform offline-ai on-device-ai privacy rag
- Host: GitHub
- URL: https://github.com/ferranpons/llamatik
- Owner: ferranpons
- License: mit
- Created: 2025-07-17T09:27:56.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2026-04-16T18:00:59.000Z (about 1 month ago)
- Last Synced: 2026-04-16T19:25:30.175Z (about 1 month ago)
- Topics: ai, android, desktop, edge-ai, ggml, inference, ios, kmp, kmp-library, kotlin, ktor, llama, llama-cpp, llm, mobile-ai, multiplatform, offline-ai, on-device-ai, privacy, rag
- Language: HTML
- Homepage: https://www.llamatik.com
- Size: 192 MB
- Stars: 98
- Watchers: 6
- Forks: 18
- Open Issues: 16
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Security: SECURITY.md
# Llamatik
Run AI locally on Android, iOS, Desktop and WASM — using a single Kotlin API.
Offline-first · Privacy-preserving · True Kotlin Multiplatform
---
## ✨ What is Llamatik?
**Llamatik** is a true Kotlin Multiplatform AI library that lets you run:
- 🧠 **Large Language Models (LLMs)** via `llama.cpp`
- 🎙 **Speech-to-Text (STT)** via `whisper.cpp`
- 🎨 **Image Generation** via `stable-diffusion.cpp`
Fully **on-device**, optionally remote — all behind a **unified Kotlin API**.
No Python.
No required servers.
Your models, your data, your device.
Designed for **privacy-first**, **offline-capable**, and **cross-platform** AI applications.
---
## 🚀 Features

### 🔐 On-device & Private
- ✅ Fully offline inference via **llama.cpp**
- ✅ On-device speech recognition via **whisper.cpp**
- ✅ No network required
- ✅ No data exfiltration
- ✅ Works with **GGUF** (LLMs) and **BIN** (Whisper) models
### 🧠 LLM (llama.cpp)
- ✅ Text generation (non-streaming & streaming)
- ✅ Context-aware generation (system + history)
- ✅ **Schema-constrained JSON generation**
- ✅ Embeddings for vector search & RAG
- ✅ Configurable context length, threads, mmap, Flash Attention
- ✅ KV cache session save / load / continue
- ✅ Fine-grained sampling controls (temperature, top-k, top-p, repeat penalty, max tokens)
### 🎙 Speech-to-Text (whisper.cpp)
- ✅ On-device transcription
- ✅ Works fully offline
- ✅ 16kHz mono WAV support
- ✅ Selectable Whisper models
- ✅ Integrated model download + management
### 🎨 Image Generation (stable-diffusion.cpp)
- ✅ On-device Stable Diffusion inference
- ✅ Text-to-image generation
- ✅ Fully offline
- ✅ Works with optimized SD models
- ✅ Native C++ integration
### 🧩 Kotlin Multiplatform
- ✅ Shared API across **Android, iOS, Desktop**
- ✅ Native C++ integration via Kotlin/Native
- ✅ Static frameworks for iOS
- ✅ JNI for Desktop
### 🌐 Hybrid & Remote
- ✅ Optional HTTP client for remote inference
- ✅ Drop-in backend server (`llamatik-backend`)
- ✅ Seamlessly switch between local and remote inference
---
## 📱 Try it now (No setup required)
Want to see Llamatik in action before integrating it?
The **Llamatik App** showcases:
- On-device inference
- Streaming generation
- Speech-to-text (Whisper)
- Privacy-first AI (no cloud required)
- Downloadable models
---
## 🔧 Use Cases
- 🧠 On-device chatbots & assistants
- 📚 Local RAG systems
- 🛰️ Hybrid AI apps (offline-first, online fallback)
- 🎮 Game AI & procedural dialogue
---
## 🧱 Architecture (WIP)
```
Your App
│
▼
LlamaBridge (shared Kotlin API)
│
├─ llamatik-core → Native llama.cpp, whisper.cpp and stable-diffusion.cpp (on-device)
├─ llamatik-client → Remote HTTP inference
└─ llamatik-backend → llama.cpp-compatible server
```
Switching between **local and remote inference requires no API changes** —
only configuration.
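To illustrate what a configuration-only switch can look like, here is a minimal sketch. `TextGenerator`, `InferenceConfig` and `createGenerator` are hypothetical names introduced for this example and are not part of Llamatik's published API; the two lambdas stand in for the real local call and the real HTTP client call.

```kotlin
// Hypothetical abstraction for illustration only; not Llamatik's actual API.
fun interface TextGenerator {
    fun generate(prompt: String): String
}

// Configuration decides which backend is used; call sites never change.
sealed interface InferenceConfig {
    data class Local(val modelPath: String) : InferenceConfig
    data class Remote(val baseUrl: String) : InferenceConfig
}

// The lambdas are stand-ins for an on-device call and a remote HTTP call.
fun createGenerator(
    config: InferenceConfig,
    localGenerate: (modelPath: String, prompt: String) -> String,
    remoteGenerate: (baseUrl: String, prompt: String) -> String,
): TextGenerator = when (config) {
    is InferenceConfig.Local ->
        TextGenerator { prompt -> localGenerate(config.modelPath, prompt) }
    is InferenceConfig.Remote ->
        TextGenerator { prompt -> remoteGenerate(config.baseUrl, prompt) }
}
```

With this shape, application code depends only on `TextGenerator`, and swapping `InferenceConfig.Local` for `InferenceConfig.Remote` is the only change needed to move inference off the device.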
---
## 🔧 Requirements
- iOS Deployment Target: **16.6+**
- Android MinSDK API: **26**
- Desktop: JVM 21+
- WASM: Modern browser with WebAssembly support
## 📦 Current Versions
- llama.cpp version: [b8816](https://github.com/ggml-org/llama.cpp/releases/tag/b8816)
- whisper.cpp version: [v1.8.4](https://github.com/ggml-org/whisper.cpp/releases/tag/v1.8.4)
- stable-diffusion.cpp version: [master-572-1b4e9be](https://github.com/leejet/stable-diffusion.cpp/releases/tag/master-572-1b4e9be)
---
## 📦 Installation
Llamatik is published on **Maven Central** and follows **semantic versioning**.
- No custom Gradle plugins
- No manual native toolchain setup
- Works with standard Kotlin Multiplatform projects
### Repository setup
```kotlin
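// settings.gradle.kts
dependencyResolutionManagement {
    repositories {
        google()
        mavenCentral()
    }
}

// build.gradle.kts of the shared Kotlin Multiplatform module
kotlin {
    sourceSets {
        commonMain.dependencies {
            implementation("com.llamatik:library:1.0.0")
        }
    }
}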
```
---
## ⚡ Quick Start
```kotlin
// Resolve model path (place GGUF in assets / bundle)
val modelPath = LlamaBridge.getModelPath("phi-2.Q4_0.gguf")

// (Optional) tune parameters before loading: contextLength, numThreads, useMmap,
// flashAttention and batchSize take effect at model init time; the others can be
// changed at any time
LlamaBridge.updateGenerateParams(
    temperature = 0.7f,
    maxTokens = 512,
    topP = 0.95f,
    topK = 40,
    repeatPenalty = 1.1f,
    contextLength = 4096,
    numThreads = 4,
    useMmap = true,
    flashAttention = false,
    batchSize = 512,
)

// Load model
LlamaBridge.initGenerateModel(modelPath)

// Generate text
val output = LlamaBridge.generate(
    "Explain Kotlin Multiplatform in one sentence."
)
```
---
## 🧑‍💻 Library Usage
The public Kotlin API is defined in `LlamaBridge` (an `expect object` with platform-specific `actual` implementations).
### API surface (LlamaBridge)
```kotlin
@Suppress("EXPECT_ACTUAL_CLASSIFIERS_ARE_IN_BETA_WARNING")
expect object LlamaBridge {
    // Utilities
    fun getModelPath(modelFileName: String): String // copy asset/bundle model to app files dir and return absolute path
    fun shutdown() // free native resources

    // Embeddings
    fun initEmbedModel(modelPath: String): Boolean // load embeddings model
    fun embed(input: String): FloatArray // return embedding vector

    // Text generation (non-streaming)
    fun initGenerateModel(modelPath: String): Boolean // load generation model
    fun generate(prompt: String): String
    fun generateWithContext(
        systemPrompt: String,
        contextBlock: String,
        userPrompt: String
    ): String

    // Text generation (streaming)
    fun generateStream(prompt: String, callback: GenStream)
    fun generateStreamWithContext(
        systemPrompt: String,
        contextBlock: String,
        userPrompt: String,
        callback: GenStream
    )

    // Convenience streaming overload (lambda callbacks)
    fun generateWithContextStream(
        system: String,
        context: String,
        user: String,
        onDelta: (String) -> Unit,
        onDone: () -> Unit,
        onError: (String) -> Unit
    )

    // Text generation with JSON schema (non-streaming)
    fun generateJson(prompt: String, jsonSchema: String? = null): String
    fun generateJsonWithContext(
        systemPrompt: String,
        contextBlock: String,
        userPrompt: String,
        jsonSchema: String? = null
    ): String

    // Text generation with JSON schema (streaming)
    fun generateJsonStream(prompt: String, jsonSchema: String? = null, callback: GenStream)
    fun generateJsonStreamWithContext(
        systemPrompt: String,
        contextBlock: String,
        userPrompt: String,
        jsonSchema: String? = null,
        callback: GenStream
    )

    // KV cache session support
    fun sessionReset(): Boolean // clear KV state, keep model loaded
    fun sessionSave(path: String): Boolean // persist KV state to file
    fun sessionLoad(path: String): Boolean // restore KV state from file
    fun generateContinue(prompt: String): String // generate using existing KV cache

    // Generation parameters: sampling values apply on the next generate call;
    // values marked "requires model reload" apply on the next initGenerateModel call
    fun updateGenerateParams(
        temperature: Float, // randomness (0.0–2.0)
        maxTokens: Int, // max output tokens
        topP: Float, // nucleus sampling threshold
        topK: Int, // top-k sampling
        repeatPenalty: Float, // penalty for repeated tokens
        contextLength: Int, // KV context window size (requires model reload)
        numThreads: Int, // CPU threads for inference (requires model reload)
        useMmap: Boolean, // memory-map model weights (requires model reload)
        flashAttention: Boolean, // enable Flash Attention (requires model reload)
        batchSize: Int, // token batch size for prompt processing (requires model reload)
    )
    fun nativeCancelGenerate() // cancel ongoing generation
}

interface GenStream {
    fun onDelta(text: String)
    fun onComplete()
    fun onError(message: String)
}
```
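Since `embed` returns a plain `FloatArray`, vector search for RAG can be built on top of it with ordinary Kotlin. The sketch below uses cosine similarity, a common choice for comparing embeddings; it is not confirmed to be what Llamatik's own app uses. `cosineSimilarity` and `topMatches` are hypothetical helper names.

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors of equal length.
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "Vectors must have the same dimension" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    if (normA == 0.0 || normB == 0.0) return 0f // a zero vector has no direction
    return (dot / (sqrt(normA) * sqrt(normB))).toFloat()
}

// Returns the indices of the k stored vectors most similar to the query, best first.
fun topMatches(query: FloatArray, stored: List<FloatArray>, k: Int): List<Int> =
    stored.indices
        .sortedByDescending { cosineSimilarity(query, stored[it]) }
        .take(k)
```

In an app, `query` and the `stored` vectors would come from `LlamaBridge.embed(...)` after a successful `initEmbedModel(...)`, and the returned indices would select the text chunks to pass as `contextBlock`.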
### Generation Parameters
All sampling and hardware parameters are set via `updateGenerateParams`. Parameters marked *reload required* in the table below (`contextLength`, `numThreads`, `useMmap`, `flashAttention`, `batchSize`) must be set **before** calling `initGenerateModel` to take effect; the others can be updated at any time.
| Parameter | Default | Description |
|---|---|---|
| `temperature` | `0.7` | Randomness of outputs (0 = deterministic, 2 = very random) |
| `maxTokens` | `256` | Maximum number of tokens to generate |
| `topP` | `0.95` | Nucleus sampling: keep tokens covering this probability mass |
| `topK` | `40` | Only sample from the top-K most likely tokens |
| `repeatPenalty` | `1.1` | Penalty multiplier for recently generated tokens |
| `contextLength` | `4096` | KV cache window size in tokens *(reload required)* |
| `numThreads` | `4` | CPU threads used for inference *(reload required)* |
| `useMmap` | `true` | Memory-map model weights instead of loading into RAM *(reload required)* |
| `flashAttention` | `false` | Enable Flash Attention for faster, more memory-efficient attention *(reload required)* |
| `batchSize` | `512` | Token batch size for prompt processing — larger = faster prefill, more RAM *(reload required)* |
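One way to keep track of which changes need a model reload is to hold the values in a small data class. The sketch below is an assumption-laden illustration: `GenerateParams` and `requiresReload` are hypothetical names, the defaults are copied from the table above, and the validation covers only the ranges the table states.

```kotlin
// Hypothetical holder for the values in the table above; not part of Llamatik's API.
data class GenerateParams(
    val temperature: Float = 0.7f,
    val maxTokens: Int = 256,
    val topP: Float = 0.95f,
    val topK: Int = 40,
    val repeatPenalty: Float = 1.1f,
    val contextLength: Int = 4096,
    val numThreads: Int = 4,
    val useMmap: Boolean = true,
    val flashAttention: Boolean = false,
    val batchSize: Int = 512,
) {
    init {
        require(temperature in 0f..2f) { "temperature must be in 0.0..2.0" }
        require(maxTokens > 0) { "maxTokens must be positive" }
    }
}

// True when a change touches any parameter the table marks "reload required".
fun requiresReload(old: GenerateParams, updated: GenerateParams): Boolean =
    old.contextLength != updated.contextLength ||
        old.numThreads != updated.numThreads ||
        old.useMmap != updated.useMmap ||
        old.flashAttention != updated.flashAttention ||
        old.batchSize != updated.batchSize
```

The fields map one-to-one onto the arguments of `updateGenerateParams`; when `requiresReload` returns `true`, the new values only take effect after `initGenerateModel` is called again.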
### KV Cache Sessions
Use the session API to persist and resume conversation state across calls without re-feeding the full prompt:
```kotlin
// Generate and keep the KV state in memory
LlamaBridge.generate("Tell me about Kotlin.")
// Save the KV state to disk
LlamaBridge.sessionSave("/path/to/session.bin")
// ... later or in a new process ...
// Restore state and continue from where you left off
LlamaBridge.sessionLoad("/path/to/session.bin")
val continuation = LlamaBridge.generateContinue("What about multiplatform support?")
// Reset state without unloading the model
LlamaBridge.sessionReset()
```
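The session functions return `Boolean`, so callers can fall back to a clean state when a saved session is missing or cannot be restored. The following is a minimal sketch of that pattern; `resumeOrReset` is a hypothetical helper, and the load and reset operations are passed in as functions so the snippet compiles on its own. It uses `java.io.File`, so it applies to JVM and Android targets only.

```kotlin
import java.io.File

// Hypothetical helper: restore a saved KV session if one exists, otherwise start clean.
// Returns true when a saved session was restored.
fun resumeOrReset(
    sessionPath: String,
    load: (String) -> Boolean,
    reset: () -> Boolean,
): Boolean {
    val file = File(sessionPath)
    if (file.isFile && file.length() > 0 && load(sessionPath)) {
        return true
    }
    // Missing, empty, or unreadable session file: fall back to an empty KV state.
    reset()
    return false
}
```

In an app this could be called as `resumeOrReset(path, LlamaBridge::sessionLoad, LlamaBridge::sessionReset)` after the model has been loaded.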
### Speech-to-Text (WhisperBridge)
WhisperBridge exposes a small, platform-friendly wrapper around whisper.cpp for on-device speech-to-text.
The workflow is:
1. Download a Whisper ggml model (e.g. ggml-tiny-q8_0.bin) to local storage (the app does this for you).
2. Initialize Whisper once with the local model path.
3. Record audio to a WAV file and transcribe it.
#### Whisper API surface
```kotlin
object WhisperBridge {
    /** Returns a platform-specific absolute path for the model filename. */
    fun getModelPath(modelFileName: String): String

    /** Loads the model at [modelPath]. Returns true if loaded. */
    fun initModel(modelPath: String): Boolean

    /**
     * Transcribes a WAV file and returns text.
     * Tip: record WAV as 16 kHz, mono, 16-bit PCM for best compatibility.
     *
     * @param initialPrompt Optional text prepended to the decoder input (up to 224 tokens).
     * Use it to bias transcription toward domain-specific vocabulary (e.g. medical terms).
     */
    fun transcribeWav(wavPath: String, language: String? = null, initialPrompt: String? = null): String

    /** Frees native resources. */
    fun release()
}
```
#### Example
```kotlin
import com.llamatik.library.platform.WhisperBridge
val modelPath = WhisperBridge.getModelPath("ggml-tiny-q8_0.bin")
// 1) Init once (e.g. app start)
WhisperBridge.initModel(modelPath)
// 2) Record to a WAV file (16kHz mono PCM16) using your own recorder
val wavPath: String = "/path/to/recording.wav"
// 3) Transcribe
val text = WhisperBridge.transcribeWav(wavPath, language = null).trim()
println(text)
// 4) Optional: release on app shutdown
WhisperBridge.release()
```
**Note**: `WhisperBridge` expects a WAV file path. Llamatik's app uses `AudioRecorder` + `AudioPaths.tempWavPath()` to generate the WAV before calling `transcribeWav(...)`.
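Because the recommended input is 16 kHz, mono, 16-bit PCM, it can help to check a recording's header before transcribing it. The sketch below assumes the canonical 44-byte WAV header, with the `fmt ` chunk directly after `WAVE`; files that carry extra chunks before `fmt ` are rejected by this simple check. `WavFormat` and `readWavFormat` are hypothetical names, and the code relies on `java.nio`, so it applies to JVM and Android targets only.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

data class WavFormat(val channels: Int, val sampleRate: Int, val bitsPerSample: Int) {
    // Matches the format recommended for transcribeWav: 16 kHz, mono, 16-bit PCM.
    val isWhisperFriendly: Boolean
        get() = channels == 1 && sampleRate == 16_000 && bitsPerSample == 16
}

// Parses the canonical 44-byte PCM WAV header; returns null if the layout does not match.
fun readWavFormat(header: ByteArray): WavFormat? {
    if (header.size < 44) return null
    fun tag(offset: Int) = String(header, offset, 4, Charsets.US_ASCII)
    if (tag(0) != "RIFF" || tag(8) != "WAVE" || tag(12) != "fmt ") return null
    val buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN)
    val audioFormat = buf.getShort(20).toInt() // 1 = uncompressed PCM
    if (audioFormat != 1) return null
    return WavFormat(
        channels = buf.getShort(22).toInt(),
        sampleRate = buf.getInt(24),
        bitsPerSample = buf.getShort(34).toInt(),
    )
}
```

Passing the first 44 bytes of the recorded file to `readWavFormat` and checking `isWhisperFriendly` gives an early, readable error instead of a poor transcription.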
### 🎨 Image Generation (StableDiffusionBridge)
Llamatik exposes Stable Diffusion through `StableDiffusionBridge`.
The workflow is:
1. Download or bundle a Stable Diffusion model.
2. Initialize once.
3. Generate images from text prompts.
#### Stable Diffusion API surface
```kotlin
object StableDiffusionBridge {
    /** Returns absolute model path (copied from assets/bundle if needed). */
    fun getModelPath(modelFileName: String): String

    /** Loads the Stable Diffusion model. */
    fun initModel(modelPath: String): Boolean

    /**
     * Generates an image from a prompt.
     *
     * @param prompt Text prompt
     * @param width Output width
     * @param height Output height
     * @param steps Inference steps
     * @param cfgScale Guidance scale
     * @return PNG image as ByteArray
     */
    fun generateImage(
        prompt: String,
        width: Int = 512,
        height: Int = 512,
        steps: Int = 20,
        cfgScale: Float = 7.5f
    ): ByteArray

    /** Releases native resources. */
    fun release()
}
```
#### Example
```kotlin
import com.llamatik.library.platform.StableDiffusionBridge

val modelPath = StableDiffusionBridge.getModelPath("sd-model.bin")
StableDiffusionBridge.initModel(modelPath)

val imageBytes = StableDiffusionBridge.generateImage(
    prompt = "A cyberpunk llama in neon Tokyo",
    width = 512,
    height = 512
)
// Save imageBytes as PNG file
```
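The example leaves saving the result as a comment. A minimal sketch of that step follows, assuming a JVM or Android target where `java.io.File` is available; `looksLikePng` and `savePng` are hypothetical helper names. Since `generateImage` is documented to return PNG data, the sketch checks the standard 8-byte PNG signature before writing.

```kotlin
import java.io.File

// The 8-byte signature that starts every PNG file.
val PNG_SIGNATURE = byteArrayOf(
    0x89.toByte(), 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A
)

fun looksLikePng(bytes: ByteArray): Boolean =
    bytes.size >= PNG_SIGNATURE.size &&
        bytes.copyOfRange(0, PNG_SIGNATURE.size).contentEquals(PNG_SIGNATURE)

// Writes the bytes to [target], creating parent directories if needed.
fun savePng(bytes: ByteArray, target: File): File {
    require(looksLikePng(bytes)) { "Data does not start with a PNG signature" }
    target.parentFile?.mkdirs()
    target.writeBytes(bytes)
    return target
}
```

For the example above, this would be called as `savePng(imageBytes, File(outputDir, "llama.png"))`, where `outputDir` is whatever writable directory the app uses.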
### 👁️ Vision / Multimodal (MultimodalBridge)
MultimodalBridge wraps llama.cpp's multimodal (VLM) support for on-device image analysis using vision-language models such as SmolVLM.
The workflow is:
1. Download a VLM GGUF model and its matching mmproj GGUF file to local storage.
2. Initialize the bridge once with both file paths.
3. Pass image bytes (JPEG/PNG/BMP) and a text prompt to receive a streamed response.
#### MultimodalBridge API surface
```kotlin
object MultimodalBridge {
    /**
     * Load the vision model and its multimodal projector (mmproj) side-by-side.
     * Both files must be available on disk before calling this.
     *
     * @param modelPath Absolute path to the GGUF vision model.
     * @param mmprojPath Absolute path to the GGUF mmproj file.
     * @return true on success.
     */
    fun initModel(modelPath: String, mmprojPath: String): Boolean

    /**
     * Analyze an image given as raw bytes (JPEG/PNG/BMP), streaming the response
     * token by token via [callback].
     *
     * Must be called from a background thread/coroutine; blocks until generation completes.
     */
    fun analyzeImageBytesStream(imageBytes: ByteArray, prompt: String, callback: GenStream)

    /** Cancel an in-progress analyzeImageBytesStream call. */
    fun cancelAnalysis()

    /** Free all native resources (model, mmproj context, llama context). */
    fun release()
}
```
#### Example
```kotlin
import com.llamatik.library.platform.MultimodalBridge
import java.io.File

// 1) Init once; both model and mmproj must be downloaded first
val loaded = MultimodalBridge.initModel(
    modelPath = "/path/to/SmolVLM-256M-Instruct-Q8_0.gguf",
    mmprojPath = "/path/to/mmproj-SmolVLM-256M-Instruct-f16.gguf"
)
check(loaded) { "Failed to load the vision model or its mmproj file" }

// 2) Analyze an image (e.g. loaded from disk or camera).
//    This call blocks until generation completes, so run it off the main thread.
val imageBytes: ByteArray = File("/path/to/photo.jpg").readBytes()
MultimodalBridge.analyzeImageBytesStream(
    imageBytes = imageBytes,
    prompt = "Describe what you see in this image.",
    callback = object : GenStream {
        override fun onDelta(text: String) { print(text) }
        override fun onComplete() { println("\n[done]") }
        override fun onError(message: String) { println("Error: $message") }
    }
)

// 3) Optional: cancel mid-stream (call from another thread while step 2 is running)
MultimodalBridge.cancelAnalysis()

// 4) Optional: release on app shutdown
MultimodalBridge.release()
```
**Note**: MultimodalBridge requires both a vision model GGUF **and** a matching mmproj GGUF. Llamatik's app downloads both automatically when you select a VLM model.
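Because `analyzeImageBytesStream` blocks until generation completes, it has to run off the main thread. The sketch below shows one way to do that with a plain background thread while collecting the streamed deltas into a single string. `collectStream` is a hypothetical helper; the `GenStream` interface is redeclared locally only so the snippet compiles on its own (in an app, use the library's interface instead); and `kotlin.concurrent.thread` is used rather than coroutines to avoid an extra dependency.

```kotlin
import kotlin.concurrent.thread

// Local copy of the GenStream interface documented above, so the sketch compiles on its own.
interface GenStream {
    fun onDelta(text: String)
    fun onComplete()
    fun onError(message: String)
}

// Runs a blocking streaming call on a background thread and reports the full text
// (or an error) once the stream ends. Returns the thread so callers can join it.
fun collectStream(
    blockingCall: (GenStream) -> Unit,
    onResult: (Result<String>) -> Unit,
): Thread = thread(name = "llamatik-stream") {
    val buffer = StringBuilder()
    blockingCall(object : GenStream {
        override fun onDelta(text: String) { buffer.append(text) }
        override fun onComplete() { onResult(Result.success(buffer.toString())) }
        override fun onError(message: String) {
            onResult(Result.failure(IllegalStateException(message)))
        }
    })
}
```

A call might look like `collectStream({ cb -> MultimodalBridge.analyzeImageBytesStream(imageBytes, prompt, cb) }) { result -> /* update UI */ }`. While the background thread is busy, the calling thread remains free to invoke `MultimodalBridge.cancelAnalysis()`. Note that `onResult` runs on the background thread, so UI updates still need to be dispatched to the main thread.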
---
## 🧑‍💻 Backend Usage
The Llamatik backend server is now maintained in a dedicated repository.
👉 Llamatik Server Repository
[https://github.com/ferranpons/Llamatik-Server](https://github.com/ferranpons/Llamatik-Server)
Visit the repository for full setup instructions, configuration options, and usage details.
---
## 🔍 Why Llamatik?
- ✅ Built directly on llama.cpp, whisper.cpp and stable-diffusion.cpp
- ✅ Offline-first & privacy-preserving
- ✅ No runtime dependencies
- ✅ Open-source (MIT)
- ✅ Used by real Android & iOS apps
- ✅ Designed for long-term Kotlin Multiplatform support
---
## 📦 Apps using Llamatik
Llamatik is already used in production apps on Google Play and the App Store.
Want to showcase your app here?
Open a PR and add it to the list 🚀
---
## 🤝 Contributing
Llamatik is 100% open-source and actively developed.
- Bug reports
- Feature requests
- Documentation improvements
- Platform extensions
All contributions are welcome!
---
## 📜 License
This project is licensed under the MIT License.
See [LICENSE](./LICENSE) for details.
---
Built with ❤️ for the Kotlin community.

