https://github.com/dineshsoudagar/local-llms-on-android
Run large language models like Qwen and LLaMA locally on Android for offline, private, real-time question answering and chat - powered by ONNX Runtime.
- Host: GitHub
- URL: https://github.com/dineshsoudagar/local-llms-on-android
- Owner: dineshsoudagar
- License: mit
- Created: 2025-05-04T15:04:09.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2026-04-10T11:57:00.000Z (3 months ago)
- Last Synced: 2026-04-10T20:49:45.075Z (3 months ago)
- Topics: android, android-app, chatbot, huggingface-tokenizers, llama3, local-llm, local-llm-integration, mobile-ai, offline-inference, on-device-ai, onnx-runtime, qwen
- Language: Kotlin
- Homepage:
- Size: 23.3 MB
- Stars: 121
- Watchers: 6
- Forks: 16
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# 🤖 Pocket LLM for Android (Offline, Private & Fast)
An Android application that brings local LLM chat, voice input, image input, OCR, and camera-based prompting to your phone.
Pocket LLM runs fully on device after model download. It supports ONNX-based Qwen models, LiteRT-based Qwen 3 and Gemma 4 models, streaming responses, persistent local chat history, markdown-rendered replies, downloadable models, in-app model switching, editable model instructions, and multiple image input workflows.
The app ships as a small base APK. Users download only the models they want, switch between them inside the app, and delete unused models later to save device storage.
---
## 🆕 New in v1.5.0
Pocket LLM now supports richer local input workflows beyond text chat.
- 🎙️ Added voice input for faster prompting
- 🖼️ Added image input with OCR, Gemma direct image input, and FastVLM image description support
- 📷 Added camera capture with retake, crop, and photo review
- 🗂️ Added a side panel for quick access to previous chats
- 🗑️ Added easier chat deletion from the history panel
- 💾 Added downloaded model deletion to free device storage
- ⚙️ Added editable model instructions with presets and custom prompts
- 🎨 Added dark mode, light mode, accent colors, and chat font-size control
- 📋 Added copy button for assistant responses
#### ➡️ [See all releases](https://github.com/dineshsoudagar/local-llms-on-android/releases)
---
### 🔗 Also Check Out
**[local-document-intelligence](https://github.com/dineshsoudagar/local-document-intelligence)**
A privacy-first offline document intelligence system with persistent local RAG, hybrid retrieval, and source-grounded answers.
---
## ✨ Features
- 📱 Fully on-device LLM chat for private offline use
- 🎙️ Voice input for faster prompting
- 🖼️ Image input with OCR, Gemma vision, and FastVLM support
- 📷 Camera capture with retake, crop, and photo review
- 💬 Persistent multi-turn chat with local history
- 📦 Download, switch, and delete models inside the app
- 🧠 Supports Qwen2.5, Qwen3, Qwen3 LiteRT, and Gemma 4 LiteRT models
- ⚡ ONNX and LiteRT backend support
- 🎛️ Editable model instructions with presets and custom prompts
- 🎨 Light mode, dark mode, accent colors, and adjustable chat font size
- 🔐 Offline after model download, with no telemetry
---
## 📸 Inference Preview

Figure: Pocket LLM showing offline chat (Chat Inference), image input (Image Support), and the updated Android UI (New UI).
---
## 📦 Download APK - v1.5.0
The app ships as a **single small base APK**.
#### ➡️ [Download APK](https://github.com/dineshsoudagar/local-llms-on-android/releases/download/v1.5.0/pocket_llm_v1.5.0.apk)
Models are **not bundled inside the APK**. After installation, choose and download the models you want directly on device.
You can download **multiple models**, switch between them inside the app, and delete unused downloaded models later to free storage.
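The download, switch, and delete workflow described above can be pictured as simple bookkeeping over a models directory. The following is a minimal sketch, not the app's actual implementation: the `ModelStore` class, its method names, and the one-file-per-model layout are all assumptions made for illustration.

```kotlin
import java.io.File

// Hypothetical helper, not taken from the app's source.
// Assumption: each downloaded model is stored as a single file inside `dir`.
class ModelStore(private val dir: File) {
    init { dir.mkdirs() }

    /** Names of the model files currently on disk, sorted for stable display. */
    fun downloaded(): List<String> =
        dir.listFiles()?.filter { it.isFile }?.map { it.name }?.sorted() ?: emptyList()

    /** Total bytes used by downloaded models. */
    fun usedBytes(): Long =
        dir.listFiles()?.filter { it.isFile }?.sumOf { it.length() } ?: 0L

    /** Deletes one model file; returns true only if a file was actually removed. */
    fun delete(name: String): Boolean =
        File(dir, name).let { it.isFile && it.delete() }
}
```

Under these assumptions, a settings screen could call `downloaded()` to populate the model list and `usedBytes()` to show how much storage a deletion would free.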
### Available chat models
- **Gemma 4 E4B LiteRT** - Best for **flagship mobiles**
- **Gemma 4 E2B LiteRT** - Best for **decent to mid-range mobiles**
- **Qwen3 0.6B LiteRT** - Best for **low-end mobiles**
- **Qwen3 0.6B Q4F16 ONNX** - Good for **low to mid-range mobiles**
- **Qwen2.5 0.5B ONNX** - Best for **mid to high-end mobiles**, **full precision**
### Image input support
- **OCR mode** - Extract text from images
- **Gemma vision mode** - Use Gemma direct image input on supported models
- **FastVLM mode** - Use lightweight image description for non-Gemma models
- **Camera capture** - Take a photo, retake, crop, review, and send it as input
> Note: internet is required only for downloading models. Chat, OCR, image input, camera workflows, and inference remain fully on-device after the required models are installed.
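The image input modes listed above can be read as a small routing decision. The sketch below is illustrative only: `ImageMode`, `chooseImageMode`, and the `wantsTextOnly` flag are hypothetical names, and the README does not say how the app itself chooses between modes.

```kotlin
// Hypothetical types for illustration; not the app's actual classes.
enum class ImageMode { OCR, GEMMA_VISION, FASTVLM }

/**
 * Picks how an attached image is turned into model input.
 * Assumption: the caller signals that only text extraction is wanted
 * through [wantsTextOnly]; otherwise the choice follows the model type.
 */
fun chooseImageMode(isGemmaModel: Boolean, wantsTextOnly: Boolean): ImageMode = when {
    wantsTextOnly -> ImageMode.OCR           // extract text from the image
    isGemmaModel -> ImageMode.GEMMA_VISION   // Gemma takes the image directly
    else -> ImageMode.FASTVLM                // non-Gemma models get an image description
}
```

For example, `chooseImageMode(isGemmaModel = false, wantsTextOnly = false)` selects `ImageMode.FASTVLM`, matching the FastVLM path for non-Gemma models.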
---
## 🧠 Backend Support
This app supports **ONNX-based Qwen models** and **LiteRT-based Qwen 3 and Gemma 4 models**.
### Backend overview
- **ONNX backend**: supports **Qwen2.5** and **Qwen3**
- **LiteRT backend**: supports **Qwen3** and **Gemma 4**
### Thinking Mode
- **Qwen3** and **Gemma 4** support **Thinking Mode**
- The toggle is shown only for models that support it
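The backend and Thinking Mode rules above fit naturally into a small lookup table. This is a sketch under stated assumptions: the enum and function names are hypothetical, and marking Qwen2.5 as not supporting Thinking Mode is inferred from the fact that only Qwen3 and Gemma 4 are listed as supporting it.

```kotlin
// Hypothetical names for illustration; not taken from the app's source.
enum class Backend { ONNX, LITERT }

enum class ModelFamily(val backends: Set<Backend>, val supportsThinking: Boolean) {
    QWEN_2_5(setOf(Backend.ONNX), supportsThinking = false),
    QWEN_3(setOf(Backend.ONNX, Backend.LITERT), supportsThinking = true),
    GEMMA_4(setOf(Backend.LITERT), supportsThinking = true),
}

/** Model families that a given backend can run, per the backend overview. */
fun familiesFor(backend: Backend): List<ModelFamily> =
    ModelFamily.values().filter { backend in it.backends }

/** The Thinking Mode toggle is shown only for families that support it. */
fun showThinkingToggle(family: ModelFamily): Boolean = family.supportsThinking
```

Keeping both facts on the model family means the UI can decide whether to draw the toggle without knowing which backend is active.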
---
## 🚀 Why LiteRT
**LiteRT** is a strong fit for fast local Android chat because:
- It is designed for **high-performance on-device LLM deployment**
- It supports **hardware acceleration**, including **GPU and NPU acceleration** on supported devices
- It can reduce startup and generation latency for local chat workloads
- It adds a second backend alongside ONNX, which widens the set of model builds that can run on Android
- It fits well with a privacy-first app design focused on fully offline usage
> Note: model capability and performance still depend on the specific model build and the hardware of the target Android device.
---
## ⚙️ Requirements
- [Android Studio](https://developer.android.com/studio)
- A physical Android device for deployment and testing
- 4 GB or more RAM for smaller models
- More RAM is recommended for larger models such as **Gemma 4 E2B** and **Gemma 4 E4B**
- A temporary internet connection for downloading models inside the app
- Real hardware is preferred; emulators are mainly useful for UI checks
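The RAM guidance above could be checked in code before offering a model for download. The sketch below is a hypothetical helper, not part of the app: `meetsRamRequirement` and `BYTES_PER_GB` are invented names, and only the 4 GB default comes from the requirements list. On Android the input would typically come from `ActivityManager.MemoryInfo.totalMem`.

```kotlin
// Hypothetical helper. On Android, totalRamBytes would typically be read from
// ActivityManager.MemoryInfo.totalMem; it is a plain parameter here so the
// logic can be tested off-device.
const val BYTES_PER_GB = 1_000_000_000L

/** True when the device meets a minimum RAM figure given in decimal gigabytes. */
fun meetsRamRequirement(totalRamBytes: Long, minGb: Double = 4.0): Boolean =
    totalRamBytes >= (minGb * BYTES_PER_GB).toLong()
```

The RAM a device reports is often somewhat below its marketed figure, so a real check would probably want some tolerance rather than a strict threshold.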
---
## 🚀 How to Build & Run
1. Clone this repository.
2. Install the latest **Android Studio**.
3. Open the Android project folder in Android Studio:
```text
pocket_llm_src/
```
4. Build and install the app on your Android device.
5. Launch the app.
6. On first launch, choose a model from the built-in model picker.
7. Download the selected model directly inside the app.
8. Start chatting locally on device.
---
## 📄 License Notice
### Gemma 4
Gemma 4 is provided by Google under the **Apache License 2.0**. Google's Gemma documentation also states that Gemma models are provided with open weights and support responsible commercial use.
- Gemma 4 license: https://ai.google.dev/gemma/apache_2
- Gemma 4 overview: https://ai.google.dev/gemma/docs/core
### Qwen models
Qwen model files follow the upstream Qwen license terms.
Please review the original model license before redistribution or commercial use.