https://github.com/cactus-compute/cactus
Framework for AI on mobile devices and wearables, hardware-aware C/CPP backend, with wrappers for Kotlin, Java, Swift, React, Flutter.
- Host: GitHub
- URL: https://github.com/cactus-compute/cactus
- Owner: cactus-compute
- License: MIT
- Created: 2025-04-23T14:33:43.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2025-05-06T15:03:57.000Z (about 1 month ago)
- Last Synced: 2025-05-07T02:07:17.878Z (about 1 month ago)
- Topics: android, dart, flutter, framework, ios, java, javascript, kotlin, library, llamacpp, llm, llm-inference, llms, objective-c, react-native, swift, transformer, transformers, typescript
- Language: C++
- Homepage: https://cactuscompute.com
- Size: 146 MB
- Stars: 131
- Watchers: 4
- Forks: 24
- Open Issues: 15
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome - cactus-compute/cactus - Framework for running AI locally on mobile devices and wearables. Hardware-aware C/C++ backend with wrappers for Flutter & React Native. Kotlin & Swift coming soon. (C++)
README

[![Email][gmail-shield]][gmail-url]
[![Discord][discord-shield]][discord-url]
[![Design Docs][docs-shield]][docs-url]

[![Stars][stars-shield]][github-url]
[![Forks][forks-shield]][github-url]

[gmail-shield]: https://img.shields.io/badge/Gmail-red?style=for-the-badge&logo=gmail&logoColor=white
[gmail-url]: [email protected]
[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-blue.svg?style=for-the-badge&logo=linkedin&colorB=blue
[linkedin-url]: https://www.linkedin.com/company/106281696
[discord-shield]: https://img.shields.io/badge/Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white
[discord-url]: https://discord.gg/cBT6jcCF
[docs-shield]: https://img.shields.io/badge/Design_Docs-009485?style=for-the-badge&logo=readthedocs&logoColor=white
[docs-url]: https://deepwiki.com/cactus-compute/cactus
[website-shield]: https://img.shields.io/badge/Website-black?style=for-the-badge&logo=safari&logoColor=white
[website-url]: https://cactuscompute.com
[stars-shield]: https://img.shields.io/github/stars/cactus-compute/cactus?style=for-the-badge&color=yellow
[forks-shield]: https://img.shields.io/github/forks/cactus-compute/cactus?style=for-the-badge&color=blue
[issues-shield]: https://img.shields.io/github/issues/cactus-compute/cactus?style=for-the-badge
[prs-shield]: https://img.shields.io/github/issues-pr/cactus-compute/cactus?style=for-the-badge
[github-url]: https://github.com/cactus-compute/cactus

Cactus is a lightweight, high-performance framework for running AI models on mobile phones. Cactus has unified and consistent APIs across:
- React-Native
- Android/Kotlin
- Android/Java
- iOS/Swift
- iOS/Objective-C++
- Flutter/Dart

Cactus currently leverages GGML backends to support any GGUF model already compatible with [llama.cpp](https://github.com/ggerganov/llama.cpp), while we focus on broadly supporting every mobile app development platform, as well as upcoming features like:
- MCP
- phone tool use
- thinking
- prompt-enhancement
- higher-level APIs

Contributors with experience in any of the above are welcome! Feel free to submit cool example apps you built with Cactus, issues, or tests!
Cactus Models coming soon.
## Table of Contents
- [Technical Architecture](#technical-architecture)
- [Features](#features)
- [Benchmarks](#benchmarks)
- [Getting Started](#getting-started)
- [React Native](#react-native-shipped)
- [Android](#android-currently-testing)
- [Swift](#swift-in-development)
- [Flutter](#flutter-in-development)
- [C++ (Raw backend)](#c-raw-backend)
- [License](#license)

## Technical Architecture
```
┌─────────────────────────────────────────────────────────┐
│ Applications │
└───────────────┬─────────────────┬───────────────────────┘
│ │
┌───────────────┼─────────────────┼───────────────────────-┐
│ ┌─────────────▼─────┐ ┌─────────▼───────┐ ┌─────────────┐|
│ │ React API │ │ Flutter API │ │ Native APIs│|
│ └───────────────────┘ └─────────────────┘ └─────────────┘|
│ Platform Bindings │
└───────────────┬─────────────────┬───────────────────────-┘
│ │
┌───────────────▼─────────────────▼───────────────────────┐
│ Cactus Core (C++) │
└───────────────┬─────────────────┬───────────────────────┘
│ │
┌───────────────▼─────┐ ┌─────────▼───────────────────────┐
│ Llama.cpp Core │ │ GGML/GGUF Model Format │
└─────────────────────┘ └─────────────────────────────────┘
```
## Features
- Model download from HuggingFace
- Text completion and chat completion
- Streaming token generation
- Embedding generation
- JSON mode with schema validation
- Chat templates with Jinja2 support
- Low memory footprint
- Battery-efficient inference
- Background processing

## Benchmarks
We created a small chat app for demos; you can try other models and report your findings. [Download the app](https://lnkd.in/dYGR54hn).
Gemma 1B INT8:
- iPhone 16 Pro Max: ~45 toks/sec
- iPhone 13 Pro: ~30 toks/sec
- Galaxy A14: ~6 toks/sec
- Galaxy S24+: ~20 toks/sec
- Galaxy S21: ~14 toks/sec
- Google Pixel 6a: ~14 toks/sec

SmolLM 135M INT8:
- iPhone 13 Pro: ~180 toks/sec
- Galaxy A14: ~30 toks/sec
- Galaxy S21: ~42 toks/sec
- Google Pixel 6a: ~38 toks/sec
- Huawei P60 Lite (Gran's phone): ~8 toks/sec

## Getting Started
### ✅ React Native (TypeScript/JavaScript)
```bash
npm install cactus-react-native
# or
yarn add cactus-react-native

# For iOS, install pods if not on Expo
npx pod-install
```
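The example below assumes a GGUF file already on the device. The Features list also mentions model download from HuggingFace; a minimal, framework-agnostic sketch for building the standard HuggingFace "resolve" URL for a file in a model repo is shown here. Note that `hfGgufUrl` is a hypothetical helper, not part of the Cactus API, and the actual download would go through something like `react-native-fs` or whatever downloader Cactus exposes:

```typescript
// Hypothetical helper (not part of cactus-react-native): builds the standard
// HuggingFace "resolve" URL for a file inside a model repository.
function hfGgufUrl(repo: string, file: string, revision: string = 'main'): string {
  return `https://huggingface.co/${repo}/resolve/${encodeURIComponent(revision)}/${file}`;
}

// Repo and file names below are illustrative, not a Cactus recommendation.
const url = hfGgufUrl('some-org/some-model-GGUF', 'model-q8_0.gguf');
// The returned URL can then be fetched to a local path and passed to initLlama.
```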
```typescript
import { initLlama, LlamaContext } from 'cactus-react-native';

// Load model
const context = await initLlama({
  model: 'models/llama-2-7b-chat.gguf', // Path to your model
  n_ctx: 2048,
  n_batch: 512,
  n_threads: 4
});

// Generate completion
const result = await context.completion({
  prompt: 'Explain quantum computing in simple terms',
  temperature: 0.7,
  top_k: 40,
  top_p: 0.95,
  n_predict: 512
}, (token) => {
  // Process each token
  process.stdout.write(token.token);
});

// Clean up
await context.release();
```

For more detailed documentation and examples, see the [React Native README](react/README.md).
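The completion call above takes a raw prompt string. For chat-style usage (the Features list mentions chat templates with Jinja2 support), you may need to flatten a message array into a single prompt yourself. Below is a hedged sketch using the well-known Llama-2 chat layout; `formatLlama2Chat` is an illustrative helper rather than a Cactus API, and a real app should prefer the model's own chat template when the framework exposes it:

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Illustrative helper (not part of cactus-react-native): flattens a message
// array into a Llama-2 style prompt. The system prompt is folded into the
// first [INST] block, matching the Llama-2 chat convention.
function formatLlama2Chat(messages: ChatMessage[]): string {
  const system = messages.find((m) => m.role === 'system')?.content ?? '';
  let prompt = '';
  let pendingUser: string | null = null;
  for (const m of messages) {
    if (m.role === 'user') {
      const sys = prompt === '' && system !== ''
        ? `<<SYS>>\n${system}\n<</SYS>>\n\n`
        : '';
      pendingUser = `[INST] ${sys}${m.content} [/INST]`;
    } else if (m.role === 'assistant' && pendingUser !== null) {
      prompt += `${pendingUser} ${m.content} `;
      pendingUser = null;
    }
  }
  if (pendingUser !== null) prompt += pendingUser;
  return prompt;
}

const prompt = formatLlama2Chat([
  { role: 'system', content: 'You are concise.' },
  { role: 'user', content: 'Hi!' },
]);
// The result can then be passed as the prompt to context.completion({ prompt, ... }).
```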
### ✅ Android (Kotlin/Java)
**1. Add Repository to `settings.gradle.kts`:**
```kotlin
// settings.gradle.kts
dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS) // Optional but recommended
    repositories {
        google()
        mavenCentral()
        // Add GitHub Packages repository for Cactus
        maven {
            name = "GitHubPackagesCactusCompute"
            url = uri("https://maven.pkg.github.com/cactus-compute/cactus")
        }
    }
}
```

**2. Add Dependency to Module's `build.gradle.kts`:**
```kotlin
// app/build.gradle.kts
dependencies {
    implementation("io.github.cactus-compute:cactus-android:0.0.1")
}
```

**3. Basic Usage (Kotlin):**
```kotlin
import com.cactus.android.LlamaContext
import com.cactus.android.LlamaInitParams
import com.cactus.android.LlamaCompletionParams
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// In an Activity, ViewModel, or coroutine scope
suspend fun runInference() {
    var llamaContext: LlamaContext? = null
    try {
        // Initialize (off main thread)
        llamaContext = withContext(Dispatchers.IO) {
            LlamaContext.create(
                params = LlamaInitParams(
                    modelPath = "path/to/your/model.gguf",
                    nCtx = 2048, nThreads = 4
                )
            )
        }

        // Complete (off main thread)
        val result = withContext(Dispatchers.IO) {
            llamaContext?.complete(
                prompt = "Explain quantum computing in simple terms",
                params = LlamaCompletionParams(temperature = 0.7f, nPredict = 512)
            ) { partialResultMap ->
                val token = partialResultMap["token"] as? String ?: ""
                print(token) // Process stream on background thread
                true // Continue generation
            }
        }
        println("\nFinal text: ${result?.text}")
    } catch (e: Exception) {
        // Handle errors
        println("Error: ${e.message}")
    } finally {
        // Clean up (off main thread)
        withContext(Dispatchers.IO) {
            llamaContext?.close()
        }
    }
}
```

For more detailed documentation and examples, see the [Android README](android/README.md).
### 🚧 Swift (in development)
### 🚧 Flutter (in development)