https://github.com/cactus-compute/cactus
Framework for AI on mobile devices and wearables, with a hardware-aware C/C++ backend and wrappers for Kotlin, Java, Swift, React, and Flutter.
- Host: GitHub
- URL: https://github.com/cactus-compute/cactus
- Owner: cactus-compute
- License: MIT
- Created: 2025-04-23T14:33:43.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-05-06T15:03:57.000Z (9 months ago)
- Last Synced: 2025-05-07T02:07:17.878Z (9 months ago)
- Topics: android, dart, flutter, framework, ios, java, javascript, kotlin, library, llamacpp, llm, llm-inference, llms, objective-c, react-native, swift, transformer, transformers, typescript
- Language: C++
- Homepage: https://cactuscompute.com
- Size: 146 MB
- Stars: 131
- Watchers: 4
- Forks: 24
- Open Issues: 15
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - cactus-compute/cactus
- awesome - cactus-compute/cactus - Kernels & AI inference engine for mobile devices. (C++)
README

Cross-platform & energy-efficient kernels, runtime and AI inference engine for mobile devices.
## Cactus Graph
Cactus Graph is a general-purpose numerical computing framework for implementing arbitrary models, comparable to PyTorch but designed for mobile devices.
```cpp
#include "cactus.h"

// Build a small graph: two inputs feeding two matmuls and a transpose.
CactusGraph graph;
auto a = graph.input({2, 3}, Precision::FP16);
auto b = graph.input({3, 4}, Precision::INT8);

auto x1 = graph.matmul(a, b, false);
auto x2 = graph.transpose(x1);
auto result = graph.matmul(b, x2, true);

// Bind concrete values to the input nodes; data is supplied as float
// and stored at the precision declared for each input.
float a_data[6] = {1.1f, 2.3f, 3.4f, 4.2f, 5.7f, 6.8f};
float b_data[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
graph.set_input(a, a_data, Precision::FP16);
graph.set_input(b, b_data, Precision::INT8);

// Run the graph, read the raw output buffer, then clear all state.
graph.execute();
void* output_data = graph.get_output(result);
graph.hard_reset();
```
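`get_output` returns an untyped buffer, so the caller casts it to the expected element type. Below is a minimal sketch of reading the result back, under the assumption (not stated in this README) that the final tensor is materialized as row-major FP32; check `cactus.h` for the actual precision and layout conventions.
```cpp
#include <cstdio>

// Hypothetical continuation of the snippet above, placed before
// graph.hard_reset(). ASSUMPTION: the output buffer holds row-major
// FP32 values; verify the real convention against cactus.h.
const float* out = static_cast<const float*>(output_data);
for (int i = 0; i < 6; ++i) {  // element count chosen for illustration only
    std::printf("out[%d] = %f\n", i, out[i]);
}
```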
## Cactus Engine
Cactus Engine is an AI inference engine with OpenAI-compatible APIs built on top of Cactus Graphs.
```cpp
#include "cactus.h"

cactus_set_pro_key(""); // email founders@cactuscompute.com for optional key

// Load weights and allocate a 2048-token context.
cactus_model_t model = cactus_init("path/to/weight/folder", 2048);

// OpenAI-style chat messages and generation options, passed as JSON strings.
const char* messages = R"([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "My name is Henry Ndubuaku"}
])";

const char* options = R"({
    "max_tokens": 50,
    "stop_sequences": ["<|im_end|>"]
})";

// The completion and timing stats are written into `response` as JSON.
char response[1024];
int result = cactus_complete(model, messages, response, sizeof(response), options, nullptr, nullptr, nullptr);
```
Example response from Gemma3-270m-INT8:
```json
{
  "success": true,
  "response": "Hi there! I'm just a friendly assistant.",
  "time_to_first_token_ms": 45.23,
  "total_time_ms": 163.67,
  "tokens_per_second": 168.42,
  "prefill_tokens": 28,
  "decode_tokens": 50,
  "total_tokens": 78
}
```
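Because `cactus_complete` writes a JSON document like the one above into the caller's buffer, a client typically parses it before use. Here is a minimal sketch with nlohmann/json (a parser chosen purely for illustration; the field names come from the example response):
```cpp
#include <nlohmann/json.hpp>
#include <iostream>
#include <string>

// Parse the buffer filled by cactus_complete and print the completion
// text plus a throughput figure. Field names follow the example above.
void handle_response(const char* response) {
    nlohmann::json j = nlohmann::json::parse(response);
    if (j.value("success", false)) {
        std::cout << j["response"].get<std::string>() << "\n"
                  << j["tokens_per_second"].get<double>() << " tok/s\n";
    }
}
```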
## INT8 Performance
- **Models:** LFM2-VL-450m & Whisper-Small
- **Decode** = tokens/sec, **P/D** = prefill/decode, **VLM** = 256×256 image, **STT** = 30s audio
- **Cactus Pro:** uses the NPU for realtime and large-context workloads (Apple only for now); scores marked with *
- **INT4 coming:** 1.8x faster, files 1.9x smaller
| Device | Short Decode | 1k-P/D | 4k-P/D | 4k-P Pro | 4k-RAM | VLM-TTFT | VLM-Dec | VLM-RAM | STT-TTFT | STT-Dec | STT-RAM |
|--------|--------|--------|--------|----------|--------|----------|---------|---------|----------|---------|---------|
| Mac M4 Pro | 173 | 1574/115 | 1089/100 | - | 122MB | 0.4s/0.1s* | 168 | 112MB | 1.7s/0.2s* | 83 | 142MB |
| Mac M3 Pro | 150 | 1540/109 | 890/93 | - | 121MB | 0.5s/0.1s* | 149 | 113MB | 2.9s/0.4s* | 78 | 140MB |
| iPad/Mac M4 | 129 | 793/82 | 507/64 | - | 80MB | 0.5s/0.1s* | 113 | 145MB | 2.4s/0.3s* | 60 | 131MB |
| iPad/Mac M3 | 112 | 786/78 | 446/60 | - | 81MB | 0.6s/0.1s* | 111 | 154MB | 4.2s/0.7s* | 58 | 142MB |
| iPhone 17 Pro | 136 | 810/105 | 628/84 | - | - | 1.1s/0.1s* | 120 | - | 3.0s/0.6s* | - | - |
| iPhone 16 Pro | 114 | 716/98 | 580/81 | - | - | 1.3s/0.2s* | 101 | - | 3.5s/0.7s* | 75 | - |
| iPhone 15 Pro | 99 | 549/86 | 530/75 | - | - | 1.5s/0.3s* | 92 | - | 3.8s/0.8s* | 70 | - |
| Galaxy S25 Ultra | 91 | 230/63 | 173/57 | - | 128MB | 1.4s | 58 | - | - | - | - |
| Nothing 3 | 56 | 167/49 | 160/46 | - | - | 1.7s | 54 | - | 8.5s | 55 | - |
| Nothing 3a | 31 | 114/26 | 108/24 | - | - | 2.4s | 29 | - | - | - | - |
| Raspberry Pi 5 | 24 | 192/28 | - | - | - | 2.3s | 23 | - | 21s | 16 | - |
## Supported models (INT8)
| Model | Compressed Size | Completion | Tool Call | Vision | Embed | Speech | Pro |
|-------|--------------------|-------------------|----------------|------|------|------|------|
| google/gemma-3-270m-it | 172MB | ✓ | ✗ | ✗ | ✗ | ✗ | Apple |
| google/functiongemma-270m-it | 172MB | ✓ | ✓ | ✗ | ✗ | ✗ | Apple |
| openai/whisper-small | 282MB | ✗ | ✗ | ✗ | ✓ | ✓ | Apple |
| LiquidAI/LFM2-350M | 233MB | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| HuggingFaceTB/SmolLM2-360m-Instruct | 227MB | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| LiquidAI/LFM2-VL-450M | 420MB | ✓ | ✗ | ✓ | ✓ | ✗ | Apple |
| Qwen/Qwen3-0.6B | 394MB | ✓ | ✓ | ✗ | ✓ | ✗ | Apple |
| Qwen/Qwen3-Embedding-0.6B | 394MB | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| LiquidAI/LFM2-700M | 467MB | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| nomic-ai/nomic-embed-text-v2-moe | 533MB | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ |
| google/gemma-3-1b-it | 642MB | ✓ | ✗ | ✗ | ✗ | ✗ | Apple |
| openai/whisper-medium | 646MB | ✗ | ✗ | ✗ | ✓ | ✓ | Apple |
| LiquidAI/LFM2-1.2B | 722MB | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| LiquidAI/LFM2-1.2B-RAG | 722MB | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| LiquidAI/LFM2-1.2B-Tool | 722MB | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| LiquidAI/LFM2-VL-1.6B | 1440MB | ✓ | ✗ | ✓ | ✓ | ✗ | Apple |
| Qwen/Qwen3-1.7B | 1161MB | ✓ | ✓ | ✗ | ✓ | ✗ | Apple |
| HuggingFaceTB/SmolLM2-1.7B-Instruct | 1161MB | ✓ | ✗ | ✗ | ✓ | ✗ | ✗ |
## Using this repo on Mac
- Clone the repo and run `source ./setup`.
- Setup is automatic, and usage instructions are printed afterwards.
- Run `cactus --help` to see the guides at any time.
- Remember to run `source ./setup` in each new terminal session.
## Using in your apps
- [Kotlin Multiplatform SDK](https://github.com/cactus-compute/cactus-kotlin)
- [Flutter SDK](https://github.com/cactus-compute/cactus-flutter)
- [React Native SDK](https://github.com/cactus-compute/cactus-react-native)
- [Swift SDK](https://github.com/mhayes853/swift-cactus)
## Try demo apps
- [iOS Demo](https://apps.apple.com/gb/app/cactus-chat/id6744444212)
- [Android Demo](https://play.google.com/store/apps/details?id=com.rshemetsubuser.myapp)