Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/xuegao-tzx/fllama

A flutter binding for llama.cpp, which use platform channel.
https://github.com/xuegao-tzx/fllama

ai android dart flutter harmonyos ios llamacpp

Last synced: 20 days ago
JSON representation

A flutter binding for llama.cpp, which use platform channel.

Host: GitHub
URL: https://github.com/xuegao-tzx/fllama
Owner: xuegao-tzx
License: mit
Created: 2024-10-21T03:28:44.000Z (3 months ago)
Default Branch: main
Last Pushed: 2024-12-02T07:57:27.000Z (about 1 month ago)
Last Synced: 2024-12-17T15:51:05.581Z (25 days ago)
Topics: ai, android, dart, flutter, harmonyos, ios, llamacpp
Language: C++
Homepage: https://pub.dev/packages/fcllama
Size: 22.8 MB
Stars: 14
Watchers: 4
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE

Awesome Lists containing this project

README

        # FCllama

[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)

[![pub package](https://img.shields.io/pub/v/fcllama.svg?label=fcllama&color=blue)](https://pub.dev/packages/fcllama)

Flutter binding of [llama.cpp](https://github.com/ggerganov/llama.cpp) , which use platform channel .

[llama.cpp](https://github.com/ggerganov/llama.cpp): Inference of [LLaMA](https://arxiv.org/abs/2302.13971) model in pure C/C++

## Installation

### Flutter

```sh

flutter pub add fcllama

```

#### iOS

Please run `pod install` or `pod update` in your iOS project.

#### Android

You need install cmake 3.31.0、android sdk 35 and ndk 28.0.12674087.

No additional operation required .

### OpenHarmonyOS/HarmonyOS

This is the fastest and recommended way to add HLlama to your project.

```bash

ohpm install hllama

```

Or, you can add it to your project manually.

* Add the following lines to `oh-package.json5` on your app module.

```json

"dependencies": {

  "hllama": "^0.0.2",

}

```

* Then run

```bash

  ohpm install

```

## How to use

### Flutter

1. Initializing Llama

```dart

import 'package:fcllama/fllama.dart';

FCllama.instance()?.initContext("model path",emitLoadProgress: true)

        .then((context) {

  modelContextId = context?["contextId"].toString() ?? "";

  if (modelContextId.isNotEmpty) {

    // you can get modelContextId，if modelContextId > 0 is success.

  }

});

```

2. Bench model on device

```dart

import 'package:fcllama/fllama.dart';

FCllama.instance()?.bench(double.parse(modelContextId),pp:8,tg:4,pl:2,nr: 1).then((res){

  Get.log("[FCllama] Bench Res $res");

});

```

3. Tokenize and Detokenize

```dart

import 'package:fcllama/fllama.dart';

FCllama.instance()?.tokenize(double.parse(modelContextId), text: "What can you do?").then((res){

  Get.log("[FCllama] Tokenize Res $res");

  FCllama.instance()?.detokenize(double.parse(modelContextId), tokens: res?['tokens']).then((res){

    Get.log("[FCllama] Detokenize Res $res");

  });

});

```

4. Streaming monitoring

```dart

import 'package:fcllama/fllama.dart';

FCllama.instance()?.onTokenStream?.listen((data) {

  if(data['function']=="loadProgress"){

    Get.log("[FCllama] loadProgress=${data['result']}");

  }else if(data['function']=="completion"){

    Get.log("[FCllama] completion=${data['result']}");

    final tempRes = data["result"]["token"];

    // tempRes is ans

  }

});

```

5. Release this or Stop one

```dart

import 'package:fcllama/fllama.dart';

FCllama.instance()?.stopCompletion(contextId: double.parse(modelContextId)); // stop one completion

FCllama.instance()?.releaseContext(double.parse(modelContextId)); // release one

FCllama.instance()?.releaseAllContexts(); // release all

```

### OpenHarmonyOS/HarmonyOS

You can see [this file](./example/harmony/hllama/README.md)

## Support System

| System | Min SDK | Arch | Other |

|:--:|:--:|:--:|:--:|

| Android | 23 | arm64-v8a、x86_64、armeabi-v7a | Supports additional optimizations for certain CPUs |

| iOS | 14 | arm64 | Support Metal |

| OpenHarmonyOS/HarmonyOS | 12 | arm64-v8a、x86_64 | No additional optimizations for certain CPUs are supported |

## Obtain the model

You can search HuggingFace for available models (Keyword: [`GGUF`](https://huggingface.co/search/full-text?q=GGUF&type=model)).

For get a GGUF model or quantize manually, see [`Prepare and Quantize`](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#prepare-and-quantize) section in llama.cpp.

## NOTE

iOS:

- The [Extended Virtual Addressing](https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_developer_kernel_extended-virtual-addressing) capability is recommended to enable on iOS project.

- Metal:

    - We have tested to know some devices is not able to use Metal ('params.n_gpu_layers > 0') due to llama.cpp used SIMD-scoped operation, you can check if your device is supported in [Metal feature set tables](https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf), Apple7 GPU will be the minimum requirement.

    - It's also not supported in iOS simulator due to [this limitation](https://developer.apple.com/documentation/metal/developing_metal_apps_that_run_in_simulator#3241609), we used constant buffers more than 14.

Android:

- Currently only supported arm64-v8a / x86_64 / armeabi-v7a platform, this means you can't initialize a context on another platforms. The 64-bit platform are recommended because it can allocate more memory for the model.

- No integrated any GPU backend yet.

## License

MIT