{"id":31644501,"url":"https://github.com/aatricks/llmedge-examples","last_synced_at":"2026-04-14T03:31:17.933Z","repository":{"id":316457202,"uuid":"1059526577","full_name":"Aatricks/llmedge-examples","owner":"Aatricks","description":"Examples using the llmedge library","archived":false,"fork":false,"pushed_at":"2025-09-24T17:13:51.000Z","size":118811,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-24T19:18:32.236Z","etag":null,"topics":["android","app-example","gguf","kotlin","llamacpp","llm-inference"],"latest_commit_sha":null,"homepage":"","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Aatricks.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-18T15:04:12.000Z","updated_at":"2025-09-24T17:13:55.000Z","dependencies_parsed_at":"2025-09-24T19:31:50.379Z","dependency_job_id":null,"html_url":"https://github.com/Aatricks/llmedge-examples","commit_stats":null,"previous_names":["aatricks/llmedge-examples"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/Aatricks/llmedge-examples","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Aatricks%2Fllmedge-examples","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Aatricks%2Fllmedge-examples/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Aatricks%2Fllmedge-examples/release
s","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Aatricks%2Fllmedge-examples/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Aatricks","download_url":"https://codeload.github.com/Aatricks/llmedge-examples/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Aatricks%2Fllmedge-examples/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278722768,"owners_count":26034461,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-07T02:00:06.786Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["android","app-example","gguf","kotlin","llamacpp","llm-inference"],"created_at":"2025-10-07T04:53:30.869Z","updated_at":"2026-04-14T03:31:17.925Z","avatar_url":"https://github.com/Aatricks.png","language":"Kotlin","funding_links":[],"categories":[],"sub_categories":[],"readme":"# llmedge Examples\n\nComprehensive demonstration applications for the llmedge Android library, showcasing on-device language model inference, RAG pipelines, image generation, and video synthesis capabilities.\n\n**Main Library Repository**: https://github.com/Aatricks/llmedge\n\n## Overview\n\nThis example application provides production-ready demonstrations of llmedge's core features. 
Each activity is designed to illustrate best practices for model loading, memory management, and efficient on-device inference.\n\n## Included Demonstrations\n\n### Language Model Inference\n\n**Local Asset Demo** (`LocalAssetDemoActivity.kt`)\n- Demonstrates loading GGUF models bundled within the APK\n- Illustrates asset extraction to app-private storage\n- Shows both blocking and streaming inference patterns\n- Suitable for offline-first applications\n\n**Jinja Chat Template Demo** (`JinjaTemplateDemoActivity.kt`)\n- Demonstrates passing an explicit loop-based Jinja chat template through `SmolLM.InferenceParams.chatTemplate`\n- Downloads a GGUF model from Hugging Face through `SmolLM.loadFromHuggingFace(...)`\n- Shows the exact template string used for the request so the override path is visible in-app\n\n**Hugging Face Demo** (`HuggingFaceDemoActivity.kt`)\n- Automated model download from Hugging Face Hub\n- Progress monitoring and cache management\n- Demonstrates proper error handling for network operations\n- Shows model reuse across application sessions\n\n### Retrieval-Augmented Generation\n\n**RAG Demo** (`RagActivity.kt`)\n- Complete on-device RAG pipeline implementation\n- Document indexing with ONNX embeddings\n- Vector similarity search and context retrieval\n- Integration with SmolLM for answer generation\n- Demonstrates PDF parsing and text chunking strategies\n\n### Vision and Multimodal Processing\n\n**Image Text Extraction** (`ImageToTextActivity.kt`)\n- Google ML Kit OCR integration\n- Batch image processing capabilities\n- Error handling for unsupported image formats\n- Demonstrates preprocessing for vision models\n\n**Vision Model Demo** (`LlavaVisionActivity.kt`)\n- Vision-capable language model integration\n- Image-to-text description generation\n- Multimodal input preparation\n- Demonstrates vision model inference patterns\n\n### Generative Media\n\n**Image Generation** (`StableDiffusionActivity.kt`)\n- Text-to-image synthesis using Stable 
Diffusion\n- LoRA Support: Toggle switch to apply Detail Tweaker LoRA, automatically downloaded from Hugging Face\n- EasyCache: Auto-enabled acceleration for supported DiT models (Flux, SD3, Wan, Qwen Image, Z-Image)\n- Memory-aware configuration options\n- Progressive generation with cancellation support\n- Demonstrates VAE loading and tensor offloading strategies\n\n**Video Generation** (`VideoGenerationActivity.kt`)\n- Text-to-video synthesis using Wan models\n- Multi-file model loading (main + VAE + T5XXL)\n- Device capability detection (12GB+ RAM required)\n- Frame-by-frame progress monitoring\n- Demonstrates proper resource cleanup\n\n### Speech Processing\n\n**Speech-to-Text (STT)** (`STTActivity.kt`)\n- Whisper model download from Hugging Face\n- Audio recording and transcription\n- Real-time streaming transcription support\n- Timestamp and SRT generation\n\n**Text-to-Speech (TTS)** (`TTSActivity.kt`)\n- Bark model download from Hugging Face via `LLMEdge`\n- Text input for speech synthesis\n- Progress tracking during generation\n- Audio playback and WAV file saving\n- ARM-optimized native inference with OpenMP\n\n## System Requirements\n\n### Minimum Requirements\n- Android 11+ (API 30)\n- 3GB RAM for basic LLM inference\n- 500MB free storage for model caching\n- 1GB+ free storage for speech models\n\n### Recommended Configuration\n- Android 11+ (API 30) with GPU backends enabled\n- 8GB RAM for Stable Diffusion\n- 12GB+ RAM for video generation (Wan models)\n- 5GB free storage for video model pipeline\n\n### Speech Model Requirements\n- **Whisper STT**: 75MB-500MB depending on model size (tiny to small)\n- **Bark TTS**: 843MB for f16 models\n\n### Development Environment\n- Android SDK with NDK r27+\n- CMake 3.22+\n- Java 17+\n- Gradle 8.0+ (wrapper included)\n\n## Building the Application\n\n### Standard Build Process\n\nFrom the repository root directory:\n\n1. Build the llmedge library:\n```bash\n./gradlew :llmedge:assembleRelease\n```\n\n2. 
Build the example application:\n```bash\ncd llmedge-examples\n./gradlew :app:assembleDebug\n```\n\n3. Install to device:\n```bash\n./gradlew :app:installDebug\n```\n\n### GPU-Enabled Build\n\nFor Android GPU builds with OpenCL-first, Vulkan-fallback runtime selection:\n\n```bash\n./gradlew :llmedge:assembleRelease \\\n  -PllmedgeAndroidOpencl=ON \\\n  -Pandroid.jniCmakeArgs=\"-DGGML_VULKAN=ON -DSD_VULKAN=ON\"\n\ncd llmedge-examples\n./gradlew :app:assembleDebug :app:installDebug\n```\n\n**Notes**:\n- Experimental OpenCL support is Android-only and currently limited to `arm64-v8a`.\n- At runtime, `llmedge` prefers OpenCL first, then Vulkan, then CPU for text, Whisper, and image/video.\n- Bark remains CPU-only.\n\n## Asset Configuration\n\n### Bundled GGUF Models\n\nPlace small GGUF models in `app/src/main/assets/` for offline-first demos:\n\n```\napp/src/main/assets/\n              └── models/\n                  └── smolm2-360M-instruct.gguf\n```\n\nRecommended models for bundling:\n- SmolLM2-360M-Instruct (~200MB)\n- Qwen2-0.5B-Instruct (~300MB)\n- TinyLlama-1.1B (~600MB)\n\n### RAG Embeddings\n\nThe RAG demo requires ONNX embedding models:\n\n```\napp/src/main/assets/\n              └── embeddings/\n                  └── all-minilm-l6-v2/\n                      ├── model.onnx\n                      └── tokenizer.json\n```\n\nDownload from: `sentence-transformers/all-MiniLM-L6-v2` on Hugging Face\n\n### Runtime Model Cache\n\nModels downloaded via Hugging Face are cached at:\n```\n\u003capp_private_dir\u003e/files/hf-models/\u003crepo\u003e/\u003crevision\u003e/\u003cfilename\u003e\n```\n\nCache persists across app restarts and is reused automatically.\n\n## Usage Examples\n\n### Basic LLM Inference\n\n```kotlin\nval edge = LLMEdge.create(context, lifecycleScope)\n\nCoroutineScope(Dispatchers.IO).launch {\n    val response = edge.text.generate(\n        prompt = \"Explain quantum computing concisely.\",\n        model = ModelSpec.huggingFace(\n            repoId = 
\"unsloth/Qwen3-0.6B-GGUF\",\n            filename = \"Qwen3-0.6B-Q4_K_M.gguf\",\n        ),\n    )\n    \n    withContext(Dispatchers.Main) {\n        textView.text = response\n    }\n}\n```\n\n### RAG Pipeline\n\n```kotlin\nval edge = LLMEdge.create(context, lifecycleScope)\nval rag = edge.rag.createSession()\nrag.init()\n\nCoroutineScope(Dispatchers.IO).launch {\n    val chunks = rag.indexPdf(pdfUri)\n    val answer = rag.ask(\"What are the main conclusions?\")\n\n    withContext(Dispatchers.Main) {\n        resultView.text = answer\n    }\n}\n```\n\n### Speech-to-Text (Whisper)\n\n```kotlin\nval edge = LLMEdge.create(context, lifecycleScope)\n\nCoroutineScope(Dispatchers.IO).launch {\n    // Simple transcription\n    val text = edge.speech.transcribeToText(audioSamples)\n\n    // Full transcription with timing\n    val segments = edge.speech.transcribe(\n        audioSamples = audioSamples,\n        params = Whisper.TranscribeParams(language = \"en\"),\n    )\n\n    withContext(Dispatchers.Main) {\n        segments.forEach { segment -\u003e\n            textView.append(\"[${segment.startTimeMs}ms] ${segment.text}\\n\")\n        }\n    }\n}\n```\n\n### Real-time Streaming Transcription\n\nFor live captioning from a microphone:\n\n```kotlin\nclass LiveCaptionActivity : AppCompatActivity() {\n    private var transcriber: StreamingTranscriptionSession? 
= null\n\n    fun startLiveCaptions() {\n        lifecycleScope.launch(Dispatchers.IO) {\n            // Create streaming transcriber with sliding window\n            transcriber = LLMEdge.create(this@LiveCaptionActivity, lifecycleScope).speech.createStreamingSession(\n                params = Whisper.StreamingParams(\n                    stepMs = 3000,      // Process every 3 seconds\n                    lengthMs = 10000,   // 10-second windows\n                    language = \"en\",\n                    useVad = true       // Skip silent audio\n                )\n            )\n\n            // Collect transcription results\n            transcriber?.events()?.collect { segment -\u003e\n                withContext(Dispatchers.Main) {\n                    captionTextView.text = segment.text\n                }\n            }\n        }\n    }\n\n    // Feed audio from microphone (called by AudioRecord callback)\n    fun onAudioData(samples: FloatArray) {\n        lifecycleScope.launch(Dispatchers.IO) {\n            transcriber?.feedAudio(samples)\n        }\n    }\n\n    fun stopLiveCaptions() {\n        transcriber?.stop()\n    }\n}\n```\n\n### Text-to-Speech (Bark)\n\n\n```kotlin\nval edge = LLMEdge.create(context, lifecycleScope)\n\nCoroutineScope(Dispatchers.IO).launch {\n    // Generate speech (model auto-downloads on first use)\n    val audio = edge.speech.synthesize(\"Hello, world!\")\n    audioPlayer.play(audio.samples, audio.sampleRate)\n}\n```\n\n### Image Generation\n\n```kotlin\nval edge = LLMEdge.create(this, lifecycleScope)\n\nval bitmap = edge.image.generate(\n    ImageGenerationRequest(\n        prompt = \"serene mountain landscape, sunset\",\n        width = 512,\n        height = 512,\n        steps = 20\n    ),\n)\n\nimageView.setImageBitmap(bitmap)\n```\n\n### Video Generation\n\n```kotlin\nval edge = LLMEdge.create(this, lifecycleScope)\n\n// Automatic memory management and sequential loading\nedge.image.generateVideo(\n    
VideoGenerationRequest(\n        prompt = \"cat walking through garden\",\n        videoFrames = 8,\n        width = 512,\n        height = 512,\n        steps = 20,\n        cfgScale = 7.0f,\n        flowShift = 3.0f,\n        forceSequentialLoad = true // Safe for most devices\n    )\n).collect { event -\u003e\n    Log.d(\"VideoGen\", event.toString())\n}\n```\n\n## Performance Optimization\n\n### Memory Management\n\n**Monitor Memory Usage**:\n```kotlin\nval snapshot = MemoryMetrics.snapshot(context)\nLog.d(\"Memory\", \"Native heap: ${snapshot.nativePssKb / 1024}MB\")\n```\n\n**Optimization Strategies**:\n- Use quantized models (Q4_K_M) for lower memory footprint\n- Enable CPU offloading for large models\n- Close model instances when not in use\n- Process images/video in batches with intermediate cleanup\n\n### Thread Configuration\n\n```kotlin\nval edge = LLMEdge.create(\n    context = context,\n    scope = lifecycleScope,\n    config = LLMEdgeConfig(\n        text = TextRuntimeConfig(\n            promptThreads = Runtime.getRuntime().availableProcessors(),\n            contextSize = 2048,\n        ),\n    ),\n)\n```\n\n### GPU Backends\n\nVerify Android GPU capability:\n```kotlin\nval textBackends = LLMEdge.getTextBackendAvailability()\nval imageBackends = LLMEdge.getImageBackendAvailability()\n\nLog.i(\"Performance\", \"Text backends: $textBackends\")\nLog.i(\"Performance\", \"Image backends: $imageBackends\")\n```\n\nCheck logcat for initialization:\n```bash\nadb logcat -s SmolLM:* SmolSD:* | grep -Ei \"opencl|vulkan|backend\"\n```\n\n## Troubleshooting\n\n### Model Loading Failures\n\n**Symptoms**: `FileNotFoundException`, `IllegalStateException` during load\n\n**Solutions**:\n- Verify model file exists in expected location\n- Check available storage space\n- Ensure network connectivity for Hugging Face downloads\n- Validate model file integrity (not corrupted)\n\n### Out of Memory Errors\n\n**Symptoms**: App crashes with OOM during inference or 
generation\n\n**Solutions**:\n- Use smaller models or quantized variants\n- Reduce image/video resolution\n- Enable CPU offloading: `offloadToCpu = true`\n- Lower context window size\n- Close unused model instances\n\n### Slow Inference Performance\n\n**Symptoms**: Generation takes excessive time per token/frame\n\n**Solutions**:\n- Use quantized models (Q4_K_M, Q3_K_S)\n- Reduce inference steps (15-20 is usually sufficient)\n- Enable Android GPU backends on compatible devices\n- Adjust thread count to match device cores\n- Use smaller resolutions for media generation\n\n### Video Generation Failures\n\n**Symptoms**: Crashes or errors when loading Wan models\n\n**Solutions**:\n- Verify device has 12GB+ RAM\n- Ensure all three files downloaded (main + VAE + T5XXL)\n- Use explicit file paths (not modelId shorthand)\n- Check stable-diffusion.cpp logs in logcat\n- Verify sufficient storage for 6GB+ model files\n\n### Native Library Issues\n\n**Symptoms**: `UnsatisfiedLinkError`, native crashes\n\n**Solutions**:\n- Rebuild AAR and reinstall app\n- Verify NDK version matches (r27+)\n- Check device ABI compatibility\n- Inspect logcat for native stack traces\n- Clean build: `./gradlew clean`\n\n### Speech Processing Issues\n\n**Symptoms**: Whisper transcription crashing or producing garbled output\n\n**Solutions**:\n- Ensure audio is 16kHz mono PCM float32 format\n- Use smaller models (tiny/base) for faster processing\n- Check that model file downloaded completely\n\n## Testing Infrastructure\n\n### Speech E2E Testing\n\nRun speech tests via adb:\n```bash\nadb shell am instrument -w -e class com.example.llmedgeexample.SpeechE2ETest \\\n  com.example.llmedgeexample.test/androidx.test.runner.AndroidJUnitRunner\n```\n\n### Headless E2E Testing\n\nRun automated video generation tests:\n\n```bash\nadb shell am start -n com.example.llmedgeexample/.HeadlessVideoTestActivity\n```\n\nMonitor test execution:\n```bash\nadb logcat -s VideoE2E:*\n```\n\nTest results are logged to 
logcat with detailed timing and validation metrics.\n\n## Architecture Notes\n\n### Memory Architecture\n- Native models allocated via JNI in native heap\n- Dalvik heap used only for Java objects and bitmaps\n- Large file downloads use system DownloadManager\n- Tensor operations execute in native memory space\n\n### Threading Model\n- All model operations run on background threads (Dispatchers.IO)\n- UI updates dispatched to Main thread\n- Blocking calls avoided on UI thread\n- Coroutines used for structured concurrency\n\n### Resource Lifecycle\n- Models implement `AutoCloseable` for automatic cleanup\n- Native resources freed via `close()` method\n- File handles managed with try-with-resources pattern\n- Memory-mapped files used for large model loading\n\n## License\n\nApache 2.0 - See LICENSE file for details\n\n## Contributing\n\nContributions are welcome. Please review the main repository's contributing guidelines before submitting pull requests.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faatricks%2Fllmedge-examples","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faatricks%2Fllmedge-examples","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faatricks%2Fllmedge-examples/lists"}