Projects in Awesome Lists tagged with llama-cpp
A curated list of projects in awesome lists tagged with llama-cpp.
https://github.com/getumbrel/llama-gpt
A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!
ai chatgpt code-llama codellama gpt gpt-4 gpt4all llama llama-2 llama-cpp llama2 llamacpp llm localai openai self-hosted
Last synced: 13 May 2025
https://github.com/scisharp/llamasharp
A C#/.NET library to run LLMs (🦙LLaMA/LLaVA) efficiently on your local device.
chatbot gpt llama llama-cpp llama2 llama3 llamacpp llava llm multi-modal semantic-kernel
Last synced: 14 May 2025
https://github.com/off-grid-ai/off-grid-ai-mobile
The Swiss Army Knife of Offline AI. Chat, Speak, and Generate Images - Privacy First, Zero Internet. Download an LLM and use it on your mobile device. No data ever leaves your phone. Supports text-to-text, vision, text-to-image
edge-ai gguf llama-cpp local-ai mobile-ai offline-ai offline-llm ondevice ondevice-ai privacy-first stable-diffusion-android tool-calling whisper-android
Last synced: 30 Jun 2026
https://github.com/Luce-Org/lucebox-hub
Lucebox: LLM inference server built for speed on specific consumer hardware.
cuda cuda-kernels dflash kernel llama-cpp local-ai luce lucebox megakernel nvidia-cuda pflash qwen rtx3090 speculative-decoding speculative-prefill
Last synced: 23 May 2026
https://github.com/mobile-artificial-intelligence/maid
Maid is a cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama and OpenAI models remotely.
android android-ai chatbot chatgpt facebook flutter free-chatgpt gguf large-language-models llama llama-cpp llama2 llamacpp local-ai mistral mobile-ai mobile-artificial-intelligence ollama openai openorca
Last synced: 11 Apr 2025
https://github.com/withcatai/node-llama-cpp
Run AI models locally on your machine with Node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level.
ai bindings catai cmake cmake-js cuda embedding function-calling gguf gpu grammar json-schema llama llama-cpp llm metal nodejs prebuilt-binaries self-hosted vulkan
Last synced: 26 Jan 2026
https://github.com/Light-Heart-Labs/DreamServer
Turn your PC, Mac, or Linux box into a private AI server. LLM inference, chat UI, voice, agents, workflows, RAG, and image generation.
ai-agents amd comfyui docker llama-cpp llm local-ai n8n nvidia open-webui rag self-hosted speech-to-text strix-halo text-to-speech workflow-automation
Last synced: 02 Jun 2026
https://github.com/RunanywhereAI/RCLI
Talk to your Mac, query your docs, no cloud required. On-device voice AI + RAG
ai-assistant apple-silicon kitten-tts kokoro-tts lfm2 llama-cpp llm local-ai metal on-device-ai parakeet qwen3 rag speech-to-text text-to-speech tool-calling voice-assistant
Last synced: 25 Apr 2026
https://github.com/undreamai/LLMUnity
Create characters in Unity with LLMs!
ai character chat chatbot conversational-ai dialogue game-development gamedev generative-ai llama llama-cpp llm npc rag unity unity2d unity3d
Last synced: 07 May 2025
https://github.com/mybigday/llama.rn
React Native binding of llama.cpp
android ios llama llama-cpp llm react-native
Last synced: 13 Apr 2026
https://github.com/withcatai/catai
Run an AI ✨ assistant locally, with a simple API for Node.js 🚀
ai ai-assistant catai chatbot chatgpt chatui dalai ggmlv3 gguf llama-cpp llm local-llm localai node-llama-cpp nodejs openai vicuna vicuna-installation-guide wizardlm
Last synced: 07 Oct 2025
https://github.com/the-crypt-keeper/can-ai-code
Self-evaluating interview for AI coders
ai ggml humaneval langchain llama-cpp llm transformers
Last synced: 05 Apr 2025
https://github.com/mdrokz/rust-llama.cpp
LLama.cpp rust bindings
api-bindings cpp crates-io ffi llama llama-cpp machine-learning model rust
Last synced: 16 May 2025
https://github.com/jlonge4/local_llama
This repo showcases how to run a model locally and offline, free of OpenAI dependencies.
artificial-intelligence langchain llama-cpp llamaindex machinelearning offline python
Last synced: 03 Apr 2025
https://github.com/ptsochantaris/emeltal
Local ML voice chat using high-end models.
ai llama-cpp machine-learning macos ml natural-language-processing speech-recognition swift swiftui user-interface whisper-cpp
Last synced: 05 Apr 2025
https://github.com/phronmophobic/llama.clj
Run LLMs locally. A clojure wrapper for llama.cpp.
Last synced: 09 Apr 2025
https://github.com/gpustack/gguf-parser-go
Review/Check GGUF files and estimate the memory usage and maximum tokens per second.
gguf go llama-box llama-cpp stable-diffusion-cpp
Last synced: 19 Apr 2025
https://github.com/BrutalCoding/shady.ai
Making offline AI models accessible to all types of edge devices.
android cross-platform dart fastlane flutter gguf ios linux linux-desktop llama-cpp llama-dart llvm macos material-design rwkv serverpod shady-ai web whisper-cpp windows
Last synced: 11 Apr 2025
https://github.com/nuance1979/llama-server
LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI.
chatbot-ui llama llama-cpp llamacpp
Last synced: 12 May 2025
https://github.com/nrl-ai/customchar
Your customized AI assistant - Personal assistants on any hardware! With llama.cpp, whisper.cpp, ggml, LLaMA-v2.
cpp ggml llama llama-cpp llama-v2 llm stt tts whisper-cpp
Last synced: 25 Aug 2025
https://github.com/ferranpons/llamatik
True on-device AI for Kotlin Multiplatform (Android, iOS, Desktop, JVM, WASM). LLM, Speech-to-Text and Image Generation — powered by llama.cpp, whisper.cpp and stable-diffusion.cpp.
ai android desktop edge-ai ggml inference ios kmp kmp-library kotlin ktor llama llama-cpp llm mobile-ai multiplatform offline-ai on-device-ai privacy rag
Last synced: 17 Apr 2026
https://github.com/vtuber-plan/langport
Langport is a language model inference service
api chatgpt chatgpt-api fauxpilot langchain language-model llama llama-cpp llm openai tabby
Last synced: 30 Jun 2025
https://github.com/r3gm/insightsolver-colab
InsightSolver: Colab notebooks for exploring and solving operational issues using deep learning, machine learning, and related models.
ai-ops aiops autogpt colab-notebook colorization computer-vision deep-learning llama-2 llama-cpp llm machine-learning object-detection stable-diffusion text-to-speech
Last synced: 12 Oct 2025
https://github.com/abhi5h3k/privatedocbot
📚 Local PDF-Integrated Chat Bot: Secure Conversations and Document Assistance with LLM-Powered Privacy
ai chatgpt generative gpt gpt-4 gpt4all langchain llama llama-2 llama-cpp llama2 llamacpp llm localai openai pdf private privategpt self-hosted vectorstore
Last synced: 05 Oct 2025
https://github.com/zhouwg/kantv
Workbench for learning and practising AI tech in real scenarios on Android devices, powered by GGML (Georgi Gerganov Machine Learning), NCNN (Nihui Convolutional Neural Network), and FFmpeg + OpenCV.
edge-ai ffmpeg ffmpeg-android livetv llama-cpp ncnn-android whisper-cpp
Last synced: 04 Apr 2025
https://github.com/fboulnois/llama-cpp-docker
Run llama.cpp in a GPU accelerated Docker container
chatgpt docker docker-compose llama llama-cpp llama2 llama3 llm mistral
Last synced: 07 Mar 2026
https://github.com/lifevalue/healthwallet.me
Open-source, patient-controlled health record app with on-device AI. Aggregates medical data from 52K+ providers via FHIR R4. Offline-first. Flutter.
ai dart digital-health ehr emr fhir flutter health healthcare llama-cpp llm medical-records mobile-health offline-first on-device-ai open-source patient-data personal-health-record privacy self-hosted
Last synced: 24 Apr 2026
https://github.com/mohitsoni48/turbollm
Run any local LLM engine, auto-tuned to your GPU — polished web UI + OpenAI/Anthropic-compatible API. Point Claude Code at your own machine in one command. No Electron, no Python, offline-first.
ai anthropic-api claude-code gguf gpu inference llama-cpp llama-server llm local-llm offline openai-api self-hosted
Last synced: 29 Jun 2026
https://github.com/aj-archipelago/cortex
Simplify and accelerate AI-powered application development with structured interfaces to models and powerful prompt execution environments.
ai chatgpt gpt-3 gpt-35-turbo gpt-4 graphql langchain llama llama-cpp llamacpp llm openai palm palm2 rest-api vertex-ai
Last synced: 11 Feb 2026
https://github.com/defilantech/llmkube
Kubernetes operator for local LLM inference with llama.cpp, vLLM, and TGI - multi-GPU, autoscaling, air-gapped, production-ready
ai ai-infrastructure apple-silicon autoscaling edge-computing gguf gpu homelab inference kubernetes kubernetes-operator llama-cpp llm local-llm metal mlops multi-gpu nvidia self-hosted vllm
Last synced: 13 Jun 2026
https://github.com/greynewell/musegpt
Local LLMs in your DAW!
ai ai-music daw juce juce-plugins llama-cpp llamacpp llm music-production vst vst-plugin vst3
Last synced: 23 Oct 2025
https://github.com/hadihonarvar/flock
Self-hosted LLM gateway. One Go binary turns your Macs and Linux boxes into a private inference cluster — multi-machine routing, sharding via llama.cpp-RPC, per-user keys + quotas + audit, OpenAI- and Anthropic-compatible APIs behind one endpoint. Point Cursor / Claude Code / Aider / SDKs at it.
ai-gateway aider anthropic claude-code cursor gguf golang inference llama-cpp llm local-llm mlx multi-tenant ollama openai-compatible opentelemetry prometheus self-hosted sharded-inference vllm
Last synced: 11 Jun 2026
https://github.com/arkanefans/servllama
1-Click LLM Server on Your Phone: no Termux needed!
android chatbot flutter llama llama-cpp llama-cpp-ui llama-server llm ollama web-ui
Last synced: 03 Jun 2026
https://github.com/countzero/windows_llama.cpp
PowerShell automation to rebuild llama.cpp for a Windows environment.
cmake conda cuda llama-cpp openblas powershell windows
Last synced: 26 Apr 2026
https://github.com/tinybiggames/lumina
Local Generative AI
gen-ai gguf llama-cpp llm-inference local-ai pascal win64 windows-10 windows-11
Last synced: 19 Aug 2025
https://github.com/stampby/halo-ai-core
Bare-metal AI platform for AMD Strix Halo. One script. Everything works. Lego blocks — snap in what you need.
agent-framework ai amd arch-linux bare-metal caddy gaia gpu inference lemonade llama-cpp local-ai privacy rocm ryzen-ai self-hosted strix-halo systemd
Last synced: 18 Apr 2026
https://github.com/hyparam/hyllama
llama.cpp gguf file parser for javascript
gguf javascript js llama-cpp llamacpp llm machine-learning ml parser
Last synced: 17 Mar 2025
https://github.com/opencsgs/llm-inference
llm-inference is a platform for publishing and managing llm inference, providing a wide range of out-of-the-box features for model deployment, such as UI, RESTful API, auto-scaling, computing resource management, monitoring, and more.
deepspeed llama-cpp llm-inference ray transformer vllm
Last synced: 12 Apr 2025
https://github.com/Lizonghang/prima.cpp
prima.cpp: Speeding up 70B-scale LLM inference on low-resource everyday home clusters
distributed-ai llama-cpp llm-inference on-device-llms
Last synced: 23 Apr 2025
https://github.com/runedgeai/agents-cpp-sdk
A high performance C++ SDK for AI Agents
agentic-ai agents agents-sdk ai-agents ai-agents-framework anthropic artificial-intelligence bazel cpp edge-ai edge-ai-agents gemini generative-ai llama-cpp llm local-ai ollama openai sdk
Last synced: 27 Jan 2026
https://github.com/pranavkumaarofficial/nlcli-wizard
Natural language control for Python CLI tools using locally-trained SLMs (CPU inference)
cli-tools fine-tuning gemma llama-cpp llm local-first machine-learning nlp qlora quantization slm unsloth
Last synced: 07 Mar 2026
https://github.com/off-grid-ai/off-grid-ai-desktop
Off Grid AI — private, on-device AI. Run open models (text, vision, image, voice) locally through one OpenAI-compatible gateway. No cloud, no accounts, no API keys.
electron gguf image-generation llama-cpp llm local-ai local-first local-llm mcp offline on-device openai-compatible privacy react stable-diffusion typescript whisper
Last synced: 30 Jun 2026
https://github.com/dvcdsys/code-index
Semantic code search powered by embeddings. CLI, web dashboard, and AI-agent tooling — search code by meaning, not text. Self-hosted, Go server with embedded llama.cpp.
agent-tools ai-agents claude-code claude-code-marketplace claude-code-plugin cli code-index code-navigation code-search cuda developer-tools embeddings gguf golang llama-cpp rag self-hosted semantic-search tree-sitter vector-search
Last synced: 02 Jul 2026
https://github.com/tobocop2/lilbee
Run local AI models, search your files and code, and crawl the web, all in one program. Cited answers, local-first, with an MCP server for your coding agent. TUI, CLI, REST API, and Python library.
ai-agents cli embeddings gguf huggingface llama-cpp lm-studio local-ai local-llm mcp model-context-protocol ollama privacy python rag retrieval-augmented-generation self-hosted semantic-search tui vector-search
Last synced: 12 Jun 2026
https://github.com/shakfu/cyllama
A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp
cython cython-wrapper llama-cpp python3 stable-diffusion-cpp whisper-cpp
Last synced: 02 Apr 2026
https://github.com/brutalcoding/llama_dart
Flutter / Dart bindings for llama.cpp
dart flutter llama-cpp llama-dart shady-ai
Last synced: 17 Sep 2025
https://github.com/kocort/kocort
Desktop AI agent runtime with dual-brain safety review, GUI-first operation, local model support, and multi-channel delivery.
agent-runtime ai-agent ai-agents desktop-ai desktop-app golang llama-cpp local-first local-llm multi-agent nextjs openai-compatible slack-bot subagents task-scheduler telegram-bot tool-calling webhook workflow-automation
Last synced: 12 Apr 2026
https://github.com/itzderock/llama-playground
A simple-to-use, powerful web interface for experimenting with Meta's LLaMA LLM.
llama llama-cpp llama-inference-server llamacpp nextjs trpc
Last synced: 29 Oct 2025
https://github.com/rhinodevel/mt_llm
Pure C wrapper library that makes using llama.cpp on Linux and Windows as simple as possible.
Last synced: 06 Oct 2025
https://github.com/andersondanieln/hexllama
A beautifully crafted desktop client for running and managing local LLMs via llama.cpp.
ai-assistant gguf llama llama-cpp llm local-ai machine-learning
Last synced: 24 May 2026
https://github.com/robjsliwa/llama-agent
Fun project to run your own LLM chat bot using llama.cpp
agents ai langchain langchain-python llama llama-cpp llamacpp llm
Last synced: 14 Jul 2025
https://github.com/hec-ovi/gamentic
Self-hosted, browser-based AI dungeon RPG, fully local on an AMD Strix Halo APU: narrator + per-character agents on one local LLM (Gemma 4 26B MoE via llama.cpp/Vulkan), local images (FLUX.2 klein/ComfyUI), expressive voice (Maya1 TTS). FastAPI brain, SQLite, vanilla JS, docker-compose with guided setup (CLI wizard or double-click HTML).
agentic ai-dungeon ai-rpg amd-strix-halo comfyui docker-compose fastapi flux gemma llama-cpp llm local-llm maya1 mixture-of-experts no-build rocm self-hosted text-to-speech vanilla-javascript vulkan
Last synced: 21 Jun 2026
https://github.com/sunayhegde2006/air.rs
Air.rs: 70B+ inference on consumer GPUs. LLM inference in Rust.
apple-silicon ggml inference instruction-set kernel llama-cpp local-ai lora megakernel nvidia-cuda open-models open-source qlora
Last synced: 28 May 2026
https://github.com/jgoy-labs/server-nexe
Local AI server with persistent memory, RAG, and multi-backend inference (MLX / llama.cpp / Ollama). Runs entirely on your machine — zero data sent to external services.
ai apple-silicon embeddings fastapi llama-cpp llm local-ai mlx ollama open-source privacy python qdrant rag self-hosted vector-database
Last synced: 08 Jun 2026
https://github.com/acai66/qwen2.5_numpy
Implements DeepSeek-R1-Distill-Qwen-1.5B inference in NumPy, making it easy to learn how LLM (Large Language Model) inference works and to port it to other programming languages for acceleration.
deepseek deepseek-r1 llama-cpp llm-inference numpy qwen qwen2
Last synced: 22 Apr 2025
https://github.com/rudolfolah/metatron
Metatron is a project that brings together whisper.cpp, llama.cpp, and piper into a deployable stack with an awesome Node.js API wrapper for each of them.
dockerized llama-cpp llamacpp piper whisper-cpp whispercpp
Last synced: 28 Aug 2025
https://github.com/nonatofabio/luna-agent
Custom minimal AI agent with persistent memory, MCP tools, and Discord
agent-framework ai-agent discord-bot homelab llama-cpp llm local-llm mcp openai-compatible python sqlite vector-search
Last synced: 04 Apr 2026
https://github.com/zoott28354/sirius-ai-tray-assistant
Local desktop AI tray assistant for screenshots, translations, image analysis and persistent chat with local backends.
ai-assistant desktop-app lan llama-cpp llm lm-studio local-ai local-llm local-network multimodal ollama openai-compatible pyqt6 python screenshot self-hosted sqlite translation tray-application
Last synced: 31 May 2026
https://github.com/ukkit/chat-o-llama
🦙 chat-o-llama: A lightweight, modern web interface for AI conversations with support for both Ollama and llama.cpp backends. Features persistent conversation management, real-time backend switching, intelligent context compression, and a clean responsive UI.
ai-chat chat chat-interface chat-o-llama chatbot conversation-history cpu-only developer-tools flask lightweight llama-cpp llamacpp local-ai offline-ai ollama privacy-focused python self-hosted sqlite
Last synced: 12 Jun 2026
https://github.com/kevinknights29/llama-v2-gpu-gtx-1650
Running Llama v2 with llama.cpp on a GTX 1650 with 4GB of VRAM.
Last synced: 23 Apr 2025
https://github.com/ethicals7s/awesome-local-ai
152 open-source tools to run LLMs 100% locally – no cloud, no API keys, no censorship
awesome-list crewai exllama fine-tuning inference llama-cpp local-ai local-llm machine-learning-ai multi-modal offline-ai ollama private-gpt quantization rag-agents self-hosted vllm voice
Last synced: 24 Dec 2025
https://github.com/mili-tan/onllama.gguflinkout
Create symbolic links to the GGUF models in Ollama blobs, for use in other applications such as llama.cpp, Jan, and LM Studio.
gguf gguf-models jan llama-cpp llamacpp lmstudio ollama
Last synced: 12 Apr 2025
https://github.com/rafaelpierre/openai-agents-redis
Native OpenAI Agents SDK session management implementation using Redis as the persistence layer.
agents artificial-intelligence llama llama-cpp multiagent-systems ollama openai openai-agents-sdk redis
Last synced: 06 Aug 2025
https://github.com/croll83/llama.cpp-dgx
llama.cpp fork optimized for NVIDIA DGX Spark / GB10 (Blackwell, SM 12.1) — TurboQuant weights + KV, NVFP4, DFlash MTP
blackwell dflash gb10 llama-cpp nvfp4 speculative-decoding turboquant
Last synced: 02 Jun 2026
https://github.com/mycellm/mycellm
Distributed LLM inference across heterogeneous hardware. Pool GPUs into a P2P network with QUIC transport, Ed25519 identity, and an OpenAI-compatible API.
decentralized distributed-computing fleet-management gpu inference llama-cpp llm machine-learning openai-api peer-to-peer python quic self-hosted
Last synced: 10 Jun 2026
https://github.com/statikfintechllc/godcore
All-in-one local AI stack for Mistral-13B and Llama.cpp, with one-step CUDA wheel install, OpenAI-compatible API, and modern web dashboard. Switch between local and cloud chat, run on your own GPU, and deploy instantly—no API keys or paywalls. Designed for easy install, custom builds, and fast remote access. Enjoy!
ai chatbot chatgpt cuda dashboard fastapi llama-cpp llm local-ai mistral openai-compatible react selfhosted webui
Last synced: 25 Jun 2025
https://github.com/1038lab/ComfyUI-MiniCPM
A ComfyUI custom node for MiniCPM vision-language models, enabling high-quality image captioning and analysis.
comfyui custom-nodes gguf llama-cpp minicpm minicpm-v muti-models stable-diffusion
Last synced: 02 Sep 2025
https://github.com/sullygreene/tinyagi
TinyAGI is a lightweight, modular, and extensible Python-based AGI framework designed to create and manage AI agents seamlessly. It supports various model backends like OpenAI, Llama.cpp, Ollama, AlpacaX, and Tabitha, along with dynamic plugin loading for enhanced flexibility.
agents agi ai api artificial-intelligence cli developer-tools extensible framework llama-cpp machine-learning modular ollama openai plugins python task-automation
Last synced: 06 Oct 2025
https://github.com/alichherawalla/offline-mobile-llm-manager
The Swiss Army Knife of Offline AI. Chat, Speak, and Generate Images - Privacy First, Zero Internet. Download an LLM and use it on your mobile device. No data ever leaves your phone. Supports text-to-text, vision, text-to-image.
edge-ai edge-ai-image-gen gguf llama-3-android llama-cpp llm local-ai local-image-gen offline offline-image-gen offline-llm privacy-first stable-diffusion-android whisper whisper-android
Last synced: 12 Feb 2026
https://github.com/countzero/windows_manage_large_language_models
PowerShell automation to download large language models (LLMs) from Git repositories and quantize them with llama.cpp into the GGUF format.
gguf git large-language-models lfs llama-cpp powershell quantization windows
Last synced: 26 Apr 2026
https://github.com/eniompw/llama-cpp-gpu
Load larger models by offloading model layers to both GPU and CPU
colab colab-notebook gpu gpu-acceleration llama llama-cpp llamacpp
Last synced: 05 May 2026
https://github.com/john-rocky/apple-silicon-llm-bench
Neutral, reproducible benchmark for local LLMs on Apple Silicon (Mac · iPhone · iPad) — MLX, llama.cpp, CoreML, Apple Foundation Models
apple-silicon benchmark coreml ios llama-cpp llm llm-inference macos mlx on-device-ai
Last synced: 31 May 2026
https://github.com/slb350/octoroute
Smart HTTP router for local LLMs (Ollama, LM Studio, llama.cpp). Rule-based + LLM-powered routing, health checks, load balancing, Prometheus metrics. Rust-native, zero-overhead.
ai artificial-intelligence homelab llama-cpp llm lm-studio local-llm ollama prometheus rust
Last synced: 13 Jan 2026
https://github.com/haschka/cli-rag
Command line tool to Interact with a llama.cpp server. Also implements a basic vector database with cosine similarity search.
artificial-intelligence cli large-language-models llama-cpp llm unix-shell
Last synced: 18 Jun 2026
https://github.com/jwinman91/ai-ocr
An AI-powered but model-agnostic OCR (optical character recognition) tool
genai image-to-plot-generation image-to-text-generation llama-cpp ocr-python ocr-recognition python3
Last synced: 21 May 2026
https://github.com/arcxteam/gguf-convert-model
Auto GGUF Converter for HuggingFace Hub Models with Multiple Quantizations (GGUF Format)
ai ai-models bf16 cmake convert-gguf gguf gguf-editor gguf-models gguf-quantization huggingface huggingface-models llama-cpp machine-learning safetensors tensorflow transformers
Last synced: 10 Jun 2026
https://github.com/hanxiao/knowledge-graph-extractor
Turn any document or a whole zip into an interactive knowledge graph, using a self-hosted Qwen3.6-35B-A3B-MTP on a single NVIDIA L4
fastapi force-graph gpu information-extraction knowledge-graph llama-cpp llm qwen self-hosted
Last synced: 15 Jun 2026
https://github.com/kaust-generative-ai/local-deployment-of-generative-ai-models
Training materials on how to deploy generative AI models locally on your laptop or workstation.
ai carpentries-incubator deployment english generative-ai lesson llama-cpp llamafile llm-inference ollama pre-alpha python
Last synced: 17 Aug 2025
https://github.com/mutgarth/automode
Auto-approve Claude Code, Codex, and Antigravity CLI permission prompts using a local LLM — zero UI interruptions, full reasoning, ~500ms decisions
ai antigravity automation claude claude-code codex developer-tools llama-cpp llm openai-codex rust
Last synced: 30 May 2026
https://github.com/tvanfossen/entropic
Local-first agentic inference engine in C/C++. Multi-tier model routing, grammar-constrained output, MCP tool servers. Embeddable via C ABI.
agentic-ai agentic-framework cpp cpp20 cuda edge-ai embedded-ai gbnf gguf grammar-constrained-decoding inference-engine llama-cpp llm local-llm mcp on-device-ai privacy-first tool-calling
Last synced: 30 May 2026
https://github.com/fajknli/palacelite
Rooted. Reflecting.
ai ai-memory ai-tools llama-cpp local-llm memgpt memory memory-palace offline-ai open-source retrieval-augmented-generation sqlite-vec
Last synced: 18 Apr 2026
https://github.com/noosxe/llama-launcher
Quick launcher for running LLMs with llama-server containers
Last synced: 31 May 2026
https://github.com/shekharp1536/ollama-web
Ollama Web UI is a simple yet powerful web-based interface for interacting with large language models. It offers chat history, voice commands, voice output, model download and management, conversation saving, terminal access, multi-model chat, and more—all in one streamlined platform.
llama llama-cpp llama3 llm-inference ollama ollama-app ollama-chat ollama-client ollama-gui ollama-interface ollama-python ollama-ui ollama-webui python-llm-integration
Last synced: 06 Oct 2025
https://github.com/hossbit/comai-linux-assistant
Local AI Linux terminal assistant written in Bash. Explain commands, analyze files and logs, and use local LLMs or OpenAI-compatible APIs.
ai-assistant bash bash-script cli command-line-tool devops linux linux-assistant llama-cpp llm local-ai local-llm log-analysis open-source openai openai-api shell-assistant system-administration terminal terminal-assistant
Last synced: 30 Jun 2026