Projects in Awesome Lists tagged with llama-cpp
A curated list of projects in awesome lists tagged with llama-cpp.
https://github.com/getumbrel/llama-gpt
A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!
ai chatgpt code-llama codellama gpt gpt-4 gpt4all llama llama-2 llama-cpp llama2 llamacpp llm localai openai self-hosted
Last synced: 13 May 2025
https://github.com/scisharp/llamasharp
A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.
chatbot gpt llama llama-cpp llama2 llama3 llamacpp llava llm multi-modal semantic-kernel
Last synced: 14 May 2025
https://github.com/mobile-artificial-intelligence/maid
Maid is a cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama and OpenAI models remotely.
android android-ai chatbot chatgpt facebook flutter free-chatgpt gguf large-language-models llama llama-cpp llama2 llamacpp local-ai mistral mobile-ai mobile-artificial-intelligence ollama openai openorca
Last synced: 11 Apr 2025
https://github.com/withcatai/node-llama-cpp
Run AI models locally on your machine with node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level.
ai bindings catai cmake cmake-js cuda embedding function-calling gguf gpu grammar json-schema llama llama-cpp llm metal nodejs prebuilt-binaries self-hosted vulkan
Last synced: 13 May 2025
https://github.com/undreamai/LLMUnity
Create characters in Unity with LLMs!
ai character chat chatbot conversational-ai dialogue game-development gamedev generative-ai llama llama-cpp llm npc rag unity unity2d unity3d
Last synced: 07 May 2025
https://github.com/mybigday/llama.rn
React Native binding of llama.cpp
android ios llama llama-cpp llm react-native
Last synced: 14 May 2025
https://github.com/withcatai/catai
Run AI ✨ assistant locally! with simple API for Node.js 🚀
ai ai-assistant catai chatbot chatgpt chatui dalai ggmlv3 gguf llama-cpp llm local-llm localai node-llama-cpp nodejs openai vicuna vicuna-installation-guide wizardlm
Last synced: 07 Oct 2025
https://github.com/the-crypt-keeper/can-ai-code
Self-evaluating interview for AI coders
ai ggml humaneval langchain llama-cpp llm transformers
Last synced: 05 Apr 2025
https://github.com/mdrokz/rust-llama.cpp
LLama.cpp rust bindings
api-bindings cpp crates-io ffi llama llama-cpp machine-learning model rust
Last synced: 16 May 2025
https://github.com/jlonge4/local_llama
This repo is to showcase how you can run a model locally and offline, free of OpenAI dependencies.
artificial-intelligence langchain llama-cpp llamaindex machinelearning offline python
Last synced: 03 Apr 2025
https://github.com/ptsochantaris/emeltal
Local ML voice chat using high-end models.
ai llama-cpp machine-learning macos ml natural-language-processing speech-recognition swift swiftui user-interface whisper-cpp
Last synced: 05 Apr 2025
https://github.com/phronmophobic/llama.clj
Run LLMs locally. A clojure wrapper for llama.cpp.
Last synced: 09 Apr 2025
https://github.com/gpustack/gguf-parser-go
Review/Check GGUF files and estimate the memory usage and maximum tokens per second.
gguf go llama-box llama-cpp stable-diffusion-cpp
Last synced: 19 Apr 2025
https://github.com/brutalcoding/shady.ai
Making offline AI models accessible to all types of edge devices.
android cross-platform dart fastlane flutter gguf ios linux linux-desktop llama-cpp llama-dart llvm macos material-design rwkv serverpod shady-ai web whisper-cpp windows
Last synced: 09 Apr 2025
https://github.com/nuance1979/llama-server
LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI.
chatbot-ui llama llama-cpp llamacpp
Last synced: 12 May 2025
https://github.com/nrl-ai/customchar
Your customized AI assistant - Personal assistants on any hardware! With llama.cpp, whisper.cpp, ggml, LLaMA-v2.
cpp ggml llama llama-cpp llama-v2 llm stt tts whisper-cpp
Last synced: 25 Aug 2025
https://github.com/vtuber-plan/langport
Langport is a language model inference service
api chatgpt chatgpt-api fauxpilot langchain language-model llama llama-cpp llm openai tabby
Last synced: 30 Jun 2025
https://github.com/abhi5h3k/privatedocbot
📚 Local PDF-Integrated Chat Bot: Secure Conversations and Document Assistance with LLM-Powered Privacy
ai chatgpt generative gpt gpt-4 gpt4all langchain llama llama-2 llama-cpp llama2 llamacpp llm localai openai pdf private privategpt self-hosted vectorstore
Last synced: 05 Oct 2025
https://github.com/r3gm/insightsolver-colab
InsightSolver: Colab notebooks for exploring and solving operational issues using deep learning, machine learning, and related models.
ai-ops aiops autogpt colab-notebook colorization computer-vision deep-learning llama-2 llama-cpp llm machine-learning object-detection stable-diffusion text-to-speech
Last synced: 12 Oct 2025
https://github.com/zhouwg/kantv
Workbench for learning and practising AI tech in real scenarios on Android devices, powered by GGML (Georgi Gerganov Machine Learning), NCNN (Nihui Convolutional Neural Network), and FFmpeg + OpenCV.
edge-ai ffmpeg ffmpeg-android livetv llama-cpp ncnn-android whisper-cpp
Last synced: 04 Apr 2025
https://github.com/fboulnois/llama-cpp-docker
Run llama.cpp in a GPU accelerated Docker container
chatgpt docker docker-compose llama llama-cpp llama2 llama3 llm mistral
Last synced: 03 Oct 2025
https://github.com/aj-archipelago/cortex
Simplify and accelerate AI-powered application development with structured interfaces to models and powerful prompt execution environments.
ai chatgpt gpt-3 gpt-35-turbo gpt-4 graphql langchain llama llama-cpp llamacpp llm openai palm palm2 rest-api vertex-ai
Last synced: 24 Dec 2025
https://github.com/greynewell/musegpt
Local LLMs in your DAW!
ai ai-music daw juce juce-plugins llama-cpp llamacpp llm music-production vst vst-plugin vst3
Last synced: 23 Oct 2025
https://github.com/tinybiggames/lumina
Local Generative AI
gen-ai gguf llama-cpp llm-inference local-ai pascal win64 windows-10 windows-11
Last synced: 19 Aug 2025
https://github.com/opencsgs/llm-inference
llm-inference is a platform for publishing and managing llm inference, providing a wide range of out-of-the-box features for model deployment, such as UI, RESTful API, auto-scaling, computing resource management, monitoring, and more.
deepspeed llama-cpp llm-inference ray transformer vllm
Last synced: 12 Apr 2025
https://github.com/hyparam/hyllama
llama.cpp gguf file parser for javascript
gguf javascript js llama-cpp llamacpp llm machine-learning ml parser
Last synced: 17 Mar 2025
https://github.com/Lizonghang/prima.cpp
prima.cpp: Speeding up 70B-scale LLM inference on low-resource everyday home clusters
distributed-ai llama-cpp llm-inference on-device-llms
Last synced: 23 Apr 2025
https://github.com/brutalcoding/llama_dart
Flutter / Dart bindings for llama.cpp
dart flutter llama-cpp llama-dart shady-ai
Last synced: 17 Sep 2025
https://github.com/itzderock/llama-playground
A simple to use and powerful web-interface to mess around with Meta's LLaMA LLM.
llama llama-cpp llama-inference-server llamacpp nextjs trpc
Last synced: 29 Oct 2025
https://github.com/rhinodevel/mt_llm
Pure C wrapper library that makes using llama.cpp on Linux and Windows as simple as possible.
Last synced: 06 Oct 2025
https://github.com/robjsliwa/llama-agent
Fun project to run your own LLM chat bot using llama.cpp
agents ai langchain langchain-python llama llama-cpp llamacpp llm
Last synced: 14 Jul 2025
https://github.com/acai66/qwen2.5_numpy
Implementing the inference process of DeepSeek-R1-Distill-Qwen-1.5B using numpy, making it easy to learn LLM (Large Language Model) inference and to port to other programming languages for acceleration.
deepseek deepseek-r1 llama-cpp llm-inference numpy qwen qwen2
Last synced: 22 Apr 2025
https://github.com/rudolfolah/metatron
Metatron is a project that brings together whisper.cpp, llama.cpp, and piper into a deployable stack with an awesome Node.js API wrapper for each of them.
dockerized llama-cpp llamacpp piper whisper-cpp whispercpp
Last synced: 28 Aug 2025
https://github.com/kevinknights29/llama-v2-gpu-gtx-1650
Running Llama v2 with Llama.cpp in a 4GB VRAM GTX 1650.
Last synced: 23 Apr 2025
https://github.com/tinybiggames/jetinfero
Local LLM Inference Library
ai-inference c-cpp library llama-cpp local-inference pascal procedural-api win64
Last synced: 11 Mar 2025
https://github.com/ethicals7s/awesome-local-ai
152 open-source tools to run LLMs 100% locally – no cloud, no API keys, no censorship
awesome-list crewai exllama fine-tuning inference llama-cpp local-ai local-llm machine-learning-ai multi-modal offline-ai ollama private-gpt quantization rag-agents self-hosted vllm voice
Last synced: 24 Dec 2025
https://github.com/rafaelpierre/openai-agents-redis
Native OpenAI Agents SDK session management implementation using Redis as the persistence layer.
agents artificial-intelligence llama llama-cpp multiagent-systems ollama openai openai-agents-sdk redis
Last synced: 06 Aug 2025
https://github.com/mili-tan/onllama.gguflinkout
Create symbolic links to the GGUF model files in Ollama blobs, for use in other applications such as Llama.cpp, Jan, and LMStudio.
gguf gguf-models jan llama-cpp llamacpp lmstudio ollama
Last synced: 12 Apr 2025
https://github.com/sullygreene/tinyagi
TinyAGI is a lightweight, modular, and extensible Python-based AGI framework designed to create and manage AI agents seamlessly. It supports various model backends like OpenAI, Llama.cpp, Ollama, AlpacaX, and Tabitha, along with dynamic plugin loading for enhanced flexibility.
agents agi ai api artificial-intelligence cli developer-tools extensible framework llama-cpp machine-learning modular ollama openai plugins python task-automation
Last synced: 06 Oct 2025
https://github.com/1038lab/ComfyUI-MiniCPM
A ComfyUI custom node for MiniCPM vision-language models, enabling high-quality image captioning and analysis.
comfyui custom-nodes gguf llama-cpp minicpm minicpm-v muti-models stable-diffusion
Last synced: 02 Sep 2025
https://github.com/statikfintechllc/godcore
All-in-one local AI stack for Mistral-13B and Llama.cpp, with one-step CUDA wheel install, OpenAI-compatible API, and modern web dashboard. Switch between local and cloud chat, run on your own GPU, and deploy instantly—no API keys or paywalls. Designed for easy install, custom builds, and fast remote access. Enjoy!
ai chatbot chatgpt cuda dashboard fastapi llama-cpp llm local-ai mistral openai-compatible react selfhosted webui
Last synced: 25 Jun 2025
https://github.com/eniompw/llama-cpp-gpu
Load larger models by offloading model layers to both GPU and CPU
colab colab-notebook gpu gpu-acceleration llama llama-cpp llamacpp
Last synced: 27 Mar 2025
https://github.com/shekharp1536/ollama-web
Ollama Web UI is a simple yet powerful web-based interface for interacting with large language models. It offers chat history, voice commands, voice output, model download and management, conversation saving, terminal access, multi-model chat, and more—all in one streamlined platform.
llama llama-cpp llama3 llm-inference ollama ollama-app ollama-chat ollama-client ollama-gui ollama-interface ollama-python ollama-ui ollama-webui python-llm-integration
Last synced: 06 Oct 2025
https://github.com/kaust-generative-ai/local-deployment-of-generative-ai-models
Training materials on how to deploy generative AI models locally on your laptop or workstation.
ai carpentries-incubator deployment english generative-ai lesson llama-cpp llamafile llm-inference ollama pre-alpha python
Last synced: 17 Aug 2025
https://github.com/n-engine/devit
Rust CLI dev agent — patch-only, sandboxed, with local LLMs (Ollama/LM Studio).
ai-agent ai-agents approval-policy cli code-generation developer-tools git llama-cpp lm-studio ollama patch-only rust sandbox testing unified-diff wasm
Last synced: 07 Oct 2025
https://github.com/prithivsakthiur/triangulum
Triangulum 10B: Multilingual Large Language Models (LLMs)
10b 1b 5b llama-cpp llama-cpp-python llm ollama text-generation
Last synced: 22 Feb 2025
https://github.com/abhrankan-chakrabarti/llamainteract
An interactive AI platform with both terminal and web-based interfaces for real-time model interactions, featuring dynamic model integration and immediate feedback streaming. Developed by *The Vanguards*.
ai chatbot flask interactive-cli llama llama-cpp llm nlp ollama python real-time-streaming text-generation web-app
Last synced: 17 Jun 2025
https://github.com/kevinknights29/llama_to_llama.cpp
This project aims to create a guide for generating Llama.cpp GGUF model files out of base Llama v2 model weights.
Last synced: 04 Sep 2025
https://github.com/jwinman91/ai-ocr
An AI-powered but model-agnostic OCR (optical character recognition) tool
genai image-to-plot-generation image-to-text-generation llama-cpp ocr-python ocr-recognition python3
Last synced: 13 Mar 2025
https://github.com/runedgeai/agents-sdk
A modern, high performance C++ SDK for AI Agents
agentic-ai agents anthropic artificial-intelligence cpp gemini llama-cpp llm ollama openai sdk
Last synced: 08 Oct 2025
https://github.com/haschka/cli-rag
Command-line tool to interact with a llama.cpp server. Also implements a basic vector database with cosine similarity search.
artificial-intelligence cli large-language-models llama-cpp llm unix-shell
Last synced: 22 Feb 2025
https://github.com/dwain-barnes/llm-gguf-auto-converter
Automated Jupyter notebook solution for batch converting Large Language Models to GGUF format with multiple quantization options. Built on llama.cpp with HuggingFace integration.
auto-converter batch-processing cuda gguf huggingface jupyter-notebook llama-cpp llm model-quantization
Last synced: 17 Jun 2025
https://github.com/icakinser/chatterdocs
This project allows for interacting and chatting with documents locally using 4-bit LLM models and a flat database.
llama llama-cpp llm llm-inference local machine-learning rag
Last synced: 24 Oct 2025
https://github.com/janole/chat-bandit
The friendly and powerful desktop AI chatbot supporting both local and cloud AI models
chatbot electron llama-cpp llm node-llama-cpp ollama ollama-api openai openai-api
Last synced: 07 Oct 2025
https://github.com/sderosiaux/bifrost-ai
🌈 Local LLM chat with conversation branching, mood-reactive UI, and time travel. Run GPT-OSS 20B locally with chain-of-thought reasoning.
ai branching chain-of-thought chatbot conversation-ai gpt gpt-oss llama-cpp llm local-ai mood-detection nodejs react time-travel typescript
Last synced: 19 Sep 2025
https://github.com/j-sephb-lt-n/happy-rag-friends
I have abandoned this local RAG application for now
flask llama-cpp llm llms rag retrieval-augmented-generation web-app
Last synced: 20 Jun 2025
https://github.com/sawadkk/localprompt
LocalPrompt is an AI-powered tool designed to refine and optimize AI prompts, helping users run locally hosted AI models like Mistral-7B for privacy and efficiency. Ideal for developers seeking to run LLMs locally without external APIs.
ai-development ai-prompt fastapi llama-cpp llm local-ai mistral7b offline-ai open-source-llm self-hosted-ai
Last synced: 14 May 2025
https://github.com/testli-ai/outlines-llama-cpp-python-streaming-output
This repository demonstrates how to use outlines and llama-cpp-python for structured JSON generation with streaming output, integrating llama.cpp for local model inference and outlines for schema-based text generation.
gguf gguf-models llama-cpp llama-cpp-python llamacpp llamacpp-python outlines
Last synced: 06 Mar 2025
https://github.com/sshoecraft/shepherd
An interactive multi-backend LLM runtime with intelligent cache eviction and persistent retrieval-augmented memory.
anthropic cli cpp cuda gemini grok inference kv-cache llama-cpp llm mcp ollama openai openai-server rag smart-evictions tensorrt tool-calling ulimited-context
Last synced: 22 Nov 2025
https://github.com/meliussui/mediassist_ai
Medical LLM app for voice/text input, supporting Hindi and English. Works online with Claude 3.5 and offline with BioGPT. 🏥🤖
agents ai aws biogpt data-model glow-tts hackathon hindi-support hospital-management llama-cpp medical-llm mongoose multilingual ocr open-source react voice-assistant whisper
Last synced: 11 Sep 2025
https://github.com/m4k15y6666fk/llm-in-browser
Using LLM with browser features.
Last synced: 03 Jul 2025
https://github.com/keshavpatel2/local-llm-workbench
🧠 A comprehensive toolkit for benchmarking, optimizing, and deploying local Large Language Models. Includes performance testing tools, optimized configurations for CPU/GPU/hybrid setups, and detailed guides to maximize LLM performance on your hardware.
context-window-scaling cpu-inference cuda gpu-acceleration hybrid-inference inference-optimization llama-cpp llm-benchmarking llm-deployment local-llm model-management model-quantization ollama-optimization wsl-ai-setup
Last synced: 01 Apr 2025
https://github.com/nacholibre22/genai-text-based-rpg-with-npcs
Explore "GenAI-Text-Based-RPG-with-NPCs," a Python game featuring AI-driven NPCs, quests, and rich dialogues. Perfect for RPG enthusiasts! 🎮💻
ai-chat ai-npc interactive-story llama-cpp llm-integration local-ai mistral-7b npc python python-game role-playing-game streamlit text-adventure text-rpg
Last synced: 04 Jul 2025
https://github.com/pranav11024/genai-text-based-rpg-with-npcs
A text-based role-playing game with AI-powered NPC dialogue, built with Python and Mistral 7B via llama.cpp.
ai-chat ai-npc interactive-story llama-cpp llm-integration local-ai mistral-7b npc python python-game role-playing-game rpg streamlit text-adventure text-rpg
Last synced: 19 Jul 2025
https://github.com/lurkydismal/llmux
A lightweight local LLM chat with a web UI and a C‑based server that runs any LLM chat executable as a child and communicates via pipes
c99 chatbot civetweb cpp17 linux-only llama-cpp local-ai offline-llm self-hosted single-session tailwind-css web-ui
Last synced: 17 Jun 2025
https://github.com/4kumon/ollama-chatbot-cli
This project creates a local command-line chatbot using the quantized (q4_0) gemma3 model via Ollama. It offers a solid base for building conversational interfaces without depending on external APIs, ideal for local applications, offline environments, or situations where data privacy is essential.
llama-cpp llms ollama python slms
Last synced: 23 Apr 2025
https://github.com/nimadez/cli
Debian Assistant CLI (private)
command-line debian gnome installation linux linux-kernel llama-cpp monitoring networking openbox script-collection security sway trixie wayland x11
Last synced: 09 Sep 2025
https://github.com/ziweek/award-factory
🎓 Showcasing Project, in 2024 Google Machine Learning Bootcamp - 🏆🤖 Award-Factory: Awards lovingly crafted for you by a hilariously talented generative AI! #Google #Gemma:2b #fine-tuning #quantization
docker docker-compose fastapi fine-tuning gemma-2b google large-language-model llama-cpp nextjs quantization
Last synced: 05 Apr 2025
https://github.com/ezforever/llama.cpp-static
Static builds of llama.cpp (Currently only amd64 server builds are available)
llama llama-cpp llamacpp localai self-hosted
Last synced: 06 Sep 2025
https://github.com/ziweek/two-armies-chat-once
💼 Work Project - 🤖🪖 A Korean-English bilingual RAG Chatbot for Regulations of US Army and ROK Army, leveraging a PEFT fine-tuned small LLM with 4-bit quantized integration as the translator
huggingface langchain langsmith llama-cpp llama3 ollama quantization retrieval-augmented-generation streamlit transformers
Last synced: 05 Apr 2025