{"id":48590403,"url":"https://github.com/roycoding8/llama-gradio-ui","last_synced_at":"2026-04-08T19:03:15.643Z","repository":{"id":346922918,"uuid":"1191811327","full_name":"RoyCoding8/llama-gradio-ui","owner":"RoyCoding8","description":"Local-first Gradio 6 chat UI for llama.cpp with MCP tool-calling and an optional Presidio-based privacy shield.","archived":false,"fork":false,"pushed_at":"2026-03-26T01:05:51.000Z","size":28,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-26T23:15:22.742Z","etag":null,"topics":["llama","llm","local-llm","mcp"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RoyCoding8.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-25T15:56:47.000Z","updated_at":"2026-03-26T01:05:54.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/RoyCoding8/llama-gradio-ui","commit_stats":null,"previous_names":["roycoding8/llama-gradio-ui"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/RoyCoding8/llama-gradio-ui","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RoyCoding8%2Fllama-gradio-ui","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RoyCoding8%2Fllama-gradio-ui/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RoyCoding8%2Fllama-gradio-ui/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RoyCoding8%2Fllama-gradio-ui/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RoyCoding8","download_url":"https://codeload.github.com/RoyCoding8/llama-gradio-ui/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RoyCoding8%2Fllama-gradio-ui/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31569400,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T14:31:17.711Z","status":"ssl_error","status_checked_at":"2026-04-08T14:31:17.202Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llama","llm","local-llm","mcp"],"created_at":"2026-04-08T19:02:51.436Z","updated_at":"2026-04-08T19:03:15.634Z","avatar_url":"https://github.com/RoyCoding8.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# llama-gradio-ui\n\nA local chat UI for `llama.cpp` server with Gradio 6, MCP tool calling, and optional privacy processing.\n\n## What You Can Do\n\n- Start and stop `llama-server` from the UI and load models from a GGUF folder or custom path\n- Stream chat responses from the OpenAI-compatible `/v1/chat/completions` endpoint\n- Connect MCP servers over stdio, SSE, or HTTP and let the model call tools\n- Run a two-step privacy flow: redact PII with Presidio, then restyle text locally\n\n## Requirements\n\n- Python 3.10+\n- `llama-server` / `llama-server.exe` from [llama.cpp](https://github.com/ggerganov/llama.cpp)\n- At least one GGUF model file\n- [uv](https://github.com/astral-sh/uv) (recommended) or pip\n\nThis repository is a `uv` project (`pyproject.toml` + `uv.lock`).\n\n## Quick Start\n\n1. Clone and enter the project\n\n   ```bash\n   git clone https://github.com/RoyCoding8/llama-gradio-ui.git\n   cd llama-gradio-ui\n   ```\n\n2. Create `.env` from the example\n\n   ```bash\n   cp .env.example .env\n   ```\n\n3. Update `.env` with your local paths and defaults\n\n   ```env\n   LLAMA_SERVER_DIR=C:\\path\\to\\llama-cpp-build\n   GGUF_DIR=C:\\path\\to\\models\n\n   GPU_LAYERS=-1\n   CTX_SIZE=4096\n   KV_CACHE_TYPE_K=f16\n   KV_CACHE_TYPE_V=f16\n   ```\n\n4. Install dependencies\n\n   ```bash\n   uv sync\n   ```\n\n   Or with pip:\n\n   ```bash\n   pip install -e .\n   python -m spacy download en_core_web_lg\n   ```\n\n5. Run the app\n\n   ```bash\n   uv run python app.py\n   ```\n\n   On Windows you can also use `start.bat`.\n\nBy default, the UI is available at `http://127.0.0.1:7860`.\n\n## Configuration\n\nKey values in `.env`:\n\n- `LLAMA_HOST`, `LLAMA_PORT`: target `llama-server` host and port\n- `LLAMA_SERVER_DIR`: directory that contains `llama-server`\n- `GGUF_DIR`: directory scanned for `.gguf` models\n- `UI_HOST`, `UI_PORT`, `UI_SHARE`: Gradio host, port, and public share mode\n- `CTX_SIZE`, `GPU_LAYERS`: default runtime settings for `llama-server`\n- `KV_CACHE_TYPE_K`, `KV_CACHE_TYPE_V`: KV cache quantization settings\n- `ALLOW_REMOTE_TOOLS`: when `UI_SHARE=1`, set this to `1` only if you explicitly want remote tool execution\n\n## MCP Server Setup\n\nYou can add MCP servers in the UI or edit `mcp_servers.json` directly.\n\n```json\n{\n  \"servers\": {\n    \"filesystem\": {\n      \"name\": \"filesystem\",\n      \"transport\": \"stdio\",\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"@modelcontextprotocol/server-filesystem\", \"C:/docs\"],\n      \"enabled\": true,\n      \"autostart\": false\n    }\n  }\n}\n```\n\nThe MCP tab also accepts Claude/Cursor-format imports.\n\n## Tool-Calling Flow\n\n1. The model receives chat history plus OpenAI-format tool schemas from connected MCP servers\n2. If it emits tool calls, the app dispatches them to MCP and records results\n3. Tool results are appended to conversation context\n4. The model runs again, up to 5 rounds\n5. Final output streams back to the chat UI\n\n## Thinking Mode Notes\n\n- The Chat tab `Think` toggle is forwarded to llama.cpp on each request using:\n  - `reasoning: \"on\" | \"off\"`\n  - `reasoning_budget: -1 | 0`\n  - `chat_template_kwargs: {\"enable_thinking\": true | false}`\n- Some Qwen3.5 + llama.cpp builds have known upstream issues where thinking control can be inconsistent.\n- If `Think: OFF` still hangs or emits reasoning on your build, update llama.cpp and prefer server startup with explicit reasoning flags (for example `--reasoning off`).\n\n## Project Structure\n\n| File | Purpose |\n|---|---|\n| `app.py` | Application entry point and Gradio wiring |\n| `server_runtime.py` | `llama-server` process lifecycle and model discovery |\n| `chat_engine.py` | Streaming chat and MCP tool-call loop |\n| `mcp_manager.py` | Async MCP client manager and server connections |\n| `mcp_facade.py` | UI-facing MCP actions and response formatting |\n| `privacy_shield.py` | PII redaction and local restyling flow |\n| `config.py` | Environment and `.env` parsing |\n| `style.css` | UI styling |\n\n## License\n\nApache 2.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froycoding8%2Fllama-gradio-ui","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Froycoding8%2Fllama-gradio-ui","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froycoding8%2Fllama-gradio-ui/lists"}