{"id":33374369,"url":"https://github.com/sshoecraft/shepherd","last_synced_at":"2026-04-10T07:10:26.694Z","repository":{"id":319033187,"uuid":"1057510455","full_name":"sshoecraft/shepherd","owner":"sshoecraft","description":"An interactive multi-backend LLM runtime with intelligent cache eviction and persistent retrieval-augmented memory.","archived":false,"fork":false,"pushed_at":"2025-11-13T05:16:37.000Z","size":1169,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-13T07:10:23.430Z","etag":null,"topics":["anthropic","cli","cpp","cuda","gemini","grok","inference","kv-cache","llama-cpp","llm","mcp","ollama","openai","openai-server","rag","smart-evictions","tensorrt","tool-calling","ulimited-context"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sshoecraft.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-15T20:36:29.000Z","updated_at":"2025-11-13T05:16:41.000Z","dependencies_parsed_at":null,"dependency_job_id":"107dbda0-7adc-4209-9c9b-92956be439a1","html_url":"https://github.com/sshoecraft/shepherd","commit_stats":null,"previous_names":["sshoecraft/shepherd"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sshoecraft/shepherd","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sshoecraft%2Fshepherd","tags_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/sshoecraft%2Fshepherd/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sshoecraft%2Fshepherd/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sshoecraft%2Fshepherd/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sshoecraft","download_url":"https://codeload.github.com/sshoecraft/shepherd/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sshoecraft%2Fshepherd/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":285873538,"owners_count":27246054,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-22T02:00:05.934Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anthropic","cli","cpp","cuda","gemini","grok","inference","kv-cache","llama-cpp","llm","mcp","ollama","openai","openai-server","rag","smart-evictions","tensorrt","tool-calling","ulimited-context"],"created_at":"2025-11-22T23:01:02.040Z","updated_at":"2026-04-10T07:10:26.687Z","avatar_url":"https://github.com/sshoecraft.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Shepherd\n\n**Multi-Backend LLM Inference Server and Interactive Agent**\n\nShepherd is a C++ LLM system supporting local models (llama.cpp, TensorRT-LLM) and cloud APIs (OpenAI, 
Anthropic, Gemini, Grok, Ollama). It provides multiple frontends (CLI, TUI, OpenAI-compatible API server, CLI server, JSON line-protocol), automatic KV cache eviction for indefinite conversations, retrieval-augmented generation (RAG) with SQLite or PostgreSQL, tool/function calling, MCP integration, multi-model collaboration, and background memory extraction.\n\n---\n\n## Quick Start\n\n```bash\ngit clone --recursive https://github.com/sshoecraft/shepherd.git\ncd shepherd\nmake\n```\n\nThe Makefile checks for missing dependencies and tells you exactly what to install.\n\n**Add a cloud provider and start using it:**\n```bash\n# OpenAI\nshepherd provider add gpt openai --model gpt-4o --api-key sk-...\n\n# Anthropic\nshepherd provider add sonnet anthropic --model claude-sonnet-4 --api-key sk-ant-...\n\n# Start chatting\n./shepherd --provider sonnet\n```\n\n**Local model (requires NVIDIA GPU + CUDA):**\n```bash\n./shepherd -m /path/to/model.gguf\n```\n\nSee [BUILD.md](BUILD.md) for GPU setup and advanced build options.\n\n---\n\n## Usage\n\n### Interactive Mode\n\n```bash\n$ ./shepherd --provider mylocal\n\nShepherd v2.39.5\nProvider: mylocal (llamacpp)\nModel: qwen3-30b-a3b\nContext: 40960 tokens\nTools: 50+ available\n\n\u003e What files are in the current directory?\n* list_directory(path=\".\")\n\nThe current directory contains:\n- main.cpp: Application entry point\n- README.md: Project documentation\n- Makefile: Build configuration\n...\n\n\u003e /provider use sonnet\nSwitched to provider: sonnet\n\n\u003e Explain this code\n...\n```\n\n### Providers\n\nProviders define backends you can switch between at runtime.\n\n```bash\n# List configured providers\nshepherd provider list\n\n# Add providers (action-first for creating new)\nshepherd provider add local --type llamacpp --model /models/qwen-72b.gguf\nshepherd provider add sonnet --type anthropic --model claude-sonnet-4 --api-key sk-ant-...\nshepherd provider add gpt --type openai --model gpt-4o --api-key 
sk-...\n\n# View/modify providers (name-first pattern)\nshepherd provider sonnet show        # Show details\nshepherd provider sonnet set model claude-sonnet-4-20250514  # Modify setting\nshepherd provider sonnet use         # Switch to this provider\nshepherd provider sonnet             # Show help for this provider\n\n# In interactive mode\n\u003e /provider local use\n\u003e /provider next\n```\n\n### Tools\n\nShepherd includes built-in tools across several categories:\n\n| Category | Tools |\n|----------|-------|\n| **Core** | bash, glob, grep, edit, web_fetch, web_search, todo_write, task, get_time, get_date |\n| **Filesystem** | read, write, list_directory, delete_file, file_exists |\n| **Command** | execute_command, get_environment_variable, list_processes |\n| **HTTP** | http_get, http_post, http_put, http_delete |\n| **JSON** | json_parse, json_serialize, json_extract |\n| **Memory** | search_memory, set_fact, get_fact, clear_fact, store_memory, clear_memory |\n| **Scheduler** | list_schedules, add_schedule, remove_schedule, enable_schedule, disable_schedule, get_schedule |\n| **MCP** | list_mcp_resources, read_mcp_resource, plus dynamic `server:tool` from configured MCP servers |\n| **API Provider** | `ask_\u003cprovider\u003e` tools for cross-model consultation (auto-generated from configured providers) |\n| **Remote** | Remote tool proxy for distributed tool execution |\n\n```bash\n# List tools\nshepherd tools list\n\n# Enable/disable specific tools\nshepherd tools enable shell\nshepherd tools disable shell\n\n# Disable all tools\n./shepherd --notools\n```\n\n---\n\n## Configuration\n\n### Config File\n\nConfiguration is stored at `~/.config/shepherd/config.json` (XDG-compliant):\n\n```json\n{\n    \"streaming\": true,\n    \"thinking\": false,\n    \"reasoning\": \"off\",\n    \"tui\": true,\n    \"stats\": false,\n    \"auto_provider\": false,\n    \"warmup\": true,\n    \"max_tokens\": 0,\n    \"memory_database\": 
\"~/.local/share/shepherd/memory.db\",\n    \"max_db_size\": \"10G\",\n    \"rag_context_injection\": true,\n    \"rag_relevance_threshold\": 0.3,\n    \"rag_max_results\": 5,\n    \"memory_extraction\": false,\n    \"memory_extraction_model\": \"\",\n    \"memory_extraction_endpoint\": \"\",\n    \"user_id\": \"\",\n    \"web_search_provider\": \"\",\n    \"auth_mode\": \"none\",\n    \"server_tools\": false\n}\n```\n\n```bash\n# View configuration\nshepherd config show\n\n# Set values (key-first shortcut)\nshepherd config streaming true       # Set streaming to true\nshepherd config max_db_size 20G      # Set max_db_size\n\n# Or use explicit set\nshepherd config set streaming true\n\n# View single value\nshepherd config streaming            # Shows current streaming value\n\n# In interactive mode\n\u003e /config show\n\u003e /config streaming true\n```\n\n### MCP Servers\n\nConfigure [Model Context Protocol](https://modelcontextprotocol.io/) servers for external tool integration:\n\n```bash\n# List MCP servers\nshepherd mcp list\n\n# Add an MCP server (action-first for creating new)\nshepherd mcp add mydb python /path/to/mcp_server.py -e DB_HOST=localhost\n\n# View/modify servers (name-first pattern)\nshepherd mcp mydb show               # Show server details\nshepherd mcp mydb test               # Test connection\nshepherd mcp mydb remove             # Remove server\nshepherd mcp mydb                    # Show help for this server\n```\n\n### SMCP Servers (Secure Credentials)\n\nSMCP passes credentials to MCP servers via stdin, never in environment variables or CLI args:\n\n```bash\n# Add SMCP server with credentials\nshepherd smcp add database smcp-postgres --cred DB_URL=postgresql://user:pass@host/db\n```\n\nCredentials are sent via the [SMCP protocol](https://github.com/sshoecraft/smcp) handshake, never exposed in `/proc`, `ps`, or config files.\n\n### External Configuration Sources\n\n**Azure Key Vault** -- Load configuration from Azure Key Vault using 
Managed Identity:\n\n```bash\n./shepherd --config msi --kv my-vault-name\n```\n\nStore a secret named `shepherd-config` containing the unified JSON config. The VM's managed identity needs \"Key Vault Secrets User\" role.\n\n**HashiCorp Vault** -- Load configuration from HashiCorp Vault using a pre-injected token:\n\n```bash\n./shepherd --config vault --kv https://vault.example.com\n```\n\nReads the token from `/vault/secrets/token` (Vault Agent Injector) and fetches config from `shepherd/config` (KV v2).\n\nConfiguration loaded from either vault is **read-only** -- settings cannot be modified at runtime.\n\n### Environment Variables\n\n```bash\nSHEPHERD_INTERACTIVE=1    # Force interactive mode (useful in scripts/pipes)\nNO_COLOR=1                # Disable colored output\n```\n\n---\n\n## Server Modes\n\nShepherd can run as a server for remote access or persistent sessions.\n\n### API Server (OpenAI-Compatible)\n\nExposes an OpenAI-compatible REST API for remote access to your local Shepherd instance.\n\n```bash\n./shepherd --server --port 8000\n```\n\n**Use cases:**\n- Access your home server's GPU from your laptop\n- Use OpenAI-compatible tools with local models\n- Integration with any OpenAI client library\n\n**Endpoints:**\n- `POST /v1/chat/completions` - Chat completions (streaming supported)\n- `GET /v1/models` - List available models (includes provider info, version, capabilities)\n- `GET /health` - Health check\n\n**Authentication:**\n\nGenerate API keys for clients to authenticate against the server (OpenAI-compatible `Authorization: Bearer` header). 
See [docs/api_server.md](docs/api_server.md) for details.\n\n```bash\n./shepherd --server --auth-mode json\nshepherd apikey create mykey    # Generates sk-shep-...\n```\n\n**Server-side tool execution:**\n\n```bash\n# Tools execute on the server, results returned to client\n./shepherd --server --use-tools\n\n# Expose /v1/tools endpoint for tool discovery\n./shepherd --server --server-tools\n\n# Control whether tool call details are streamed to clients\n./shepherd --server --use-tools --show-tool-calls true\n```\n\nFor full documentation, see [docs/api_server.md](docs/api_server.md).\n\n### CLI Server (Persistent Session)\n\nRuns a persistent AI session with server-side tool execution and multi-client access.\n\n```bash\n./shepherd --cliserver --port 8000\n```\n\n**Use cases:**\n- 24/7 AI assistant with full tool access\n- Query databases without exposing credentials to clients\n- Multiple clients see the same session via SSE streaming\n\n**Connect a client:**\n```bash\n./shepherd --backend cli --api-base http://server:8000\n```\n\nFor full documentation, see [docs/cli_server.md](docs/cli_server.md).\n\n### JSON Frontend (Machine Integration)\n\nA JSON line-protocol frontend for pipe-based machine-to-machine communication (chatroom adapters, orchestration scripts, test harnesses).\n\n```bash\n# Interactive pipe mode\nshepherd --json -p provider_name\n\n# Single query mode\nshepherd --json --prompt \"hello\" -p provider_name\n\n# Piped input\necho '{\"type\":\"user\",\"content\":\"hello\"}' | shepherd --json -p provider_name\n```\n\n**Input** (stdin, one JSON object per line):\n```json\n{\"type\": \"user\", \"content\": \"your message here\"}\n```\n\n**Output** (stdout, one JSON object per line):\n\n| Type | Description | Fields |\n|------|-------------|--------|\n| `text` | Assistant response chunk | `content` |\n| `thinking` | Reasoning/thinking chunk | `content` |\n| `tool_use` | Tool call initiated | `name`, `params`, `id` |\n| `tool_result` | Tool execution 
result | `name`, `id`, `success`, `summary` |\n| `end_turn` | Turn complete | `turns`, `total_tokens`, `cost_usd` |\n| `error` | Error occurred | `message` |\n| `system` | System message | `content` |\n\nTools execute locally (same as CLI/TUI). No threads, no terminal handling, no HTTP -- the simplest frontend.\n\n### Server Composability\n\nShepherd's architecture allows **any backend** with **any frontend**, and servers can be chained together.\n\n**Key principle**: With API backends, each incoming connection creates a new backend connection - no session contention, fully scalable.\n\n#### Example: API Proxy with Credential Isolation\n\nHide your Azure OpenAI credentials while adding tools and your own API keys:\n\n```bash\n# Shepherd connects to Azure OpenAI (credentials stay on server)\n# Clients connect to Shepherd with your API keys\n./shepherd --backend openai \\\n           --api-base https://mycompany.openai.azure.com/v1 \\\n           --api-key $AZURE_KEY \\\n           --server --port 8000 --auth-mode json --server-tools\n\n# Generate keys for your clients\nshepherd apikey create client1\nshepherd apikey create client2\n```\n\nClients get:\n- Access to Azure OpenAI without knowing the Azure credentials\n- Server-side tools (filesystem, shell, MCP servers)\n- Your access control via Shepherd API keys\n\n#### Example: Persistent Session on vLLM\n\nUse vLLM's multi-user capabilities with a persistent CLI session:\n\n```bash\n# vLLM server running on port 5000 (handles multiple users efficiently)\n# Shepherd CLI server on top for persistent session + tools\n./shepherd --backend openai \\\n           --api-base http://localhost:5000/v1 \\\n           --cliserver --port 8000\n```\n\nNow you have:\n- vLLM's PagedAttention for efficient multi-conversation handling\n- Shepherd's persistent session (all clients see same conversation)\n- Server-side tools executing locally\n\n#### Example: Multi-Level Chaining\n\n```bash\n# Level 1: llamacpp backend\n./shepherd 
--backend llamacpp -m /models/qwen-72b.gguf --server --port 5000\n\n# Level 2: API server proxy (adds tools + API keys)\n./shepherd --backend openai --api-base http://localhost:5000/v1 \\\n           --server --port 6000 --auth-mode json --server-tools\n\n# Level 3: CLI server for persistent session\n./shepherd --backend openai --api-base http://localhost:6000/v1 \\\n           --cliserver --port 7000\n```\n\n---\n\n## Features\n\n### Multi-Backend Architecture\n\n| Backend | Type | Description | Tools |\n|---------|------|-------------|-------|\n| **llama.cpp** | Local | Llama, Qwen, Mistral, Gemma, and other GGUF models | Yes |\n| **TensorRT-LLM** | Local | NVIDIA-optimized inference for supported models | Yes |\n| **OpenAI** | Cloud | GPT-4o, GPT-4 Turbo, o1, o3 (also Azure OpenAI deployments) | Yes |\n| **Anthropic** | Cloud | Claude Opus, Sonnet, Haiku | Yes |\n| **Gemini** | Cloud | Gemini 2.5 Pro/Flash | Yes |\n| **Grok** | Cloud | xAI Grok models (OpenAI-compatible protocol) | Yes |\n| **Ollama** | Local/Cloud | Any model available in Ollama | Yes |\n| **CLI Client** | Remote | Connects to a remote Shepherd CLI server | Yes |\n\n### RAG System\n\nEvicted messages are automatically archived to a database with full-text search. 
Supports two backends:\n\n- **SQLite** (default): FTS5 full-text search with BM25 ranking + time-based recency scoring\n- **PostgreSQL** (optional): `tsvector`/`tsquery` with GIN index + recency scoring\n\n```bash\n# SQLite (default)\nshepherd config memory_database ~/.local/share/shepherd/memory.db\n\n# PostgreSQL\nshepherd config memory_database postgresql://user:pass@host/shepherd\n```\n\n```bash\n\u003e Remember that the project deadline is March 15\n* set_fact(key=\"project_deadline\", value=\"March 15\")\n\n# Later, or in a new session...\n\u003e What's the project deadline?\n* get_fact(key=\"project_deadline\")\n\nThe project deadline is March 15.\n```\n\nSearch archived conversations:\n```bash\n\u003e Search my memory for discussions about authentication\n* search_memory(query=\"authentication\")\n```\n\n**Memory extraction**: An optional background thread that automatically extracts facts and context summaries from conversations using a separate LLM API call. Enable with `memory_extraction: true` in config and configure `memory_extraction_endpoint` and `memory_extraction_api_key`.\n\n**Multi-tenant isolation**: All RAG operations are scoped by `user_id`. Set a global `user_id` in config to share memory across platforms, or leave empty for automatic per-client isolation.\n\n### Multi-Model Collaboration\n\nWhen multiple providers are configured, Shepherd creates `ask_*` tools for cross-model consultation:\n\n```bash\n# Using local model, ask Claude for code review\n\u003e ask_sonnet to read main.cpp and suggest improvements\n\n* ask_sonnet(prompt=\"read main.cpp and suggest improvements\")\n  → Sonnet calls read(path=\"main.cpp\")\n  → Sonnet analyzes and responds\n\nClaude's analysis appears in your local model's context.\n```\n\n**Key feature**: The `ask_*` tools have full tool access - the consulted model can read files, run commands, search memory, etc. 
You can chain consultations: ask Sonnet to ask GPT to analyze something.\n\nThe current provider is excluded, so a model never consults itself. When you switch providers, the `ask_*` tools update automatically.\n\n### Automatic Session Eviction\n\nShepherd supports automatic eviction for indefinite conversations with **any backend**:\n\n- **Local backends**: Evicts when the GPU KV cache fills\n- **API backends**: Evicts when the API returns a context-full error, then retries\n- **Manual limit**: Use `--context-size N` to set a limit smaller than the backend's maximum\n\n```bash\n# Force eviction at 32K tokens even if backend supports more\n./shepherd --provider azure --context-size 32768\n```\n\n**Eviction behavior**:\n- Oldest messages are evicted first (LRU); the system prompt and current context are protected\n- Messages are automatically archived to the RAG database before eviction\n- Seamless continuation - the conversation keeps going\n\nFor local backend implementation details, see [docs/llamacpp.md](docs/llamacpp.md).\n\n### Scheduling\n\nShepherd includes a cron-like scheduler that injects prompts into the session automatically. 
Works with CLI, TUI, and CLI server modes.\n\n```bash\n# Add a scheduled task (action-first for creating new)\nshepherd sched add morning-news \"0 9 * * *\" \"Get me the top 5 tech news headlines\"\n\n# List scheduled tasks\nshepherd sched list\n\n# View/modify schedules (name-first pattern)\nshepherd sched morning-news show     # Show schedule details\nshepherd sched morning-news disable  # Disable schedule\nshepherd sched morning-news enable   # Enable schedule\nshepherd sched morning-news remove   # Remove schedule\n```\n\n**24/7 Operation**: Run a CLI server and schedules execute automatically, even with no clients connected:\n\n```bash\n./shepherd --cliserver --port 8000\n\n# Scheduled prompts run in the session:\n# - \"Check server disk usage\" every hour\n# - \"Summarize overnight logs\" at 6am\n# - \"Generate daily report\" at 5pm\n```\n\nClients connect to see results from scheduled tasks in the conversation history.\n\n---\n\n## Command Reference\n\n### Subcommands\n\n| Command | Description |\n|---------|-------------|\n| `shepherd provider \u003cadd\\|list\\|show\\|remove\\|use\u003e` | Manage providers |\n| `shepherd config \u003cshow\\|set\\|KEY\\|KEY VALUE\u003e` | View/modify configuration |\n| `shepherd tools \u003clist\\|enable\\|disable\u003e` | Manage tools |\n| `shepherd mcp \u003cadd\\|remove\\|list\\|NAME show\\|NAME test\u003e` | Manage MCP servers |\n| `shepherd smcp \u003cadd\\|remove\\|list\u003e` | Manage SMCP servers (secure credentials) |\n| `shepherd sched \u003clist\\|add\\|remove\\|enable\\|disable\\|show\\|next\u003e` | Scheduled tasks |\n| `shepherd apikey \u003ccreate\\|list\\|remove\u003e` | API key management |\n| `shepherd edit-system` | Edit system prompt in $EDITOR |\n\n### Common Flags\n\n| Flag | Description |\n|------|-------------|\n| `-p, --provider NAME` | Use specific provider |\n| `-m, --model PATH` | Model name or file |\n| `--backend TYPE` | Backend: llamacpp, tensorrt, openai, anthropic, gemini, grok, ollama, cli 
|\n| `--context-size N` | Context window size (0 = auto) |\n| `--max-tokens N` | Max generation tokens (-1 = max, 0 = auto, \u003e0 = explicit) |\n| `--prompt, -e TEXT` | Single query mode (run one query and exit) |\n| `--server` | Start API server mode |\n| `--cliserver` | Start CLI server mode |\n| `--json` | JSON line-protocol frontend for machine integration |\n| `--port N` | Server port (default: 8000) |\n| `--use-tools` | Execute tools server-side in API server |\n| `--server-tools` | Expose /v1/tools endpoints for tool discovery/execution |\n| `--show-tool-calls BOOL` | Control streaming of tool call/result text to clients |\n| `--notools` | Disable all tools |\n| `--enable-tools PATTERN` | Enable tools matching glob pattern |\n| `--disable-tools PATTERN` | Disable tools matching glob pattern |\n| `--memtools` | Enable memory tools |\n| `--nostream` | Disable streaming output |\n| `--reasoning LEVEL` | Extended thinking: off, low, medium, high |\n| `--stats` | Show performance stats (prefill/decode speed, KV cache) |\n| `--tui` / `--no-tui` | Enable/disable TUI mode |\n| `--nomcp` | Disable MCP server loading |\n| `--norag` | Disable RAG entirely |\n| `--nomemory` | Disable memory injection and extraction |\n| `--nosched` | Disable scheduler |\n| `--warmup` | Send warmup message before first prompt |\n| `--flash-attn` | Enable flash attention (llama.cpp) |\n| `--system-prompt TEXT` | Override system prompt |\n| `--config msi --kv VAULT` | Load config from Azure Key Vault |\n| `--config vault --kv ADDR` | Load config from HashiCorp Vault |\n\n**Sampling overrides**: `--temperature`, `--top-p`, `--top-k`, `--freq` (frequency penalty), `--repeat-penalty`\n\nRun `shepherd --help` for the complete list.\n\n---\n\n## Hardware Requirements\n\n### Minimum\n- **GPU**: NVIDIA GTX 1080 Ti (11GB VRAM) or better\n- **RAM**: 32GB system RAM\n- **Storage**: SATA SSD (500GB)\n\n### Recommended\n- **GPU**: 2x NVIDIA RTX 3090 (48GB VRAM)\n- **RAM**: 128GB system RAM\n- 
**Storage**: NVMe SSD (1TB+)\n\n### Cloud\n- **AWS**: g5.12xlarge (4x A10G)\n- **GCP**: a2-highgpu-4g (4x A100)\n- **Azure**: Standard_NC24ads_A100_v4\n\n---\n\n## Performance\n\nBenchmarked with `scripts/openai_bench.py` (5 runs, 2048 max tokens, streaming, batch_size=1).\n\n### Shepherd vs Other Servers (gpt-oss-120b Q4_K_M, 4x GPU pp=4, 32K context, f16 KV cache)\n\n| Server | Tokens/sec | TTFT (ms) | ITL (ms) |\n|--------|-----------|-----------|----------|\n| **Shepherd** | **141.30** | 1063 | **7.62** |\n| llama-server (standalone) | 124.52 | 952 | 8.03 |\n| vLLM | 98.18 | 809 | 10.19 |\n\n### Model Quantization Comparison (Shepherd, gpt-oss-120b, 4x GPU pp=4)\n\n| Model / Quant | Tokens/sec | TTFT (ms) | ITL (ms) |\n|---------------|-----------|-----------|----------|\n| Q4_K_M | 141.30 | 1063 | 7.62 |\n| Q4_K_M + flash-attn | 138.27 | 767 | 7.62 |\n| MXFP4 | 128.72 | 794 | 8.21 |\n| MXFP4 + flash-attn | 130.79 | 681 | 8.03 |\n\n### Provider Configuration (Shepherd, gpt-oss-120b-heretic-v2-MXFP4, flash-attn, server-tools)\n\n| Metric | Value |\n|--------|-------|\n| Tokens/sec (mean) | 130.85 |\n| Tokens/sec (stddev) | 0.55 |\n| TTFT (median) | 739 ms |\n| ITL (mean) | 8.00 ms |\n| Avg tokens generated | 2048 |\n\n---\n\n## Troubleshooting\n\n### Out of Memory During Inference\n\nReduce context size:\n```bash\n./shepherd --context-size 65536\n```\n\nOr use a more aggressive quantization (Q4_K_M instead of Q8_0).\n\n### Slow Generation Speed\n\nIncrease GPU layers or switch backends:\n```bash\n./shepherd --gpu-layers 48\n```\n\n### KV Cache Issues\n\nIf you see repetitive or nonsensical output, the KV cache may be corrupted. 
Restart Shepherd to clear the cache.\n\nFor debug builds, use `-d=3` for verbose KV cache logging.\n\n---\n\n## Development\n\n### Architecture Overview\n\n```\n┌───────────────────────────────────────────────────────────────┐\n│                          Frontend                              │\n│  CLI, TUI, JSON, API Server, CLI Server                       │\n└──────────────────────────┬────────────────────────────────────┘\n                           │\n┌──────────────────────────v────────────────────────────────────┐\n│                     Session + Provider                         │\n│  Message routing, provider switching, tool execution, RAG     │\n└──────────────────────────┬────────────────────────────────────┘\n                           │\n        ┌──────────────────┼──────────────────┐\n        │                  │                  │\n┌───────v────┐      ┌──────v─────┐     ┌─────v─────────┐\n│  LlamaCpp  │      │  TensorRT  │     │ API Backends  │\n│  Backend   │      │  Backend   │     │ (6 types)     │\n└────────────┘      └────────────┘     └───────────────┘\n```\n\nFor detailed architecture, see [docs/architecture.md](docs/architecture.md).\n\n### Project Structure\n\n```\nshepherd/\n├── main.cpp                  # Entry point, argument parsing, subcommand dispatch\n├── frontend.cpp/h            # Frontend abstraction and event callback system\n├── backend.cpp/h             # Backend base class\n├── session.cpp/h             # Session management, eviction logic\n├── session_manager.cpp/h     # Multi-session management\n├── provider.cpp/h            # Provider configuration and switching\n├── config.cpp/h              # Configuration (file, Azure KV, HashiCorp Vault)\n├── rag.cpp/h                 # RAG interface and context injection\n├── server.cpp/h              # HTTP server base class\n├── auth.cpp/h                # API key authentication (JSON file, Azure MSI, PostgreSQL)\n├── scheduler.cpp/h           # Cron-like task scheduler 
(SIGALRM-based)\n├── memory_extraction.cpp/h   # Background fact extraction from conversations\n├── http_client.cpp/h         # HTTP client with retry logic\n├── generation_thread.cpp/h   # Threaded generation support\n├── azure_msi.cpp/h           # Azure Managed Identity integration\n├── hashicorp_vault.cpp/h     # HashiCorp Vault config loading\n│\n├── backends/\n│   ├── api.cpp/h             # Base class for all API backends\n│   ├── gpu.cpp/h             # Base class for local GPU backends\n│   ├── llamacpp.cpp/h        # llama.cpp backend\n│   ├── tensorrt.cpp/h        # TensorRT-LLM backend\n│   ├── openai.cpp/h          # OpenAI / Azure OpenAI API\n│   ├── anthropic.cpp/h       # Anthropic Claude API\n│   ├── gemini.cpp/h          # Google Gemini API\n│   ├── grok.cpp/h            # xAI Grok API\n│   ├── ollama.cpp/h          # Ollama API\n│   ├── cli_client.cpp/h      # CLI client (connects to remote CLI server)\n│   ├── models.cpp/h          # Model family detection and context size database\n│   ├── chat_template.cpp/h   # Chat template parsing (Jinja2)\n│   ├── harmony.cpp/h         # GPT-OSS / Harmony format parser\n│   └── factory.cpp/h         # Backend factory\n│\n├── frontends/\n│   ├── cli.cpp/h             # Interactive CLI (replxx line editing)\n│   ├── tui.cpp/h             # Full-screen TUI (ncurses)\n│   ├── json_frontend.cpp/h   # JSON line-protocol (machine integration)\n│   ├── api_server.cpp/h      # OpenAI-compatible HTTP API server\n│   └── cli_server.cpp/h      # Persistent CLI session over HTTP + SSE\n│\n├── tools/                    # Tool implementations (core, filesystem, command,\n│                             #   HTTP, JSON, memory, scheduler, MCP, remote, API)\n├── mcp/                      # MCP/SMCP client, server, config, tool adapters\n├── rag/                      # RAG database backends (SQLite, PostgreSQL)\n├── scripts/                  # Benchmark and utility scripts\n├── tests/                    # Unit and integration 
tests\n├── docs/                     # Architecture and feature documentation\n├── packaging/                # Debian packaging scripts\n├── vendor/                   # Third-party: llama.cpp, replxx, tokenizers\n├── CMakeLists.txt            # Build system\n└── Makefile                  # Build wrapper\n```\n\n### Extending Shepherd\n\n- **Adding backends**: See [docs/backends.md](docs/backends.md)\n- **Adding tools**: See `tools/tool.h` for the tool interface\n- **Architecture**: See [docs/architecture.md](docs/architecture.md) for the full system design\n- **Frontend design**: See [docs/frontend.md](docs/frontend.md)\n\n---\n\n## Contributing\n\nContributions welcome! Areas of interest:\n- Additional backend integrations\n- New tool implementations\n- Performance optimizations\n- Documentation improvements\n\n---\n\n## Testing\n\n```bash\n# Build with tests enabled\necho \"TESTS=ON\" \u003e\u003e ~/.shepherd_opts\nmake\n\n# Run tests\ncd build \u0026\u0026 make test_unit test_tools\n./tests/test_unit\n./tests/test_tools\n```\n\nSee [docs/testing.md](docs/testing.md) for the full test plan.\n\n---\n\n## License\n\n**PolyForm Shield License 1.0.0**\n\n- ✅ Use for any purpose (personal, commercial, internal)\n- ✅ Modify and create derivative works\n- ✅ Distribute copies\n- ❌ Sell Shepherd as a standalone product\n- ❌ Offer Shepherd as a paid service (SaaS)\n- ❌ Create competing products\n\nSee [LICENSE](LICENSE) for full text.\n\n---\n\n## Acknowledgments\n\n- **llama.cpp**: Georgi Gerganov and contributors\n- **TensorRT-LLM**: NVIDIA Corporation\n- **Model Context Protocol**: Anthropic\n- **SQLite**: D. 
Richard Hipp\n\n---\n\n## Contact\n\n- **Issues**: https://github.com/sshoecraft/shepherd/issues\n- **Discussions**: https://github.com/sshoecraft/shepherd/discussions\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsshoecraft%2Fshepherd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsshoecraft%2Fshepherd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsshoecraft%2Fshepherd/lists"}