{"id":47885650,"url":"https://github.com/nonatofabio/luna-agent","last_synced_at":"2026-04-04T02:15:42.490Z","repository":{"id":342894335,"uuid":"1172269399","full_name":"nonatofabio/luna-agent","owner":"nonatofabio","description":"Custom minimal AI agent with persistent memory, MCP tools, and Discord","archived":false,"fork":false,"pushed_at":"2026-03-07T21:26:46.000Z","size":61,"stargazers_count":7,"open_issues_count":5,"forks_count":5,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-08T02:51:51.325Z","etag":null,"topics":["agent-framework","ai-agent","discord-bot","homelab","llama-cpp","llm","local-llm","mcp","openai-compatible","python","sqlite","vector-search"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nonatofabio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-04T05:44:22.000Z","updated_at":"2026-03-08T01:01:32.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/nonatofabio/luna-agent","commit_stats":null,"previous_names":["nonatofabio/luna-agent"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/nonatofabio/luna-agent","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nonatofabio%2Fluna-agent","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nonatofabio%2Fluna-agent/tags","re
leases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nonatofabio%2Fluna-agent/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nonatofabio%2Fluna-agent/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nonatofabio","download_url":"https://codeload.github.com/nonatofabio/luna-agent/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nonatofabio%2Fluna-agent/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31384924,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T01:22:39.193Z","status":"online","status_checked_at":"2026-04-04T02:00:07.569Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent-framework","ai-agent","discord-bot","homelab","llama-cpp","llm","local-llm","mcp","openai-compatible","python","sqlite","vector-search"],"created_at":"2026-04-04T02:15:39.361Z","updated_at":"2026-04-04T02:15:42.475Z","avatar_url":"https://github.com/nonatofabio.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"assets/logo.svg\" alt=\"Luna Agent\" width=\"120\" height=\"120\"\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003eLuna Agent\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  A custom minimal AI agent with persistent memory, MCP tool integration, Discord/CLI interface, and structured 
observability.\u003cbr\u003e\n  Runs entirely on local hardware — no cloud API costs.\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/nonatofabio/luna-agent/actions/workflows/tests.yml\"\u003e\u003cimg src=\"https://github.com/nonatofabio/luna-agent/actions/workflows/tests.yml/badge.svg\" alt=\"Tests\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://opensource.org/licenses/MIT\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-MIT-yellow.svg\" alt=\"License: MIT\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://www.python.org/downloads/\"\u003e\u003cimg src=\"https://img.shields.io/badge/python-3.11+-blue.svg\" alt=\"Python 3.11+\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\n~2300 lines of Python. No frameworks.\n\n## Why Custom\n\nWe evaluated existing agent frameworks and rejected them all:\n\n- **OpenClaw**: 400K lines of code, 42K exposed instances on Shodan. Too large to audit, too large to trust.\n- **ZeroClaw**: 9 days old at time of evaluation. Too immature.\n- **NanoClaw**: Too thin — would need to rebuild most of it anyway.\n\nThe core needs (memory, tools, chat, logging) are individually well-solved problems. 
No 400K-line framework needed.\n\n## Architecture\n\n```\nDiscord (discord.py)     CLI REPL (no token)\n     |                        |\n     v                        v\n+---------------------------------+\n|        Luna Agent Core          |\n|                                 |\n|  agent.py                       |  agent loop: msg → memory → prompt → LLM → tools → respond\n|    ├── llm.py                   |  single LLM client, configurable endpoint\n|    ├── memory.py                |  SQLite + FTS5 + sqlite-vec hybrid search\n|    ├── tools.py                 |  native tools: bash, files, web, delegate, code_task\n|    ├── tool_output.py           |  smart output pipeline for large results\n|    ├── mcp_manager.py           |  MCP client for community tool servers\n|    └── observe.py               |  structured JSON logging\n|                                 |\n+---------------------------------+\n              |\n              v\n        llama-server               Qwen3.5-35B-A3B on 2x RTX 3090\n```\n\nAll LLM traffic flows through a single `LLMClient` with a configurable endpoint URL. Today it points at `localhost:8001` (llama-server). To insert an AI firewall later, change the URL to `localhost:9000` — zero code changes required.\n\n**Thinking model support:** Luna handles reasoning models (Qwen3.5, etc.) automatically — extracting `reasoning_content`, falling back to cleaned reasoning when content is empty, and stripping leaked markup (`\u003cthinking\u003e`, `\u003ctool_call\u003e`, etc.) 
from output.\n\n## Hardware\n\n- Intel i7-13700K, 64GB DDR4\n- 2x NVIDIA RTX 3090 (24GB each, 48GB total)\n- Qwen3.5-35B-A3B Q8_0 via llama-server with layer split across both GPUs\n- 131K context window, Q8_0 KV cache\n\n## Quick Start\n\n```bash\ncd ~/luna-agent\npython3 -m venv .venv\nsource .venv/bin/activate\npip install -e \".[dev]\"\n\n# Run tests\npytest tests/ -v\n\n# Run without Discord (interactive CLI REPL)\npython -m luna\n\n# Run with Discord\nDISCORD_TOKEN=your-token-here python -m luna\n```\n\n## Project Structure\n\n```\nluna-agent/\n├── config.toml              # All configuration\n├── mcp_servers.json         # MCP server registry\n├── pyproject.toml           # Dependencies\n├── luna/\n│   ├── __main__.py          # Entry point (python -m luna)\n│   ├── agent.py             # Core agent loop\n│   ├── llm.py               # LLM client (OpenAI-compatible)\n│   ├── memory.py            # Memory (SQLite + FTS5 + sqlite-vec)\n│   ├── tools.py             # Native tools (bash, files, web)\n│   ├── tool_output.py       # Large output persistence + filtering\n│   ├── mcp_manager.py       # MCP tool client\n│   ├── discord_bot.py       # Discord interface\n│   ├── observe.py           # Structured JSON logging\n│   └── config.py            # Config loader\n├── tests/\n│   ├── test_agent.py        # Agent loop tests\n│   ├── test_llm.py          # LLM client tests\n│   ├── test_memory.py       # Memory system tests\n│   ├── test_tools.py        # Native tool tests\n│   └── test_tool_output.py  # Output pipeline tests\n├── luna-agent.service        # systemd unit for the agent\n├── worker-agent.service      # systemd unit for llama-server (Qwen3.5-35B-A3B)\n└── data/                     # Created at runtime\n    ├── memory.db             # SQLite database\n    ├── logs/                 # JSON log files\n    │   └── luna-YYYY-MM-DD.jsonl\n    └── tool_outputs/         # Persisted large tool outputs\n```\n\n## Configuration\n\nAll settings live in 
`config.toml`. Environment variables override for secrets:\n\n| Env Var | Overrides | Required |\n|---------|-----------|----------|\n| `DISCORD_TOKEN` | Discord bot token | Yes (for Discord) |\n| `LLM_ENDPOINT` | `[llm] endpoint` | No |\n| `LLM_MODEL` | `[llm] model` | No |\n| `MEMORY_DB_PATH` | `[memory] db_path` | No |\n| `LOG_DIR` | `[observe] log_dir` | No |\n\nSee `config.toml` for all available settings and their defaults.\n\n## Components\n\n### Agent (`agent.py`)\n\nThe orchestrator. Receives a message and session ID, then:\n\n1. Saves the user message to memory\n2. Searches for relevant memories (hybrid FTS + vector)\n3. Retrieves the session summary (if any)\n4. Builds a system prompt with memories, summary, and current time\n5. Loads the last 20 messages for context\n6. Calls the LLM with all available tools (native + MCP)\n7. Enters a tool call loop (max 25 rounds):\n   - Executes each tool call (native or MCP)\n   - Feeds results back to the LLM\n   - Repeats until the LLM responds without tool calls\n8. Saves the assistant response\n9. Triggers conversation summarization if enough messages have accumulated\n\n### LLM Client (`llm.py`)\n\nThin async wrapper around the OpenAI-compatible API. Single `chat()` method that handles tool calls, thinking model output, and per-call temperature overrides. This is the only code that talks to the LLM — the AI firewall insertion point.\n\nReturns structured `LLMResponse` objects with content, reasoning, tool calls, and token usage.\n\n### Memory (`memory.py`)\n\nSQLite-based persistent memory with three search strategies combined via Reciprocal Rank Fusion:\n\n1. **FTS5 keyword search** — fast exact/stemmed term matching (Porter stemmer + Unicode61 tokenizer)\n2. **sqlite-vec cosine similarity** — semantic search via nomic-embed-text-v1.5 embeddings\n3. 
**Recency + importance weighting** — recent and important memories rank higher\n\n**Scoring formula:**\n```\nfinal_score = rrf_score + (recency_weight × 2^(-age_days / 7)) + (importance / 10 × 0.1)\n```\n\n**Database tables:**\n\n| Table | Purpose |\n|-------|---------|\n| `messages` | Every message persisted per session |\n| `memories` | Extracted facts with embeddings and importance scores |\n| `summaries` | LLM-generated compression of old message blocks |\n| `memories_fts` | FTS5 virtual table for keyword search |\n| `memories_vec` | sqlite-vec virtual table for vector search |\n\n**Conversation compression:** Every N messages (default 50), the LLM summarizes the conversation and extracts facts with importance scores (1-10). Facts above the threshold (default 3.0) are stored as memories. This enables effectively infinite conversations — the agent always has a summary of what came before plus searchable memory of key facts.\n\nAll retrieval parameters (top_k, RRF k, recency weight, importance threshold, etc.) 
are in `config.toml` for experimentation.\n\n### Native Tools (`tools.py`)\n\nBuilt-in tools that don't require external MCP servers:\n\n| Tool | Description |\n|------|-------------|\n| `bash` | Execute shell commands with safety guardrails |\n| `read_file` | Read files with optional offset/limit for large files |\n| `write_file` | Write or append to files, creates parent directories |\n| `list_directory` | List files/directories, optional recursion with depth limits |\n| `web_fetch` | Fetch a URL and convert HTML to markdown via html2text |\n| `web_search` | Search the web via DuckDuckGo, returns structured results |\n| `delegate` | Hand off a self-contained subtask to a sub-agent with its own tool loop |\n| `code_task` | Delegate a coding task to a sub-agent with a write-run-fix loop |\n| `summarize_paper` | Fetch and summarize an arXiv paper |\n| `list_available_tools` | Discover MCP tools available from connected servers |\n| `use_tool` | Call a specific MCP tool by name |\n\n**Bash safety:** Commands are checked against blocked patterns before execution:\n- `rm -rf /`, `mkfs`, `dd if=`, `shutdown`, `reboot`, fork bombs, writes to `/dev/sda`\n- Timeout enforcement: default 30s, max 120s\n- Output capped at 50KB\n\n### Tool Output Pipeline (`tool_output.py`)\n\nHandles large tool outputs so they don't overwhelm the LLM context:\n\n1. **Small outputs** (\u003c 10KB) — passed through directly\n2. **Large outputs** — processed through a pipeline:\n   - **Persist** — full output saved to `data/tool_outputs/` with a deterministic filename (content hash + source label)\n   - **Python filter** — keyword matching against the user's query context, with structural detection (headers, code blocks). 
Includes 1 line of surrounding context per match.\n   - **LLM extraction** — if the Python filter finds fewer than 5 keyword matches, the LLM extracts relevant parts from the raw output\n   - **File reference** — a footer with the persisted file path is appended so the agent can inspect the full output later\n\n### MCP Manager (`mcp_manager.py`)\n\nConnects to community MCP servers via stdio transport. On startup it spawns configured servers, discovers their tools, and converts schemas to OpenAI function-calling format. Tool calls from the LLM are routed to the correct server automatically.\n\n**Tool namespacing:** Tools are prefixed with the server name (`browser__navigate`, `filesystem__read_file`) to avoid collisions between servers.\n\nConfigure servers in `mcp_servers.json`:\n\n```json\n{\n  \"servers\": {\n    \"browser\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"@playwright/mcp\"],\n      \"transport\": \"stdio\"\n    }\n  }\n}\n```\n\nAdding a new tool is editing JSON — no code changes.\n\n### Discord Bot (`discord_bot.py`)\n\nResponds to DMs, @mentions, and replies in threads it created. 
Shows a typing indicator while the agent is processing.\n\n**Session isolation:** Session IDs are derived from message context to keep memory separate:\n\n| Context | Session ID |\n|---------|-----------|\n| Thread | `thread-{thread_id}` |\n| DM | `dm-{user_id}` |\n| Channel | `ch-{channel_id}-{user_id}` |\n\nLong responses are split at newlines (preferred), spaces, or hard-split at 2000 characters to stay within Discord's limit.\n\n### Observability (`observe.py`)\n\nEvery LLM call, tool execution, memory operation, and Discord message is logged as structured JSON.\n\n**Dual output:**\n- **File** — `data/logs/luna-YYYY-MM-DD.jsonl`, one file per day, machine-parseable\n- **Console** — human-readable format for development\n\n**What's logged:**\n\n| Component | Events |\n|-----------|--------|\n| LLM | `llm_call`, `llm_response` (tokens, latency, tools used) |\n| Memory | `memory_search` (hits, method breakdown), `memory_stored`, `summary_stored` |\n| Tools | `tool_executing`, `native_tool_call`, `tool_call` (server, tool, duration, errors) |\n| Discord | `discord_ready`, `discord_message` (session, author, channel) |\n| MCP | `server_connected`, `tools_refreshed`, `mcp_shutdown` |\n| Agent | `agent_process` (latency), `agent_response` (memory hits, tool rounds) |\n| Output | `output_persisted`, `llm_extraction_triggered` |\n\n**Inspection:**\n\n```bash\n# Watch logs in real-time\ntail -f data/logs/luna-*.jsonl\n\n# Search with jq\njq 'select(.event == \"llm_response\")' data/logs/luna-*.jsonl\njq 'select(.latency_ms \u003e 5000)' data/logs/luna-*.jsonl\n```\n\n### Config (`config.py`)\n\nDataclass-based configuration loaded from `config.toml` with environment variable overrides. Relative paths are resolved against the project root. 
All fields have sensible defaults — the agent runs out of the box with the default `config.toml`, no settings changed.\n\n## Deployment\n\nCopy the systemd service files and enable them:\n\n```bash\nsudo cp luna-agent.service /etc/systemd/system/\nsudo cp worker-agent.service /etc/systemd/system/\n\nsudo systemctl daemon-reload\nsudo systemctl enable --now worker-agent    # Start LLM server (Qwen3.5-35B-A3B) first\nsudo systemctl enable --now luna-agent      # Then the agent (depends on worker-agent)\n```\n\n**Monitor:**\n\n```bash\njournalctl -u luna-agent -f\njournalctl -u worker-agent -f\n```\n\n**CLI mode** (no Discord token): The agent starts an interactive REPL where tool calls print inline as they execute, then the final response prints below. Useful for testing without Discord.\n\n## Dependencies\n\n8 runtime packages, no heavy frameworks:\n\n| Package | Purpose |\n|---------|---------|\n| `discord.py` | Discord API client |\n| `openai` | OpenAI-compatible HTTP client |\n| `mcp[cli]` | Model Context Protocol SDK |\n| `sentence-transformers` | Embedding model runtime |\n| `einops` | Tensor operations for embeddings |\n| `sqlite-vec` | Vector search in SQLite |\n| `html2text` | HTML to markdown conversion |\n| `duckduckgo-search` | Web search |\n\n**Dev:** `pytest`, `pytest-asyncio`\n\n**Python:** \u003e= 3.11\n\n## What This Doesn't Include (By Design)\n\n- No AI firewall (future — just don't block the insertion point)\n- No web dashboard (future phase of observability)\n- No multi-user auth (single user)\n- No cloud LLM fallback (local only)\n- No containers for the agent (systemd is simpler)\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, testing, and pull request guidelines.\n\n## License\n\n[MIT](LICENSE) — Fabio Nonato, 
2026\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnonatofabio%2Fluna-agent","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnonatofabio%2Fluna-agent","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnonatofabio%2Fluna-agent/lists"}