{"id":47681663,"url":"https://github.com/heliosnova/nova","last_synced_at":"2026-04-02T14:01:27.304Z","repository":{"id":344691050,"uuid":"1182732172","full_name":"HeliosNova/nova","owner":"HeliosNova","description":"The personal AI that actually learns from its mistakes. 51 monitors, temporal KG, DPO fine-tuning. Runs on your GPU.","archived":false,"fork":false,"pushed_at":"2026-03-27T00:07:51.000Z","size":1165,"stargazers_count":10,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-27T02:40:23.929Z","etag":null,"topics":["ai","discord-bot","dpo","fastapi","fine-tuning","knowledge-graph","llm","local-ai","local-llm","mcp","ollama","personal-ai","personal-assistant","python","self-hosted","self-improving","sovereign-ai","telegram-bot"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HeliosNova.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-15T22:30:50.000Z","updated_at":"2026-03-27T00:07:54.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/HeliosNova/nova","commit_stats":null,"previous_names":["heliosnova/nova"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/HeliosNova/nova","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HeliosNova%2Fnova","tags_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/HeliosNova%2Fnova/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HeliosNova%2Fnova/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HeliosNova%2Fnova/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HeliosNova","download_url":"https://codeload.github.com/HeliosNova/nova/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HeliosNova%2Fnova/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31307459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-02T12:59:32.332Z","status":"ssl_error","status_checked_at":"2026-04-02T12:54:48.875Z","response_time":89,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","discord-bot","dpo","fastapi","fine-tuning","knowledge-graph","llm","local-ai","local-llm","mcp","ollama","personal-ai","personal-assistant","python","self-hosted","self-improving","sovereign-ai","telegram-bot"],"created_at":"2026-04-02T14:00:53.015Z","updated_at":"2026-04-02T14:01:26.408Z","avatar_url":"https://github.com/HeliosNova.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Nova\n\n[![License: 
AGPL-3.0](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](LICENSE)\n[![Tests](https://img.shields.io/badge/tests-1%2C689%20passing-brightgreen)](tests/)\n[![Python 3.11+](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://python.org)\n[![Release](https://img.shields.io/github/v/release/HeliosNova/nova)](https://github.com/HeliosNova/nova/releases)\n\n**The personal AI that actually learns from its mistakes.**\n\nCorrect Nova once. It remembers forever. Correct it enough times, it fine-tunes itself into a smarter model. All on your hardware. Your data never leaves.\n\n```\nYou: \"What's the capital of Australia?\"\nNova: \"Sydney\"\nYou: \"That's wrong, it's Canberra\"\nNova: [saves lesson, generates DPO training pair, updates knowledge graph]\n\n--- 3 months later, different conversation ---\n\nYou: \"What's the capital of Australia?\"\nNova: \"Canberra\"  ← learned permanently\n```\n\nNo other open-source project combines all of these capabilities.\n\n### See it in action\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/demo.svg\" alt=\"Nova learning loop demo\" width=\"700\"\u003e\n\u003c/p\u003e\n\n---\n\n## Why Nova\n\nNova is a sovereign personal AI that runs entirely on your hardware with zero cloud dependencies. 
It doesn't just answer questions — it gets permanently smarter through a self-improvement pipeline that no other open-source project has:\n\n| | Nova | Khoj (32K stars) | Open WebUI (124K stars) |\n|---|---|---|---|\n| Learns from corrections | **Full pipeline** | No | No |\n| Fine-tunes itself | **DPO + A/B eval** | No | No |\n| Knowledge graph | **Temporal** | Experimental | No |\n| Hybrid retrieval | **Vector + BM25 + RRF** | Vector only | Vector only |\n| Zero cloud dependency | **Yes (bundled Ollama)** | Partial | Partial |\n| Prompt injection defense | **4-category detection** | No | No |\n| Messaging channels | **4 (all with allowlisting)** | 3 | 0 |\n| Proactive monitors | **52 across 35+ domains** | Automations | No |\n| MCP (client + server) | **Both** | No | Client only |\n\n## Quick Start\n\n**Prerequisites:** Docker + Docker Compose, NVIDIA GPU (20GB+ VRAM), NVIDIA Container Toolkit\n\n```bash\n# Clone and start\ngit clone https://github.com/HeliosNova/nova.git\ncd nova\ncp .env.example .env\ndocker compose up -d\n\n# Pull models (one-time)\ndocker exec nova-ollama ollama pull qwen3.5:27b            # Main model\ndocker exec nova-ollama ollama pull nomic-embed-text-v2-moe # Embeddings\n```\n\nOpen `http://localhost:5173` — that's it.\n\n### Optional models for routing\n\n```bash\ndocker exec nova-ollama ollama pull qwen3.5:9b   # Vision model\ndocker exec nova-ollama ollama pull qwen3.5:4b   # Fast model (greetings, simple queries)\n```\n\n## How It Works\n\n```\nUser query -\u003e brain.think()\n  -\u003e load context (history + facts + lessons + skills + knowledge graph)\n  -\u003e classify intent (regex, no LLM call)\n  -\u003e retrieve documents (ChromaDB vectors + SQLite FTS5 + Reciprocal Rank Fusion)\n  -\u003e build system prompt (8 prioritized blocks with truncation budget)\n  -\u003e generate response (Ollama / OpenAI / Anthropic / Google)\n  -\u003e tool loop if needed (max 5 rounds, 20 built-in tools)\n  -\u003e stream tokens via SSE\n  
-\u003e post-response: correction detection, fact extraction, reflexion, curiosity\n\nMeanwhile, 52 monitors run autonomously:\n  -\u003e web search across 35+ domains every 1-24h\n  -\u003e extract knowledge graph triples from every result\n  -\u003e send alerts via Discord/Telegram when something changes\n  -\u003e quiz itself on learned lessons, validate skills, research gaps\n```\n\nNo LangChain. No LangGraph. No agent frameworks. ~79 files of async Python + httpx + FastAPI.\n\n## The Learning Loop\n\nThis is what makes Nova unique. Every conversation makes it smarter:\n\n1. **Correction Detection** (2-stage) — regex pre-filter + LLM confirmation extracts what was wrong and what's correct\n2. **Lesson Storage** — topic, wrong answer, correct answer, lesson text — retrieved on future similar queries\n3. **DPO Training Pairs** — every correction generates `{query, chosen, rejected}` data for fine-tuning\n4. **Reflexion** *(experimental)* — heuristic failure detection (bad tool choices, short answers, exhausted loops) stored as warnings for future reference\n5. **Curiosity Engine** *(experimental)* — detects knowledge gaps (\"I don't know\", hedging, tool failures), queues background research via scheduled monitors\n6. **Success Patterns** — high-quality responses (score \u003e= 0.8) stored as positive reinforcement\n7. 
**Automated Fine-Tuning** — when enough pairs accumulate, runs DPO training with A/B evaluation before deploying\n\n```bash\npython scripts/finetune_auto.py --check   # Check readiness\npython scripts/finetune_auto.py           # Full pipeline: train -\u003e eval -\u003e deploy\n```\n\n## Tools (20 built-in)\n\n| Tool | What it does |\n|------|-------------|\n| `web_search` | Privacy-respecting search via SearXNG |\n| `calculator` | Math via SymPy — never does arithmetic in its head |\n| `http_fetch` | Fetch URLs with SSRF protection (blocks private IPs, DNS rebinding) |\n| `knowledge_search` | Hybrid retrieval: ChromaDB vectors + SQLite FTS5 + RRF fusion |\n| `code_exec` | Sandboxed Python (AST-analyzed, tier-restricted imports) |\n| `memory_search` | Search conversations and user facts |\n| `file_ops` | Read/write files (path-restricted per access tier) |\n| `shell_exec` | Shell commands (blocked patterns, tier-restricted, disabled by default) |\n| `browser` | Playwright-based web browsing with cookie clearing |\n| `screenshot` | Capture website screenshots |\n| `email_send` | SMTP email with recipient allowlist |\n| `calendar` | ICS calendar (create, list, search, delete) |\n| `webhook` | HTTP webhooks (URL-restricted) |\n| `reminder` | Schedule reminders via heartbeat system |\n| `monitor` | Create/manage proactive heartbeat monitors |\n| `delegate` | Delegate subtasks to parallel sub-agents |\n| `background_task` | Submit/track long-running background work |\n| `integration` | Connect to GitHub, Slack, Notion, etc. 
(10 templates) |\n| `desktop` | GUI automation via PyAutoGUI (optional, gated) |\n| `voice` | Local Whisper speech-to-text (optional, gated) |\n\nPlus dynamically created custom tools and MCP-discovered external tools.\n\n## Channels\n\nTalk to Nova where you already are:\n\n| Channel | Type | Config |\n|---------|------|--------|\n| **Discord** | Bot (websocket) | `DISCORD_TOKEN`, `DISCORD_CHANNEL_ID` |\n| **Telegram** | Bot (polling) | `TELEGRAM_TOKEN`, `TELEGRAM_CHAT_ID` |\n| **WhatsApp** | Webhook (Business API) | `WHATSAPP_API_URL`, `WHATSAPP_API_TOKEN` |\n| **Signal** | Polling (signal-cli) | `SIGNAL_API_URL`, `SIGNAL_PHONE_NUMBER` |\n\nAll channels support phone-number allowlisting, message splitting, and graceful reconnection.\n\n## Heartbeat Monitors\n\n52 autonomous monitors run on schedule across 35+ domains — Nova works even when you're not talking to it:\n\n| Category | Monitors | Schedule | What they do |\n|----------|----------|----------|-------------|\n| **Operational** | Morning Check-in, System Health, System Maintenance, Fine-Tune Check, Auto-Monitor Detector | 2h-weekly | Health checks, data hygiene, fine-tune readiness |\n| **Self-Improvement** | Lesson Quiz, Skill Validation, Curiosity Research | 1-12h | Self-tests on learned lessons, validates skills, researches knowledge gaps |\n| **Financial Intelligence** | Finance, Crypto \u0026 Web3, DeFi \u0026 Protocols, Whale Watch, Top Trades \u0026 Positioning, Commodities \u0026 Forex, Earnings \u0026 Corporate Events, Economics \u0026 Markets | 6-12h | Whale movements, trader positioning, commodity prices, earnings, macro data |\n| **International** | China Tech \u0026 Economy, Russia \u0026 Eastern Europe, Middle East, India, Europe \u0026 EU, Latin America, Africa \u0026 Emerging Markets | 8-24h | Regional perspectives from every major economic zone |\n| **Science \u0026 Tech** | Science, Technology, AI \u0026 ML, Space \u0026 Astronomy, Quantum Computing, Robotics \u0026 Autonomy, Physics 
\u0026 Mathematics, Biotech \u0026 Genetics, Semiconductors | 8-24h | Research breakthroughs, model releases, chip industry, gene therapy |\n| **Policy \u0026 Security** | US Policy \u0026 Regulation, Cybersecurity, Energy \u0026 Climate, Defense \u0026 Military Tech | 12h | Regulation, CVEs, climate policy, defense contracts |\n| **Developer** | Open Source \u0026 GitHub, Developer Ecosystem, Startups \u0026 VC | 12h | Trending repos, framework releases, funding rounds |\n| **Global \u0026 News** | World Awareness, Current Events, Geopolitics, Supply Chain \u0026 Trade, Research Frontiers, Hacker News Top Stories | 4-24h | Breaking news, trade disruptions, trending papers |\n| **Special Intelligence** | Product Hunt Trending, FDA Drug Approvals, FOMC \u0026 Fed Watch, SEC Insider Trading, GitHub Security Advisories, Government Contract Awards | 12-24h | Product launches, drug approvals, monetary policy, insider trades, CVEs, govt contracts |\n\nEvery query-type monitor auto-extracts knowledge graph triples. 
All results include today's date — no stale content.\n\n## Knowledge Graph\n\nTemporal knowledge graph that grows autonomously from 52 scheduled monitors:\n\n- 31 canonical predicates (`is_a`, `located_in`, `created_by`, `price_of`, `developed_by`, `works_at`, `member_of`, etc.)\n- `valid_from` / `valid_to` — when a fact was true\n- `superseded_by` — tracks how facts change over time (old facts aren't deleted, they're versioned)\n- `provenance` — which source/conversation created it\n- `query_at(entity, timestamp)` — what was true at a specific time\n- Auto-curation: heuristic + LLM pass removes garbage triples\n- Facts are used in chat: relevant KG triples are injected into the system prompt for contextual answers\n\n## MCP Integration\n\nNova is both an MCP **client** and **server** — unique in the landscape:\n\n**As client:** Drop MCP tool configs in `/data/mcp/` and Nova discovers and uses them.\n\n**As server:** Exposes 5 tools for Claude Code, Cursor, or any MCP client:\n- `nova_memory_query` — search user facts and conversations\n- `nova_knowledge_graph` — query the KG\n- `nova_lessons` — retrieve learned lessons\n- `nova_document_search` — search indexed documents\n- `nova_facts_about` — get user profile facts\n\n## Multi-Provider LLM\n\nSwitch providers with one env var:\n\n| Provider | Config | Default Model |\n|----------|--------|---------------|\n| **Ollama** (default) | `LLM_PROVIDER=ollama` | qwen3.5:27b |\n| **OpenAI** | `LLM_PROVIDER=openai` + key | gpt-4o |\n| **Anthropic** | `LLM_PROVIDER=anthropic` + key | claude-sonnet |\n| **Google** | `LLM_PROVIDER=google` + key | gemini-2.0-flash |\n\nModel routing *(experimental)*: configurable fast model for greetings, heavy model for complex reasoning, vision model for images. 
Set via `FAST_MODEL`, `HEAVY_MODEL`, `VISION_MODEL` env vars.\n\n## Security\n\nBuilt with [OWASP Agentic Security](https://genai.owasp.org/) in mind:\n\n**4-tier access control:**\n\n| Tier | Shell | Files | Code | Default |\n|------|-------|-------|------|---------|\n| `sandboxed` | Blocked | `/data` only | No os/subprocess | Yes |\n| `standard` | Limited | `/data`, `/tmp`, `/home` | pathlib only | |\n| `full` | Most allowed | Broad | Minimal restrictions | |\n| `none` | All | All | All | Dev only |\n\n**Defense in depth:**\n- Prompt injection detection (4 categories: role override, instruction injection, delimiter abuse, encoding tricks) with Unicode normalization and homoglyph detection\n- SSRF protection on HTTP fetch (blocks RFC 1918, loopback, link-local, IPv4-mapped IPv6, checks after redirects)\n- HMAC-SHA256 skill signing (`REQUIRE_SIGNED_SKILLS=true` by default)\n- Training data poisoning prevention (channel gating + confidence threshold)\n- Anti-sycophancy (refuses to override computed results)\n- Docker hardening (read-only root, no-new-privileges, all capabilities dropped)\n- Auth rate-limiting with IP lockout (10 failures → 5min lockout)\n- Security headers on all responses (CSP, X-Frame-Options, etc.)\n\n## Testing\n\n```bash\ndocker exec nova-app sh -c \"python -m pytest tests/ -v\"\n```\n\n1,689 tests across 64 files: brain pipeline, learning loop, tools, channels, monitors, security offensive, stress/concurrency, behavioral, and e2e.\n\n## Hardware Requirements\n\n| Component | Minimum | Recommended |\n|-----------|---------|-------------|\n| GPU VRAM | 20GB | 24GB+ |\n| RAM | 16GB | 32GB |\n| Disk | 50GB | 100GB |\n| GPU | RTX 3090 | RTX 4090 / A5000 |\n\n### Low VRAM / No GPU Options\n\nNova's LLM layer is provider-agnostic — you don't need a 3090.\n\n| Setup | VRAM | How |\n|-------|------|-----|\n| **Full local (default)** | 20GB+ | `qwen3.5:27b` via Ollama |\n| **Quantized local** | 16GB | `qwen3.5:27b-q4_K_M` — set 
`LLM_MODEL=qwen3.5:27b-q4_K_M` in `.env` |\n| **Smaller model** | 8GB | `qwen3.5:9b` — set `LLM_MODEL=qwen3.5:9b` in `.env` |\n| **Tiny model** | 4GB | `qwen3.5:4b` — set `LLM_MODEL=qwen3.5:4b` in `.env` |\n| **Cloud inference** | 0GB | Set `LLM_PROVIDER=openai` (or `anthropic`/`google`) + API key |\n| **Mixed** | 0GB | Cloud for inference, local for everything else |\n\nAll options keep your stored data local: memory, knowledge graph, lessons, training pairs, and conversation history stay on your machine. In cloud mode, only the current query and its context are sent to the LLM provider.\n\n```bash\n# Cloud mode — no GPU needed\ndocker compose -f docker-compose.cloud.yml up -d\n\n# Quantized — fits in 16GB VRAM\n# Just change LLM_MODEL in .env, then:\ndocker compose up -d\n```\n\n## Configuration\n\nAll settings are configured via `.env`. See [CLAUDE.md](CLAUDE.md) for the full list of 150+ config fields.\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md). Issues and PRs welcome.\n\n## License\n\n[AGPL-3.0](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheliosnova%2Fnova","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fheliosnova%2Fnova","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fheliosnova%2Fnova/lists"}