{"id":51370983,"url":"https://github.com/shenmintao/marginalia","last_synced_at":"2026-07-03T06:33:53.061Z","repository":{"id":359867022,"uuid":"1247594769","full_name":"shenmintao/marginalia","owner":"shenmintao","description":"A library-science-inspired personal knowledge management system with LLM agents","archived":false,"fork":false,"pushed_at":"2026-06-26T04:28:47.000Z","size":9158,"stargazers_count":168,"open_issues_count":0,"forks_count":24,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-26T06:14:11.693Z","etag":null,"topics":["apple-silicon","arm64","desktop-app","docker","document-management","document-search","fastapi","information-retrieval","knowledge-base","llm","llm-agent","local-first","multi-arch","personal-knowledge-management","python","rag","react","semantic-search","sqlite","tauri"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shenmintao.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-23T14:23:38.000Z","updated_at":"2026-06-26T04:28:40.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/shenmintao/marginalia","commit_stats":null,"previous_names":["shenmintao/marginalia"],"tags_count":17,"template":false,"template_full_name":null,"purl":"pkg:github/shenmintao/marginalia","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shenmintao%2Fmarginalia","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shenmintao%2Fmarginalia/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shenmintao%2Fmarginalia/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shenmintao%2Fmarginalia/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shenmintao","download_url":"https://codeload.github.com/shenmintao/marginalia/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shenmintao%2Fmarginalia/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35075804,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-03T02:00:05.635Z","response_time":110,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apple-silicon","arm64","desktop-app","docker","document-management","document-search","fastapi","information-retrieval","knowledge-base","llm","llm-agent","local-first","multi-arch","personal-knowledge-management","python","rag","react","semantic-search","sqlite","tauri"],"created_at":"2026-07-03T06:33:52.535Z","updated_at":"2026-07-03T06:33:53.052Z","avatar_url":"https://github.com/shenmintao.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Marginalia\n\n\u003e Chinese README: [README.zh-CN.md](README.zh-CN.md)\n\u003e Detailed design: [DESIGN.md](DESIGN.md)\n\u003e GUI setup guide: [English](docs/GUI_TUTORIAL.md) · [中文](docs/GUI_TUTORIAL.zh-CN.md)\n\n**Turn your PDFs, notes, spreadsheets, logs, and archives into a private AI\nlibrary that answers from original sources.**\n\nMarginalia is a local-first research agent for people with messy private\nknowledge bases. It keeps your files in a normal folder tree, builds useful\nlibrary metadata around them, and makes the agent read the relevant original\nfile windows before it writes a cited answer.\n\n[Download desktop app](https://github.com/shenmintao/marginalia/releases) ·\n[GUI setup guide](docs/GUI_TUTORIAL.md) · [CLI quickstart](#cli-quickstart) · [Usage guide](USAGE.md) ·\n[Design notes](DESIGN.md)\n\n![Marginalia promotional hero](docs/images/marginalia-promo-en.png)\n\n![Marginalia desktop app screenshot](docs/images/desktop-screenshot-en.jpg)\n\n## Why Use It\n\n- You have research papers, meeting notes, PDFs, tables, logs, screenshots, and\n  archives that do not fit cleanly into one app.\n- You want answers that cite the source material instead of a black-box vector\n  search layer over chunks.\n- You need both quick lookups and slower investigation-style reports over the\n  same private library.\n- You want local-first storage: the default `mirror` backend keeps your library\n  as readable files under `MARGINALIA_HOME/library`.\n\n## What It Does\n\n- Ingests text, Markdown, PDFs, DOCX, images, spreadsheets, logs, and archives.\n- Organizes material with folders, catalogs, tags, views, metadata, journals,\n  and relation mining.\n- Recalls candidates with lexical search by default, plus optional embeddings,\n  `sqlite-vec`, reranking, and source quotas.\n- Reads original sections, pages, lines, archive members, or table slices before\n  answering.\n- Produces cited answers and reports, then writes durable investigation notes\n  that future turns can recall.\n\n## Try It\n\n### Desktop App\n\nDownload the latest desktop package from\n[GitHub Releases](https://github.com/shenmintao/marginalia/releases):\n\n- **Windows**: x64/arm64 installer and portable zip.\n- **macOS**: Intel and Apple Silicon DMGs.\n- **Linux**: x64/arm64 `.deb` and `.rpm`.\n\nThe desktop builds bundle their own Python runtime. They are currently unsigned,\nso Windows SmartScreen or macOS Gatekeeper may ask you to confirm the first\nlaunch.\n\nDesktop bundles also include CLI wrappers backed by the bundled Python\nruntime. They share the same `MARGINALIA_HOME` as the desktop app, so the CLI,\nMCP server, reusable backend, and worker work without installing a separate\nsystem Python package.\n\n- **Linux `.deb` / `.rpm`**: installs `marginalia`, `marginalia-mcp`, and\n  `marginalia-worker` under `/usr/bin`.\n- **Windows installer / portable zip**: includes `marginalia.cmd`,\n  `marginalia-mcp.cmd`, and `marginalia-worker.cmd` next to\n  `Marginalia.exe`. Use full paths in MCP clients or add the install folder to\n  `PATH`.\n- **macOS DMG**: includes wrappers inside the app bundle:\n  `/Applications/Marginalia.app/Contents/MacOS/marginalia`,\n  `marginalia-mcp`, and `marginalia-worker`.\n\n- **Windows**: click **More info** -\u003e **Run anyway** if SmartScreen blocks the\n  first launch.\n- **macOS**: after dragging the app to `/Applications`, run\n  `xattr -dr com.apple.quarantine /Applications/Marginalia.app` if Gatekeeper\n  reports that the app is damaged or cannot be verified.\n\n### CLI Quickstart\n\nRequires Python 3.11+.\n\n```bash\npython -m venv .venv\n\n# Windows PowerShell\n.\\.venv\\Scripts\\Activate.ps1\n\n# macOS / Linux\nsource .venv/bin/activate\n\npip install -e \".[dev]\"\nmarginalia init\n```\n\nEdit `.env`:\n\n```ini\nMARGINALIA_API_HOST=127.0.0.1\nMARGINALIA_API_PORT=8000\nLLM_DEFAULT_PROVIDER=openai\nLLM_DEFAULT_API_KEY=sk-...\nLLM_DEFAULT_MODEL=gpt-4o-mini\n```\n\nRun the embedded CLI + API + worker:\n\n```bash\nmarginalia\n```\n\nThen:\n\n```text\nmarginalia\u003e /upload ./paper.pdf /papers/\nmarginalia\u003e /background\nmarginalia\u003e compare this paper with my Paxos notes\nmarginalia\u003e /export\n```\n\nThe first launch bootstraps the database schema automatically.\n\nTo share one backend across the desktop app, CLI sessions, MCP, skill-driven\nautomation, or external HTTP clients, start the reusable HTTP backend instead:\n\n```bash\nmarginalia serve\n```\n\n`marginalia serve` reads `MARGINALIA_API_HOST` and `MARGINALIA_API_PORT` from\n`.env` and writes its live URL to `MARGINALIA_HOME/runtime/server.json`.\nDesktop and CLI clients auto-discover that file; skills inherit this when they\ndrive the `marginalia` CLI. Explicit `--server URL` or `MARGINALIA_SERVER`\nstill take precedence.\n\n## Example Questions\n\n```text\nCompare this Raft paper with my Paxos notes.\nFind the incident timeline across the logs and the postmortem.\nWhich uploaded papers support this claim, and which contradict it?\nSummarize the spreadsheet, then cite the rows used for the conclusion.\nTurn this folder into a cited research brief.\n```\n\n## How It Differs From Plain RAG\n\nMarginalia is not just \"retrieve top-k chunks and answer.\" The agent can recall\nprior investigations, inspect structured metadata, follow related entries, read\noriginal source windows, and correct its search path before writing. Quick mode\nkeeps this bounded for short lookups; Deep mode keeps the full ReAct\ninvestigation loop when coverage matters more than latency.\n\n## The Retrieval Funnel\n\n```text\nuser question\n  -\u003e plan\n  -\u003e recall_knowledge            # journal + metadata + optional semantic recall\n  -\u003e search_metadata/list_folder # focused follow-up over names, summaries, tags\n  -\u003e read_entries_metadata       # sections, extra, related entries\n  -\u003e discover/related entries    # graph-based neighbours\n  -\u003e read_files                  # original text/page/line/member/table slice\n  -\u003e answer with footnotes\n  -\u003e reflect_turn                # durable journal memory\n```\n\nThe agent is instructed to use `recall_knowledge` for broad material location.\nThat tool resolves tag hints, searches prior journal notes and entry metadata,\noptionally adds semantic candidates, ranks the merged pool, and returns compact\ncandidate IDs for batched metadata verification and source reads. Lower-level\ntools such as `search_journal`, `search_metadata`, and `materialize_view`\nremain available for focused follow-up and debugging.\n\nMetadata text search is indexed in both supported database modes. SQLite uses\nthe local FTS5 trigram table; Postgres uses native `to_tsvector` /\n`websearch_to_tsquery` expression GIN indexes over file and entry metadata.\nChinese short terms that are too small for trigram tokenization are preserved\nwith a bounded LIKE fallback in mixed metadata queries.\nJournal recall also validates referenced entries at read time. If a prior\nnote points at a deleted entry or a file reprocessed after the note was\nwritten, the note is kept for audit but marked stale and ranked behind current\nnotes. Later reflections can also mark directly contradicted journal rows\n`invalidated_*`; active recall hides them by default while audit queries can\ninclude them.\n\n## Supported Ingest Pipelines\n\n- `text`: text, Markdown, reStructuredText, code-like text.\n- `pdf`: text-layer PDF, long-PDF page windows, PDF page labels, scanned-PDF OCR fallback when a vision profile is configured.\n- `image`: image indexing and description when a vision profile is configured.\n- `docx`: Word documents.\n- `spreadsheet`: CSV, TSV, JSON, XLSX, Parquet and related table formats.\n- `log`: logs and logrotate variants.\n- `archive`: zip, tar, 7z, rar, gz, bz2, xz, iso, cab and other py7zz-supported containers.\n\n## Retrieval Evaluation\n\nExternal retrieval datasets can be imported from a local BEIR-style directory:\n\n```text\n\u003cdataset\u003e/\n  corpus.jsonl\n  queries.jsonl\n  qrels/test.tsv\n```\n\nImport is synchronous. Each corpus document is written as a normal entry and\nimmediately passed through the ingest pipeline, so the command returns only\nafter the eval corpus is indexed.\n\n```bash\nMARGINALIA_HOME=./runtime/eval/scifact marginalia eval import-beir scifact ./datasets/scifact\nMARGINALIA_HOME=./runtime/eval/scifact EMBEDDING_API_KEY=... marginalia eval build-semantic-index scifact\nMARGINALIA_HOME=./runtime/eval/scifact marginalia eval run scifact --retriever search_metadata --k 10,50,100 --json report.json\nMARGINALIA_HOME=./runtime/eval/scifact marginalia eval run scifact --retriever semantic_recall --k 10,50,100\nMARGINALIA_HOME=./runtime/eval/scifact marginalia eval ablation-run scifact --k 10,50,100 --json ablation-report.json\nMARGINALIA_HOME=./runtime/eval/scifact marginalia eval answer scifact --retriever recall_knowledge --query-id \u003cqid\u003e --timeout-seconds 300\nMARGINALIA_HOME=./runtime/eval/scifact marginalia eval answer-run scifact --retriever recall_knowledge --qrels-only --query-limit 20 --concurrency 10 --json answer-report.json\nMARGINALIA_HOME=./runtime/eval/scifact marginalia eval compare-report scifact --query-limit 30 --concurrency 3 --json compare-report.json\n```\n\nUse a dedicated `MARGINALIA_HOME` for external benchmarks unless you\nintentionally want benchmark documents inside your personal library.\n`eval build-semantic-index` uses the configured embedding provider. The\ndefault is Alibaba Cloud Model Studio / DashScope `text-embedding-v4`; set\n`EMBEDDING_API_KEY` before building. Embedding credentials are intentionally\nseparate from `LLM_*` profiles. Semantic recall is optional and disabled by\ndefault; set `SEMANTIC_RECALL_ENABLED=true` to merge semantic candidates from\nthe default semantic index with the lexical metadata recall path. The eval CLI\nindex builder targets imported datasets; the GUI/API can enqueue a whole-library\nsemantic-index rebuild for the default index after embedding model or dimension\nchanges. Ingest also refreshes the affected file's semantic vectors after a\nsuccessful run when semantic recall is configured. If the optional `sqlite-vec`\ndependency is installed, the semantic index also writes `vectors.sqlite` and\nsearch uses it before falling back to the file index. Install with\n`pip install -e \".[semantic]\"`, or set `SEMANTIC_INDEX_BACKEND=file` to keep\nonly the file backend.\nOptional reranking can refine the merged candidate pool before evidence\nselection. Enable it with `RERANK_ENABLED=true`, `RERANK_API_KEY=...`, and\noptionally `RERANK_MODEL=qwen3-rerank`. Rerank credentials are also separate\nfrom `LLM_*`; no chat or vision key is reused implicitly. Evidence selection\ndefaults to `EVIDENCE_SELECTION=quota`; set `EVIDENCE_SELECTION=rerank` to take\nthe reranked top evidence directly.\nThe eval report treats `hit@k` and `candidate_recall@k` as the investigation\ncandidate-pool metrics; MRR and nDCG are ranking-efficiency diagnostics.\n`eval ablation-run` runs the candidate-pool matrix for metadata-only,\nmetadata-plus-relations, hybrid semantic recall, hybrid-plus-relations,\nhybrid-plus-rerank, and full recall. It reports deltas against metadata-only\nso relation expansion, semantic recall, and rerank contributions can be\ntracked before changing the agent loop.\n`eval answer` is a bounded final-answer probe: it retrieves candidates, reads\nlimited source text, performs one answer-generation call, and reports whether\nthe answer cited a qrels-relevant document. `eval answer-run` repeats the same\nbounded probe across imported queries and reports aggregate final-answer\ncitation hit rate; use `--qrels-only` to apply `--query-limit` after filtering\nto imported qrels-backed queries and `--concurrency` to run independent answer\nprobes in parallel. When BEIR query metadata includes SciFact-style\nSUPPORT/CONTRADICT labels, the answer report also includes label accuracy.\n`eval compare-report` runs a blind end-to-end comparison between a one-shot\nRAG report and the full ReAct investigation workflow on the same query set.\nWhen SciFact-style gold labels are available, the judge prioritizes verdict\ncorrectness before report completeness.\n\nLatest local validation on SciFact 300:\n\n- Retrieval with `recall_knowledge` + rerank top-80 reached MRR 0.7226,\n  hit@10 0.8800, and hit@100 0.9133.\n- Bounded final-answer probes with rerank top-80 and quota evidence selection\n  reached evidence hit 0.8667, citation hit 0.7133, and label accuracy 0.8085.\n- A 30-query end-to-end report comparison favored the full ReAct workflow over\n  one-shot RAG in 26/30 cases, with 2 one-shot RAG wins, 2 ties, and 1 timeout.\n\nThese results support Marginalia's current positioning: for quick lookups it\nbehaves like a hybrid RAG system, while the full ReAct workflow is a slower\ndeep-investigation path that can produce better source-grounded reports.\nThey should not be read as a claim of general benchmark SOTA: the dataset is\nsmall, the comparison target is a local one-shot RAG baseline, and final\nquality still depends on model behavior, ingest quality, and available\nevidence.\n\n## CLI Surface\n\n`marginalia` with no arguments opens the interactive REPL. The same command\nsurface is also available as one-shot subcommands for scripts, CI, and agents\nthat do not use MCP:\n\n```bash\nmarginalia ask \"Compare this Raft paper with my Paxos notes\"\nmarginalia search \"raft consensus\" --json\nmarginalia info \u003centry_id\u003e --json\nmarginalia discover \u003centry_id\u003e --top-k 12 --json\nmarginalia check --json\nmarginalia ingest --all --yes --json\nmarginalia reprocess failed --json\n```\n\nOne-shot commands use the same backend discovery model as the REPL: explicit\n`--server URL`, then `MARGINALIA_SERVER`, then\n`MARGINALIA_HOME/runtime/server.json`, and finally an embedded backend. Text\noutput is meant for humans; `--json` keeps stdout structured for automation.\n\nSlash commands:\n\n```text\n/help                         list commands\n/upload \u003clocal\u003e \u003cremote\u003e      upload a file or directory into the vault\n/check                        diff mirror vault vs database\n/ingest \u003cpath\u003e | --all        sync manual vault edits into the database\n/reprocess failed             re-run ingest for failed files\n/reprocess folder \u003cid\u003e failed re-run failed files in one folder subtree\n/search \u003cquery\u003e               metadata recall\n/info \u003centry_id\u003e              entry metadata and preview\n/discover \u003centry_id\u003e [N]      related entries from the evidence graph\n/discover \u003centry_id\u003e --all    include unvetted relation signals\n/discover \u003centry_id\u003e --vet    queue background vetting for direct signals\n/tree                         folder tree\n/download \u003cid\u003e [dest]         download file or folder zip\n/export [conversation_id]     export answer and citations\n/tend                         run a maintenance pass\n/background                   show queued/running tasks\n/mode [auto|quick|deep]       show or change chat mode\n/new / /clear / /quit         session control\n```\n\nAny non-slash input is sent to the investigator agent. Chat defaults to\n`auto`: the planner selects a quick/standard/deep execution budget from a\nplain `BUDGET:` control line and the runtime can upgrade it while tools are\nstill producing new evidence. `/mode quick` and `/mode deep` remain manual\noverrides.\n\n## MCP Server\n\nMarginalia can also run as a stdio MCP server for external agents:\n\n```bash\nmarginalia mcp\n# or\nmarginalia-mcp\n```\n\nThe MCP server uses the same backend discovery model as the CLI: explicit\n`--server URL`, then `MARGINALIA_SERVER`, then\n`MARGINALIA_HOME/runtime/server.json`, and finally an embedded backend if\nnothing is already running. A Claude Desktop-style command entry can point at\nthe same executable and set `MARGINALIA_HOME` / database settings through the\nenvironment.\n\nMCP exposes structured workflow tools including `ask_marginalia`,\n`upload_file`, `download_file`, `download_folder`, `export_conversation`,\n`search_files`, `get_file_metadata`, plus retrieval/source-reading tools such\nas `recall_knowledge`, `search_metadata`, `search_journal`,\n`read_entries_metadata`, and `read_files`.\n\n## API Surface\n\nBusiness endpoints live under `/v1`:\n\n```text\nPOST /v1/upload\nGET  /v1/search\nGET  /v1/file-entries/{entry_id}/metadata\nGET  /v1/file-entries/{entry_id}/content\nPOST /v1/sessions\nPOST /v1/chat/{session_id}          # Server-Sent Events\nGET  /v1/conversations/{id}/export\nPOST /v1/tend\nGET  /v1/tasks/active\nGET  /v1/settings/llm\nGET  /health\n```\n\nThe desktop GUI and CLI both use the same API.\n\n`POST /v1/chat/{session_id}` accepts `{ \"query\": \"...\", \"mode\": \"deep\" }`\nor `{ \"query\": \"...\", \"mode\": \"quick\" }`. Omit `mode` for the default `auto`\nplanner-selected budget behavior.\n\n## Configuration\n\nCore `.env` fields:\n\n```ini\nMARGINALIA_HOME=~/Marginalia\nDB_BACKEND=sqlite                  # sqlite or postgres\nSTORAGE_BACKEND=mirror             # mirror, local, or s3\nWORKER_ENABLED=true\nAUTO_LIFECYCLE_ENABLED=false\nMAINTENANCE_DAILY_TOKEN_BUDGET=0  # rolling 24h background cap; 0 = unlimited\nRELATION_BACKGROUND_VETTING_ENABLED=false\n\nLLM_DEFAULT_PROVIDER=openai        # openai, openai-compatible, anthropic\nLLM_DEFAULT_API_KEY=sk-...\nLLM_DEFAULT_BASE_URL=\nLLM_DEFAULT_MODEL=gpt-4o-mini\n\nLLM_CHAT_MODEL=\nLLM_REFLECT_MODEL=\nLLM_INGEST_MODEL=\nLLM_VISION_MODEL=\n\nEMBEDDING_API_KEY=\nEMBEDDING_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1\nEMBEDDING_MODEL=text-embedding-v4\nSEMANTIC_RECALL_ENABLED=false\nSEMANTIC_INDEX_BACKEND=auto        # auto, file, sqlite-vec\n\nRERANK_ENABLED=false\nRERANK_API_KEY=\nRERANK_BASE_URL=https://dashscope.aliyuncs.com/compatible-api/v1\nRERANK_MODEL=qwen3-rerank\nEVIDENCE_SELECTION=quota           # quota or rerank\n\nAGENT_PLAN_MAX_TOKENS=1024\nAGENT_EXECUTE_MAX_TOKENS=2048\nAGENT_FINAL_ANSWER_CONTINUE_TURNS=3\nAGENT_FINAL_ANSWER_MAX_CHARS=120000\n\n# Built-in compression.\nCOMPRESSION_ENABLED=true\nCOMPRESSION_MIN_CHARS=12000\nCOMPRESSION_TARGET_CHARS=8000\nCOMPRESSION_CONTEXT_CHARS=220\nCOMPRESSION_MAX_RATIO=0.85\n```\n\nUse `openai-compatible` for DeepSeek, Together, Groq, local vLLM, Ollama, and other OpenAI wire-compatible services.\n\nThe `vision` profile is optional. Without it, image enrichment, PDF figure captioning, and scanned-PDF OCR degrade gracefully or are skipped.\n\nCompression uses one master switch, `COMPRESSION_ENABLED`. Marginalia vendors the dependency-free Headroom SearchCompressor, LogCompressor, SmartCrusher, and TextCrusher cores for large `read_files` model views, model-facing results from `search_metadata`, `query_sql`, and `query_log`, structured/log ingest views, archive member peeks, and long aggregate index prompts. It fails open to original content if a compressed view does not beat `COMPRESSION_MAX_RATIO`. Persisted tool-call results, UI previews, and original files stay unmodified; compressed `read_files` metadata includes `compress=false` reopen args for exact quoting.\n\n`MAINTENANCE_DAILY_TOKEN_BUDGET` is a rolling 24-hour cap for background\nmaintenance LLM usage. When it is exhausted, low-priority speculative tasks\n(`restructure_catalogs`, `vet_relations`, `propose_views`) defer to a later\ntick; foreground ingest and chat reflection are not limited.\n\nRelation discovery is pure-read by default. Miners write cheap raw signals,\nand `/discover` reads the already-vetted graph without calling an LLM. Use\n`/discover \u003centry_id\u003e --vet` (API: `vet=true`) to queue background vetting for\nthat seed's direct raw edges, or set `RELATION_BACKGROUND_VETTING_ENABLED=true`\nif you want the periodic worker to batch-vet relation edges ahead of time.\n\nWhen a long final answer hits the model token limit, Marginalia can continue it server-side and emit one merged answer event to the GUI. Tune `AGENT_FINAL_ANSWER_CONTINUE_TURNS` and `AGENT_FINAL_ANSWER_MAX_CHARS` for research-heavy deployments.\n\n## Storage and Deployment\n\nDefault local layout:\n\n```text\n\u003cMARGINALIA_HOME\u003e/marginalia.db\n\u003cMARGINALIA_HOME\u003e/library/\n\u003cMARGINALIA_HOME\u003e/objects/\n```\n\n`STORAGE_BACKEND=mirror` stores files as a readable folder tree. `local` stores UUID-addressed objects. `s3` is for multi-host deployments.\n\nSingle-process mode:\n\n```bash\nmarginalia\n```\n\nRemote API mode:\n\n```bash\nmarginalia serve --host 0.0.0.0 --port 8000\nmarginalia --server http://server:8000\n# If the server sets MARGINALIA_API_TOKEN:\nmarginalia --server http://server:8000 --api-token \"$MARGINALIA_API_TOKEN\"\n```\n\nDocker compose starts API, worker, Postgres, and MinIO:\n\n```bash\necho \"LLM_DEFAULT_API_KEY=sk-...\" \u003e .env\ndocker compose up -d\n```\n\nThe compose file binds the API and MinIO console to `127.0.0.1` by default.\nIf you deliberately expose the API on a LAN, set `MARGINALIA_API_TOKEN` and\nsend `Authorization: Bearer \u003ctoken\u003e` from the CLI or desktop connection\nsettings.\n\n### Multi-device sync\n\nDo not use Dropbox, Syncthing, iCloud Drive, OneDrive, or similar file-sync\ntools to sync a live `MARGINALIA_HOME`. SQLite and the mirror/local storage\nlayout can be corrupted by concurrent replication. For multiple machines, use\nthe remote deployment shape with Postgres and S3-compatible object storage.\n\n## Documentation\n\n- [USAGE.md](USAGE.md): operations manual.\n- [DESIGN.md](DESIGN.md): data model, retrieval design, task system, invariants.\n- [samples/architecture.md](samples/architecture.md): developer architecture overview.\n- [docs/LAUNCH.md](docs/LAUNCH.md): launch copy, social preview notes, and community post templates.\n\n## Development\n\n```bash\nuv run ruff check src tests\n.\\.venv\\Scripts\\python -B -m pytest tests -q\n```\n\nCurrent tests cover upload, ingest, agent runtime, tool execution, export, task scheduling, PDF/DOCX/image/table/archive pipelines, relation discovery, lifecycle behavior, semantic index fallback, recall/rerank scoring, evaluation commands, and CLI flows.\n\n## Community links\nThis open-source project is linked with and recognized by the LINUX DO community:\n\nLINUX DO: [https://linux.do/](https://linux.do/)\n\nThanks to [Headroom](https://github.com/chopratejas/headroom) for the compression algorithms and architecture vendored into Marginalia's built-in compression path.\n\n## License\n\nAGPL-3.0-or-later. See [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshenmintao%2Fmarginalia","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshenmintao%2Fmarginalia","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshenmintao%2Fmarginalia/lists"}