{"id":50426867,"url":"https://github.com/zilliztech/mfs","last_synced_at":"2026-05-31T11:30:20.918Z","repository":{"id":354819674,"uuid":"1215828121","full_name":"zilliztech/mfs","owner":"zilliztech","description":"Agent-native file search CLI for large local workspaces, ideal for managing memory, skill, codebase and knowledgebase.","archived":false,"fork":false,"pushed_at":"2026-04-30T09:58:04.000Z","size":1318,"stargazers_count":0,"open_issues_count":2,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-30T10:13:51.961Z","etag":null,"topics":["agent","agent-memory","claude-code","codex","embeddings","file-search","markdown","memory","milvus","progressive-disclosure","rag","semantic-search","skills","vector-database"],"latest_commit_sha":null,"homepage":"https://zilliztech.github.io/mfs/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zilliztech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-20T09:43:04.000Z","updated_at":"2026-04-30T10:02:58.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/zilliztech/mfs","commit_stats":null,"previous_names":["zilliztech/mfs"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/zilliztech/mfs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zilliztech%2Fmfs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zilliztech%2Fmfs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zilliztech%2Fmfs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zilliztech%2Fmfs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zilliztech","download_url":"https://codeload.github.com/zilliztech/mfs/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zilliztech%2Fmfs/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33730240,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","agent-memory","claude-code","codex","embeddings","file-search","markdown","memory","milvus","progressive-disclosure","rag","semantic-search","skills","vector-database"],"created_at":"2026-05-31T11:30:20.149Z","updated_at":"2026-05-31T11:30:20.895Z","avatar_url":"https://github.com/zilliztech.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eMFS\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eSemantic file search CLI — built for AI agents driving a shell.\u003c/strong\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/zilliztech/mfs/blob/main/LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/github/license/zilliztech/mfs?style=flat-square\" alt=\"License\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/mfs-cli/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/mfs-cli?style=flat-square\u0026color=blue\" alt=\"PyPI\"\u003e\u003c/a\u003e\n  \u003cimg src=\"https://img.shields.io/badge/python-%3E%3D3.10-blue?style=flat-square\u0026logo=python\u0026logoColor=white\" alt=\"Python\"\u003e\n  \u003ca href=\"https://milvus.io/\"\u003e\u003cimg src=\"https://img.shields.io/badge/powered%20by-Milvus-00A1EA?style=flat-square\" alt=\"Milvus\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/zilliztech/mfs/stargazers\"\u003e\u003cimg src=\"https://img.shields.io/github/stars/zilliztech/mfs?style=flat-square\" alt=\"Stars\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n**MFS** stands for both **M**emory **F**ile **S**earch and **M**ilvus **F**ile **S**earch.\nIt's a POSIX-style CLI (`ls` / `tree` / `cat` / `grep` / `search`) that gives an AI\nagent semantic access to any folder — and gives a human the same toolkit at the\nterminal. Files are the source of truth; Milvus is the derived index underneath.\n\nThe \"memory\" part is not just branding. Modern agent memory is often discussed\nas semantic, episodic, procedural, and working memory; in practice, the durable\nlayers are usually files: Markdown notes, JSONL transcripts, SKILL documents,\nrunbooks, PDFs, DOCX files, and code. MFS makes those memory files searchable\nwith Milvus. See the [docs homepage](https://zilliztech.github.io/mfs/) for the\nfull story.\n\n---\n\n## Where MFS sits\n\nMFS is a **middle layer**: below it, Milvus / Zilliz Cloud is abstracted away;\nabove it, any agent application that has folders full of files — memory logs,\nskill definitions, session transcripts, source code — can plug in without\ntouching a vector database.\n\n```\n┌────────────────────────────────────────────────────────────────────┐\n│   Agent Applications                                               │\n│   ─────────────────────────────────────────────────────────        │\n│   memory systems      skill managers       codebase copilots       │\n│   (daily .md logs)    (trees of SKILL.md)  (repo-aware chat)       │\n│   session replayers   knowledge bases      …your next agent app    │\n│   (session .jsonl)    (docs, PDFs)                                 │\n└────────────────────────────────┬───────────────────────────────────┘\n                                 │  invokes CLI / Skill\n                                 ▼\n┌────────────────────────────────────────────────────────────────────┐\n│   MFS   ← you are here                                             │\n│   ─────────────────────                                            │\n│   📟  CLI    mfs add · search · grep · ls · tree · cat             │\n│   🧠  Skill  skills/mfs    (reusable agent workflow instructions)   │\n│                                                                    │\n│   Hybrid retrieval · density presets · JSON Hit envelope ·         │\n│   tree-sitter AST · pipe-aware · ~/.mfs/ state only                │\n└────────────────────────────────┬───────────────────────────────────┘\n                                 │  wraps / abstracts\n                                 ▼\n┌────────────────────────────────────────────────────────────────────┐\n│   Milvus   (Lite · Self-hosted · Zilliz Cloud)                     │\n│   ────────────────────────────────────────────                     │\n│   dense vectors · BM25 sparse · RRF fusion · metadata filters      │\n└────────────────────────────────────────────────────────────────────┘\n```\n\n### Why a CLI — not an SDK or an HTTP API?\n\nBecause **agents already speak shell.** An LLM agent can plan\n`mfs tree --peek` → `mfs cat --skim` → `mfs search \"...\"` with zero integration\ncode — the same way a human developer would. No client library to version, no\nservice to keep alive, no schema to import; just a binary on `$PATH` and a\n`--json` flag when the caller is a machine.\n\nMFS ships the two things an agent-first tool actually needs:\n\n- **📟 a CLI** — for any agent that can run shell commands (Claude Code, Codex\n  CLI, OpenCode, your own)\n- **🧠 a companion Skill** — `skills/mfs`, a reusable agent skill that teaches\n  when to search, when to browse, when to verify with line ranges, and when to\n  use native shell tools instead\n\nClosing the loop: **MFS is the tool agents use to build agent apps.** The same\nCLI that powers a memory system or a skill manager is also what you hand to\n*your own* agent while you're building it.\n\n---\n\n## Why MFS\n\n- **🤖 Shell-native, agent-first.** The commands an agent already knows (`ls`, `cat`,\n  `grep`) — now semantically aware. Every command has a `--json` mode with a unified\n  `Hit` envelope so an agent can parse without regex.\n- **🔎 Hybrid retrieval.** Dense vectors for meaning, BM25 for exact tokens, RRF for\n  fusion. Short and long queries both work.\n- **📏 Progressive browsing.** `--peek` / `--skim` / `--deep` on `ls` / `tree` / `cat`\n  share one density model — orient an agent in a new repo without burning its\n  context window.\n- **🧩 Multi-format indexing.** Markdown, source code (tree-sitter AST across 15\n  languages), PDF (via pymupdf4llm), and DOCX (via python-docx). JSON, JSONL,\n  CSV, HTML and friends stay grep-able and readable via `mfs cat`.\n- **🔀 Smart grep routing.** Indexed files go through Milvus BM25; everything else\n  falls back to the system `grep` — you don't think about which is which.\n- **🚫 No LLM in the hot path.** Chunking, summarization heuristics, embedding — all\n  run without calling any LLM. LLM / VLM enrichment is strictly opt-in.\n- **🧼 Zero intrusion.** All state lives in `~/.mfs/`. Your project directory gets\n  nothing added to it.\n\nSee [docs/skill.md](docs/skill.md) to install the companion MFS agent skill for\nCodex, Claude Code, or another shell-based agent.\n\n---\n\n## Evaluation highlights\n\nWe tested MFS in end-to-end agent runs on large code and documentation\ncorpora. The strongest result came from using both MFS search and MFS browse:\nsearch finds better candidates, and browse helps the agent compare them without\nreading whole files.\n\n![Code search baseline comparison](https://github.com/user-attachments/assets/da624f61-fccc-40b9-bc07-77d6bc416e57)\n\n![Hard code search baseline comparison](https://github.com/user-attachments/assets/95ed7047-5c46-4f1a-aea7-97354d86252b)\n\n![Document search baseline comparison](https://github.com/user-attachments/assets/e224455f-1a46-41c0-9143-d93946283322)\n\nSee [evaluation](evaluation/README.md) for the full setup, metrics, examples,\nand machine-readable artifacts.\n\n---\n\n## Install\n\nInstall from PyPI. The package name is `mfs-cli`; the command is `mfs`:\n\n```bash\nuv tool install mfs-cli\nmfs --help\n```\n\nFor one-off use without installing a persistent tool:\n\n```bash\nuvx --from mfs-cli mfs --help\n```\n\nTo use optional providers, install the matching extra:\n\n```bash\nuv tool install \"mfs-cli[onnx]\"              # local bge-m3 ONNX INT8\nuv tool install \"mfs-cli[google]\"            # Google Gemini embeddings\nuv tool install \"mfs-cli[llm-anthropic]\"     # Anthropic summaries/descriptions\n```\n\nFor development from source:\n\n```bash\ngit clone https://github.com/zilliztech/mfs.git\ncd mfs\nuv sync                      # base install (OpenAI embedding ready)\nuv run mfs --help\n```\n\nMFS is managed with [`uv`](https://docs.astral.sh/uv/) and `pyproject.toml`.\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eSource install extras — other embedding / LLM providers\u003c/b\u003e\u003c/summary\u003e\n\n```bash\n# Embedding providers\nuv sync --extra onnx              # local bge-m3 ONNX INT8, no API key\nuv sync --extra google            # Google Gemini embeddings\nuv sync --extra voyage            # Voyage AI\nuv sync --extra jina              # Jina\nuv sync --extra ollama            # local Ollama models\nuv sync --extra mistral           # Mistral\nuv sync --extra local             # sentence-transformers (GPU)\n\n# LLM / VLM providers (for --summarize / --describe)\nuv sync --extra llm-anthropic     # Claude\nuv sync --extra llm-google        # Gemini\nuv sync --extra llm-ollama        # local Ollama\nuv sync --extra llm-mistral       # Mistral\n\n# Everything at once\nuv sync --extra all\n```\n\nEnvironment variables (only the ones you actually use):\n\n```bash\nexport OPENAI_API_KEY=\"sk-...\"       # default provider\nexport GOOGLE_API_KEY=\"...\"          # or GEMINI_API_KEY\nexport ANTHROPIC_API_KEY=\"...\"\nexport VOYAGE_API_KEY=\"...\"\nexport JINA_API_KEY=\"...\"\nexport MISTRAL_API_KEY=\"...\"\n```\n\n\u003c/details\u003e\n\n---\n\n## Quick start\n\n```bash\n# 1. Index the current directory (incremental — re-runs are cheap)\n$ mfs add .\nProcessing 184 files under /repo\nIndexed: 184 files scanned, 184 touched, 0 deleted, 2341 chunks queued.\nWorker running in background. Run `mfs status` to check progress.\n\n# 2. Semantic search — pass a positional \u003cpath\u003e scope (POSIX-style, like grep)\n#    or --all to search every indexed folder.\n#    Header is [N] \u003cpath\u003e  score=…, with line numbers in the left gutter of\n#    each body line (ripgrep-style).\n$ mfs search \"how do we handle token expiration\" .\n[1] src/auth/token.py  score=0.890\n142  def refresh_token(user_id: str, refresh_jwt: str) -\u003e Token:\n143      \"\"\"Exchange a refresh token for a new access token.\n144\n145      Raises TokenExpiredError if the refresh token is past its TTL —\n146      the caller should redirect to login.\n147      \"\"\"\n148      ...\n\n[2] docs/auth.md  score=0.710\n 24  ## Token expiration\n 25\n 26  Access tokens live 15 minutes; refresh tokens live 14 days.\n\n# 3. Exact-match — Milvus BM25 for indexed files, system grep for the rest\n$ mfs grep \"ERR_TOKEN_EXPIRED\" .\nsrc/auth/token.py\n167      raise TokenExpiredError(\"ERR_TOKEN_EXPIRED\")\n\n# 4. Orient yourself in an unfamiliar folder — cheap peek\n$ mfs tree --peek -L 2 ./docs/\n\n# 5. Check indexing status\n$ mfs status\n```\n\n---\n\n## 🤖 For agents driving a shell\n\nMFS gives an agent two complementary command families:\n\n- **🔎 Search** — flat retrieval over the whole corpus (dense + BM25 + RRF)\n- **📖 Browse** — walk along the natural hierarchy of files and folders\n  (headings, symbols, directory trees), paying only a few tokens to see what's\n  there\n\nSearch finds candidates in a sea of text. Browse lets the agent look around\n*between* those candidates without reading whole files. Two legs — neither\nworks as well on its own.\n\n### 🔎 Search — find candidates in a sea of text\n\nFlat, corpus-wide retrieval. `mfs search` is hybrid (dense + BM25 + RRF);\n`mfs grep` is exact-match (Milvus BM25 for indexed files, falls back to the\nsystem `grep` for everything else).\n\nBoth commands take a **positional `\u003cpath\u003e` scope** — POSIX-style, like\n`grep pattern path`. Pass `--all` to search every indexed folder; without a\n`\u003cpath\u003e` or `--all`, the command errors rather than silently defaulting to the\nwhole index.\n\n```bash\nmfs search \"how do we handle token expiration\" .   # hybrid, scope = cwd\nmfs search \"oauth flow\" ./src/ --mode semantic     # dense only\nmfs search \"ERR_TOKEN\" ./src/ --mode keyword       # BM25 only\nmfs search \"auth\" --all --top-k 20                 # across all indexed folders\nmfs search \"auth\" ./src/ --json                    # scoped + JSON for parsing\ngit log --oneline | mfs search \"fix auth\"          # temporary dense search over stdin\n\nmfs grep \"ERR_TOKEN_EXPIRED\" .                     # Milvus BM25 / system grep\nmfs grep -C 5 \"OAuth\" ./docs/                      # context lines\nmfs grep \"def.*token\" ./src/                       # regex\n```\n\n### 📖 Browse — see what's there without reading everything\n\nFiles and directories come with natural structure — Markdown has headings,\nsource code has classes and functions, a directory has children and summaries.\nMFS exposes that structure at three **density presets**, with the same mental\nmodel on every file type:\n\n| Preset    | Cost     | What an agent sees                             | Answers                           |\n| --------- | -------- | ---------------------------------------------- | --------------------------------- |\n| `--peek`  | cheapest | structure only — headings, symbols, file names | *what is this thing?*             |\n| `--skim`  | medium   | structure + a short paragraph per node         | *what is each section about?*     |\n| `--deep`  | highest  | full expansion down the outline                | *I'm about to edit this; show me* |\n\n```bash\n# directories\nmfs tree --peek -L 2 ./docs/       # skeleton, two levels deep\nmfs ls --skim ./docs/              # every file with a one-paragraph summary\nmfs tree --deep ./docs/            # full expansion\n\n# single files\nmfs cat --peek ./docs/auth.md      # heading-only skeleton\nmfs cat --skim ./docs/auth.md      # headings + first paragraph of each\nmfs cat --deep ./docs/auth.md      # detailed expansion\nmfs cat -n 40:60 ./docs/auth.md    # drill in to a specific line range\nmfs cat --skim ./data/events.jsonl # compact rows for JSONL / JSON / CSV\n```\n\nAll three presets are driven by the same three knobs — `-W \u003cchars\u003e` (per-node\nwidth), `-H \u003cn\u003e` (how many top-level nodes), `-D \u003cn\u003e` (depth). Custom budgets\nwork anywhere: `mfs cat -W 80 -H 5 -D 2 auth.md`.\n\nPDF and DOCX files are converted to Markdown before indexing or browsing.\nJSON, JSONL and CSV are intentionally not embedded by default; `mfs grep`\nstill searches them, and `mfs cat --peek/--skim/--deep` gives compact\nstructured views.\n\nThe point of browse: an agent should *not* have to choose between \"read the\nwhole file\" (expensive) and \"stare at a single search chunk\" (no surrounding\ncontext). Spending a few hundred tokens on `--peek` of a whole directory is\nenough to know what lives there — and cheap enough to catch things search\nmight have missed.\n\n### 🤝 Search × Browse — the two-leg workflow\n\nSearch is flat; browse is hierarchical. A typical agent pass alternates them:\n\n```bash\n# 1. orient — peek the whole repo into a few hundred tokens\nmfs tree --peek -L 2 .\n\n# 2. locate — flat hybrid search (scope = cwd, or --all for everything)\nmfs search \"how is session state stored\" . --top-k 5\n\n# 3. contextualize — skim the candidate file so the hit has surroundings\nmfs cat --skim ./src/session/store.py\n\n# 4. drill in — read the exact lines before editing\nmfs cat -n 80:140 ./src/session/store.py\n```\n\nBrowse doubles as a cheap safety net for search: a `--peek` over a\nneighbouring directory often surfaces a relevant file that didn't match the\nsearch query's wording.\n\n### 📦 Structured output — the `--json` envelope\n\nEvery command (`search`, `grep`, `ls`, `tree`, `cat`) accepts `--json` and emits\nthe same **Hit envelope** — `{source, lines, content, score, metadata}`.\n`metadata.kind` tells the caller which command produced the hit, so one parser\nhandles all five.\n\n```bash\n$ mfs search \"oauth flow\" . --json\n[\n  {\n    \"source\": \"/repo/src/auth/oauth.py\",\n    \"lines\": [42, 98],\n    \"content\": \"class OAuthClient:\\n    ...\",\n    \"score\": 0.87,\n    \"metadata\": {\n      \"kind\": \"search\",\n      \"content_type\": \"code\",\n      \"is_dir\": false,\n      \"chunk_index\": 3,\n      \"language\": \"python\",\n      \"symbol_name\": \"OAuthClient\"\n    }\n  }\n]\n```\n\n---\n\n## Optional: LLM summaries \u0026 VLM image descriptions\n\nBoth are **opt-in and off by default** — MFS's chunking and embedding pipeline\nnever calls an LLM unless you ask it to. Flip them on when vague queries miss\nthe right files, or when you want image assets to be searchable.\n\n### Summaries sharpen recall on vague queries\n\nText files (Markdown, code, PDFs/DOCX converted to Markdown) can carry an\nauto-generated LLM summary that's embedded **alongside** the body chunks in\nthe same collection. The summary participates in the same hybrid retrieval,\nso a vague query like *\"how does the new onboarding flow work\"* can hit the\nsummary even when no single body chunk matches the wording. When a summary\nwins, the result's header picks up a `[summary]` marker so the caller can\ntell it apart from body chunks.\n\n```bash\nmfs add ./docs/ --summarize                                     # auto-generate via the configured [llm]\n```\n\nSummaries are **stale-tracked**: when a summarized file is re-indexed after\nedits, its summary is marked stale but kept around until you regenerate it.\n`mfs status --needs-summary` lists what still needs a fresh pass.\n\n### Image descriptions make binary assets searchable\n\nThere is **no direct image embedding** (no CLIP-style multimodal encoder).\nInstead the path is **image → VLM text description → text embedder**, so the\nimage shows up as a normal search hit with `content_type: vlm_description` in\nthe JSON envelope. Works for PNG / JPG / WEBP / GIF / BMP.\n\n```bash\nmfs add ./assets/ --describe                                    # auto-generate via a VLM-capable provider\n```\n\n### Providers\n\n| Role                | Providers implemented                                        |\n| ------------------- | ------------------------------------------------------------ |\n| Text summaries      | openai, anthropic, google, ollama, mistral                   |\n| VLM image descriptions | openai (gpt-4o / gpt-4o-mini / gpt-4-turbo), anthropic, google |\n\nInstall with `uv sync --extra llm-\u003cname\u003e` and configure in `~/.mfs/config.toml`:\n\n```toml\n[llm]\nprovider = \"openai\"\nmodel    = \"gpt-4o-mini\"     # must be a vision model if you use --describe\n```\n\nPointing `--describe` at a text-only provider (ollama / mistral) exits with an\nerror rather than silently skipping the image.\n\n---\n\n## Architecture\n\n```\n┌──────────────────────────────────────────────────────────────┐\n│                       mfs \u003ccmd\u003e                              │\n│   add · search · grep · ls · tree · cat · status · remove   │\n└────────────────┬───────────────────────────┬─────────────────┘\n                 │                           │\n        ┌────────┴────────┐         ┌────────┴────────┐\n        │   Ingest        │         │   Retrieve      │\n        │                 │         │                 │\n        │  scan           │         │  hybrid search  │\n        │   ↓             │         │  (dense + BM25  │\n        │  chunk          │         │   + RRF)        │\n        │  (tree-sitter,  │         │   ↓             │\n        │   markdown,     │         │  density render │\n        │   pymupdf4llm)  │         │  (peek/skim/    │\n        │   ↓             │         │   deep)         │\n        │  embed          │         │                 │\n        └────────┬────────┘         └────────┬────────┘\n                 │                           │\n                 ▼                           ▲\n     ┌────────────────────────────────────────────────┐\n     │   Milvus   (Lite · Self-hosted · Zilliz Cloud) │\n     │   dense vectors · BM25 sparse · metadata       │\n     └────────────────────────────────────────────────┘\n                           ▲\n                           │ derived index\n                           │\n     ┌────────────────────────────────────────────────┐\n     │   Your files  (source of truth; read-only)     │\n     │   state → ~/.mfs/ only; project dir untouched  │\n     └────────────────────────────────────────────────┘\n```\n\nIncremental sync is automatic: each `mfs add .` hashes files and only re-embeds\nwhat changed. The Milvus collection is a rebuildable cache — delete `~/.mfs/`\nand a fresh `mfs add .` reconstructs everything from the files on disk.\n\n---\n\n## Configuration\n\nConfig lives at `~/.mfs/config.toml`. Nothing is required — defaults pick OpenAI\nembeddings and Milvus Lite. Minimal example:\n\n```toml\n[embedding]\nprovider = \"openai\"                          # openai | onnx | google | voyage | jina | mistral | ollama | local\nmodel    = \"text-embedding-3-small\"\n\n[llm]\nprovider = \"openai\"\nmodel    = \"gpt-4o-mini\"\n\n[milvus]\n# uri = \"~/.mfs/milvus.db\"                   # default: Milvus Lite, embedded\n# uri = \"http://localhost:19530\"             # self-hosted Milvus\n# uri = \"https://xxx.zillizcloud.com\"        # Zilliz Cloud\n# token = \"...\"\n```\n\nUse `mfs config show` to inspect effective config and `mfs config set \u003ckey\u003e\n\u003cvalue\u003e` to edit it from the CLI.\n\nConverted PDF/DOCX Markdown is cached under `~/.mfs/converted/`. The cache is\nbounded by `[cache].max_size_mb` and evicts the least recently used converted\nfiles when it grows past the configured cap.\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eMilvus backends\u003c/b\u003e\u003c/summary\u003e\n\n| Backend            | URI                                  | Notes                                |\n| ------------------ | ------------------------------------ | ------------------------------------ |\n| Milvus Lite        | `~/.mfs/milvus.db`                   | Default. Zero config. Single writer. |\n| Self-hosted Milvus | `http://localhost:19530`             | Concurrent writers, full BM25.       |\n| Zilliz Cloud       | `https://*.zillizcloud.com` + token  | Managed. Full BM25.                  |\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003eEmbedding providers\u003c/b\u003e\u003c/summary\u003e\n\n| Provider | Example model                    | Dim    |\n| -------- | -------------------------------- | ------ |\n| openai   | `text-embedding-3-small`         | 1536   |\n| onnx     | `gpahal/bge-m3-onnx-int8`        | 1024   |\n| google   | `gemini-embedding-001`           | 768    |\n| voyage   | `voyage-3-lite`                  | 512    |\n| jina     | `jina-embeddings-v3`             | 1024   |\n| ollama   | `bge-m3`, `nomic-embed-text`, …  | varies |\n\n\u003c/details\u003e\n\nMFS reads `.gitignore` automatically and picks up a `.mfsignore` at the project\nroot (same syntax) — use either to exclude paths from indexing.\n\n---\n\n## Development\n\n```bash\nuv sync                          # install dev dependencies\nuv run pytest tests/ -v          # run the test suite\nuv run ruff check src/ tests/    # lint\n```\n\nThe codebase lives under `src/mfs/`:\n\n- `cli.py` — Click entry point\n- `ingest/` — scanner, chunker (incl. tree-sitter AST), PDF/DOCX converter, worker\n- `embedder/` — embedding providers (OpenAI, ONNX, Gemini, Voyage, Jina, Ollama, …)\n- `llm/` — LLM / VLM providers for opt-in enrichment\n- `search/` — search, grep, summary, density presets\n- `output/` — display, pipe handshake, JSON schema\n- `store.py` — Milvus collection wrapper\n\n---\n\n## Roadmap\n\n- [ ] Rewrite low-level file processing modules in Rust for faster scanning,\n      parsing, and format handling.\n- [ ] Support more file formats and more multimodal content types.\n- [ ] Run broader evaluations across more agent workflows, corpora, and\n      real-world scenarios.\n\n---\n\n## Acknowledgements\n\nMFS is shaped by several related projects and communities:\n\n- [VKFS](https://github.com/ZeroZ-lab/vkfs), built by a partner team, explores a\n  similar Unix-like interface for agent access to vector-backed knowledge.\n- [claude-context](https://github.com/zilliztech/claude-context) and\n  [memsearch](https://github.com/zilliztech/memsearch), earlier Zilliz projects,\n  provided practical lessons from community feedback on code search, memory\n  search, synchronization, and agent-facing architecture.\n\n---\n\n## License\n\nApache License 2.0. See [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzilliztech%2Fmfs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzilliztech%2Fmfs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzilliztech%2Fmfs/lists"}