{"id":50485801,"url":"https://github.com/timorunge/lore","last_synced_at":"2026-06-01T22:02:35.210Z","repository":{"id":357447934,"uuid":"1229754911","full_name":"timorunge/lore","owner":"timorunge","description":"Local knowledge base for 90+ document formats. CLI search + MCP server. No cloud, no API keys.","archived":false,"fork":false,"pushed_at":"2026-05-12T19:54:37.000Z","size":552,"stargazers_count":13,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-05-12T21:26:20.024Z","etag":null,"topics":["cli","full-text-search","knowledge-base","local-first","mcp","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/timorunge.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"docs/contributing.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":".github/SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-05-05T10:58:29.000Z","updated_at":"2026-05-12T19:54:57.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/timorunge/lore","commit_stats":null,"previous_names":["timorunge/lore"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/timorunge/lore","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timorunge%2Flore","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timorunge%2Flore/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timorunge%2Flore/releases","m
anifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timorunge%2Flore/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/timorunge","download_url":"https://codeload.github.com/timorunge/lore/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timorunge%2Flore/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33795114,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-01T02:00:06.963Z","response_time":115,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","full-text-search","knowledge-base","local-first","mcp","rust"],"created_at":"2026-06-01T22:02:33.567Z","updated_at":"2026-06-01T22:02:35.202Z","avatar_url":"https://github.com/timorunge.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# lore\n\nA local knowledge base you can search from the terminal or serve to\nagents over MCP. lore ingests your documents -- local files, websites,\ngit repos, feeds, S3, YouTube, email, shell commands, upstream MCP\nservers, 90+ formats -- and indexes everything with full-text search.\nNo external services, no infrastructure to manage. 
One binary, one\nconfig, one store -- or as many as you need.\n\nThink `man` pages for your projects -- for humans and agents alike.\n\n## Install\n\nInstall script (Linux and macOS):\n\n```bash\ncurl -fsSL https://github.com/timorunge/lore/releases/latest/download/lore-cli-installer.sh | bash\n```\n\nHomebrew (macOS and Linux):\n\n```bash\nbrew install timorunge/tap/lore-cli\n```\n\nOr download a pre-built binary directly from the\n[releases page](https://github.com/timorunge/lore/releases).\n\n\u003e **Windows:** Pre-built binaries are available for Linux and macOS.\n\u003e On Windows, use [WSL](https://learn.microsoft.com/en-us/windows/wsl/)\n\u003e (recommended) or build from source with `cargo install` (OCR requires\n\u003e cmake and is not supported on MSVC -- omit with `--no-default-features\n\u003e --features ingest,mcp`).\n\nBuild from source:\n\n```bash\ngit clone https://github.com/timorunge/lore.git\ncd lore\ncargo install --path .\n```\n\nOptional feature flags:\n\n\u003c!-- BEGIN GENERATED: compile-features --\u003e\n| Flag | Default | Description |\n|------|---------|-------------|\n| `ocr` | on | OCR for scanned PDFs and images (requires cmake) |\n| `llm` | off | LLM enrichment (`lore ingest`, `lore enrich`): Ollama, Anthropic, OpenAI, and Bedrock |\n| `s3` | off | Amazon S3 source support |\n| `mcp` | on | Upstream MCP server ingestion (resources and tool calls) |\n| `iwork` | off | Apple iWork documents (Keynote, Pages, Numbers) |\n| `tree-sitter` | off | Source code parsing via tree-sitter |\n\u003c!-- END GENERATED: compile-features --\u003e\n\n```bash\ncargo install --path . --features llm         # with LLM enrichment\ncargo install --path . --features llm,s3      # with LLM + S3 support\ncargo install --path . --all-features         # all optional features\ncargo install --path . --no-default-features  # minimal build (no OCR, no MCP)\n\n# Windows (native, no OCR):\ncargo install --path . 
--no-default-features --features ingest,mcp\n```\n\n## Quick start\n\n```bash\ncd my-project\nlore init                     # generates .lore/lore.yaml from your project\nlore ingest                   # build the store\nlore search \"authentication\"  # search from the terminal\nlore serve                    # or serve to agents over MCP\n```\n\n`lore init` scans your directory, detects documentation folders and\nREADME files, and generates a config with pre-filled sources. Review it,\ningest, and search -- from your terminal or through any MCP client.\n\n### Try it without your own docs\n\nIf you cloned the repo, the gutenberg example works out of the box:\n\n```bash\ncd examples/gutenberg\nlore ingest\nlore search \"love and death\"\n```\n\nWithout a checkout, fetch the config directly:\n\n```bash\nmkdir tryout \u0026\u0026 cd tryout \u0026\u0026 mkdir -p .lore\ncurl -fsSL https://raw.githubusercontent.com/timorunge/lore/main/examples/gutenberg/lore.yaml \\\n  -o .lore/lore.yaml\nlore ingest\nlore search \"love and death\"\n```\n\nThis fetches classic novels from Project Gutenberg and indexes them --\nno local documents required. See [examples/](examples/) for more\nready-to-run configs.\n\n## Use cases\n\n**Give your AI assistant project context** -- index your READMEs, design\ndocs, ADRs, and runbooks. Connect via MCP. Your AI now knows your\nproject without stuffing everything into the prompt.\n\n```yaml\nsources:\n  - path: docs/\n    topic: Internal\n```\n\n**Offline documentation search** -- index vendor docs, API references,\nor internal wikis. Search from the terminal with no browser and no\ninternet.\n\n```yaml\nsources:\n  - sitemap: https://docs.example.com/sitemap.xml\n    topic: API Reference\n```\n\n**Team knowledge base** -- point lore at a shared git repo of documents.\nEveryone runs `lore ingest` locally. 
No server, no SaaS, no vendor\nlock-in.\n\n```yaml\nsources:\n  - git: https://github.com/org/team-docs\n    glob: \"**/*.md\"\n    topic: Team\n```\n\n**Federated search** -- keep knowledge bases separate but query them\ntogether. Each config has its own store and update schedule; lore merges\nresults at query time.\n\n```bash\nlore search \"auth\" -c project.yaml -c vendor.yaml\n```\n\n## MCP integration\n\nServe your knowledge base to AI assistants over the\n[Model Context Protocol](https://modelcontextprotocol.io/).\n\nStdio (Claude Code, Kiro, Cursor -- any MCP client that launches\nsubprocesses):\n\n```json\n{\n  \"mcpServers\": {\n    \"my-docs\": {\n      \"command\": \"lore\",\n      \"args\": [\"serve\"],\n      \"env\": {\n        \"LORE_CONFIG\": \"/path/to/.lore/lore.yaml\"\n      }\n    }\n  }\n}\n```\n\nStreamable HTTP (web apps, remote agents, multiple clients):\n\n```bash\nlore serve --transport http --port 8080\n```\n\n```json\n{\n  \"mcpServers\": {\n    \"my-docs\": {\n      \"url\": \"http://localhost:8080/mcp\"\n    }\n  }\n}\n```\n\nSix read-only tools: `lore_info`, `lore_list_topics`, `lore_search`,\n`lore_read_topic`, `lore_list_docs`, and `lore_read_doc`.\n\nMCP resources for clients that support browsing: `lore://info`,\nplus one `lore://topics/{name}` and one `lore://docs/{source}` resource\nper topic and document. All discoverable via `list_resources`.\n\nServe multiple knowledge bases through a single MCP server -- pass\nmultiple `-c` flags and lore federates them at query time:\n\n```bash\nlore serve -c project.yaml -c vendor.yaml\n```\n\n**`/lore` skill** -- lore ships a `/lore` skill in\n[skills/lore/](skills/lore/) for AI coding assistants that support\nskills. 
Type `/lore` for interactive help with setup, config authoring,\nand troubleshooting.\n\nSee [MCP Integration](docs/mcp-integration.md) for transport options,\nmulti-KB setups, and the full tool reference.\n\n## Configuration\n\nEach knowledge base is defined by a YAML config file. Mix source types\nand assign topics for independent search:\n\n```yaml\nname: my-knowledge-base\n\nbase_dir: ..\n\nsources:\n  # All source keys accept a single string or a list of strings.\n  - path: ./docs\n    glob: \"**/*.md\"\n    topic: Internal\n\n  - git: https://github.com/org/public-docs\n    glob: \"**/*.md\"\n    topic: Public Docs\n\n  - sitemap: https://docs.example.com/sitemap.xml\n    include: \"/reference/\"\n    topic: API Reference\n\n  - youtube: https://www.youtube.com/watch?v=JZfJTSlhOXM\n    lang: en\n    topic: Talks\n```\n\nThe YouTube source spawns [yt-dlp](https://github.com/yt-dlp/yt-dlp)\nas a subprocess to fetch transcripts -- `yt-dlp` must be on `PATH`.\nPlaylists and channels are also supported.\n\nURL-based sources support custom HTTP headers for authenticated\nendpoints. Header values support `${LORE_*}` environment variable\nexpansion (only variables with the `LORE_` prefix are expanded):\n\n```yaml\n  - sitemap: https://internal.corp/sitemap.xml\n    headers:\n      Authorization: \"Bearer ${LORE_DOCS_TOKEN}\"\n```\n\nProcessing profiles let you tune chunking and metadata extraction\nper source. 
Define named presets and reference them inline, or override\ninline directly:\n\n```yaml\nbase_dir: ..\n\nsources:\n  - path: docs/                    # uses global defaults\n  - path: src/\n    processing: code               # uses code preset\n  - path: special/\n    processing:                    # inline override\n      max_chunk_chars: 3000\n\nprocessing:\n  max_chunk_chars: 1600\n  presets:\n    code:\n      extract: none\n      max_chunk_chars: 800\n```\n\nSee the [examples/](examples/) directory for complete, runnable configs\nand the [Configuration Reference](docs/configuration.md) for all\noptions.\n\n## How it works\n\n```\nsources                    lore                         consumers\n  |                        |                              |\n  |   local, URL, git,     |                              |\n  |   sitemap, feed, S3,   |                              |\n  |   YouTube, maildir,    |                              |\n  |   exec, MCP            |                              |\n  |-----------------------\u003e|                              |\n  |                        |  extract text (kreuzberg)    |\n  |                        |  extract metadata            |\n  |                        |  apply pipeline transforms   |\n  |                        |  chunk by structure          |\n  |                        |  index (tantivy)             |\n  |                        |                              |\n  |                        |  lore search (terminal)      |\n  |                        |\u003c----- you ----- or ---------\u003e|\n  |                        |  lore serve (stdio / http)   |\n  |                        |\u003c-----------------------------|\n  |                        |  search / browse / retrieve  |\n  |                        |-----------------------------\u003e|\n```\n\nDocuments flow left to right during `lore ingest`. 
At query time, you\nsearch from the terminal with `lore search`, or agents query through\nsix read-only MCP tools via `lore serve`. The store lives in\n`.lore/store` by default (configurable via `store.path` or\n`LORE_STORE_PATH`) -- no external services.\n\n### What lore does\n\n- **10 source types** -- local files, URLs, git repos, sitemaps, RSS/Atom\n  feeds, S3, YouTube (via [yt-dlp](https://github.com/yt-dlp/yt-dlp)),\n  maildir, shell commands, and upstream MCP servers\n- **90+ document formats** -- PDF, DOCX, XLSX, HTML, email, EPUB,\n  archives, Markdown, Org-mode, LaTeX, source code, and\n  [more](docs/formats.md) via\n  [kreuzberg](https://github.com/kreuzberg-dev/kreuzberg)\n- **Metadata extraction** -- title, author, language, created date,\n  topic, and tags from binary metadata, frontmatter, Org-mode headers,\n  and content heuristics; choose\n  `auto`, `builtin`, `kreuzberg`, or `none` per source\n- **Smart chunking** -- respects document structure (headings, sections);\n  configurable size limits\n- **Processing profiles** -- per-source presets and inline overrides for\n  chunk size, metadata extraction, and custom transform pipelines\n- **Incremental updates** -- only new or modified documents are\n  re-processed; periodic commits let you resume cancelled ingests\n- **Embedded index** -- full-text search via\n  [Tantivy](https://github.com/quickwit-oss/tantivy), stored locally,\n  no external database\n\n## Search from the terminal\n\nQuery your knowledge base without leaving the terminal:\n\n```bash\nlore search \"authentication\"         # free-text search\nlore search \"auth\" --topic Security  # filter by topic\nlore topics                          # list all topics\nlore docs                            # list all documents\nlore read docs/auth.md               # read a document\nlore read docs/auth.md --full        # read as continuous text\nlore status                          # show what changed since last ingest\nlore completions zsh        
         # generate shell completions\n```\n\nQuery and listing commands support `--json` for piping to `jq` and other tools.\nPass multiple `-c` flags for federated search across knowledge bases. See\n[CLI Reference](docs/cli.md) for the full flag reference.\n\n## Documentation\n\n| | |\n|---|---|\n| [Hands-on Guide](docs/hands-on-guide.md) | Step-by-step from install to serving |\n| [CLI Reference](docs/cli.md) | All commands and flags, including search, topics, docs, read |\n| [Configuration](docs/configuration.md) | Config file format, source types, env vars |\n| [Supported Formats](docs/formats.md) | Document formats and metadata extraction |\n| [MCP Integration](docs/mcp-integration.md) | Server setup, tool reference, multi-KB configs |\n| [Architecture](docs/architecture.md) | Ingest pipeline, store design, key decisions |\n| [Design Philosophy](docs/design-philosophy.md) | Why BM25, why single binary, two interfaces (CLI and MCP) |\n| [Performance](docs/performance.md) | Benchmarks, tuning tips, index size guidance |\n| [Security](docs/security.md) | Threat model, SSRF protection, input limits |\n| [Contributing](docs/contributing.md) | Dev setup, quality gates, extending lore |\n| [Examples](examples/) | Runnable configs for common use cases |\n\n## Data responsibility\n\nYou are responsible for ensuring you have the right to ingest, index,\nand serve the content you configure. When LLM enrichment is enabled,\ndocument content is sent to the configured provider -- for sensitive\ndata, consider a local provider like Ollama. lore does not phone home\nor collect telemetry. See [Security](docs/security.md#data-responsibility)\nfor details.\n\n## License\n\nMIT -- see [LICENSE](LICENSE).\n\n\u003cdetails\u003e\n\u003csummary\u003eThe Lore of Lore\u003c/summary\u003e\n\nEvery tool needs a mass-appealing origin story these days, so here is ours.\n\nlore is an old lady with tentacles. 
She hoards knowledge like a kraken hoards\nshipwrecks -- reaching into your PDFs, your contracts, your dusty email\narchives, pulling out what matters, and indexing it before you even knew you\nneeded it. She reads everything you give her -- twelve nines of\nreliability, one nine more than Amazon promises for not losing your\nfiles. She does not know what downtime means. She claims she was here\nbefore the word was invented -- her git log disagrees. She is part librarian,\npart eldritch horror, part *folklore* -- an oral tradition for the digital\nage, except she actually remembers things correctly -- until you\n`lore ingest --recreate` and wipe her memory clean. She does not hold\ngrudges. She just re-reads everything -- and fast. You will see progress\nbars flicker and vanish before you can read their labels, documents\nstreaming past like debris in a whirlpool. By the time you glance back\nat the terminal, she is already done and waiting for your next question.\nOld lady with tentacles, remember? Parallel I/O.\n\nShe is not a fancy AI. She does not hallucinate your documents. She was,\nhowever, largely written by one -- but that is a family secret. She uses\nBM25 like it is 1994 because it works and because she does not need a GPU\nto tell you where you wrote down that one authentication thing three months\nago -- and to quietly surface a memo you never opened that would have saved\nyou the trouble. She is a single binary -- all tentacles in one body,\nhunting alone in the deep. No cloud above her, the sun does not reach\nthat far down. No dependencies, no entourage, no committee. Just her and\nstrong opinions about how to slice a shipwreck into readable pieces.\n\nWill she change your life? No. If your problem is 200 Markdown files,\nuse ripgrep -- you are fine. 
But when it is PDFs, Word documents,\nscanned invoices, HTML exports, and that one spreadsheet Karen from\naccounting sent as a .zip and you are too tired to care which tool opens\nwhat -- that is where lore lives. She will find that stupid Q3 number at\n11 PM so you do not have to open Excel. As for Karen -- lore says she\nwas \"processed.\" We do not ask follow-up questions.\n\nSome say she keeps more than one lair. One for the things you wrote,\none for the things sent to you, one for the things you inherited and\nnever dared open -- each hoard sealed off, each minding its own\nbusiness. They do not gossip. But she remembers every single one, and\nif you ask the right question she will search them all at once -- one\nquestion, every lair, answers rising from the deep already sorted by\nwhich ones matter most. She calls it federation. We call it unsettling.\n\nGive her your documents. She loves reading them. She loves reading them\n*to* you even more -- come in, sit down, have a cookie -- yes, granny's\ncookies. She baked them. Do not ask what is in them.\n\nYou already found her.\n\n\u003c/details\u003e\n\nShe is still reading.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimorunge%2Flore","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftimorunge%2Flore","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimorunge%2Flore/lists"}