https://github.com/cuzfrog/docent-mcp
Semantic + BM25 Document + Git history search MCP server
ai bm25 decision-records embedding-models git-history hybrid-search indexing local-ai mcp mcp-server search-engine
- Host: GitHub
- URL: https://github.com/cuzfrog/docent-mcp
- Owner: cuzfrog
- License: mit
- Created: 2026-05-05T02:15:17.000Z (30 days ago)
- Default Branch: main
- Last Pushed: 2026-05-19T04:17:00.000Z (16 days ago)
- Last Synced: 2026-05-19T06:56:33.904Z (15 days ago)
- Topics: ai, bm25, decision-records, embedding-models, git-history, hybrid-search, indexing, local-ai, mcp, mcp-server, search-engine
- Language: Rust
- Homepage:
- Size: 691 KB
- Stars: 1
- Watchers: 0
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# docent
**Semantic + BM25 Document & Git history search for Design Decision Records** — an experimental MCP server written in Rust that indexes documents and git history, letting agents query *why* code looks the way it does.
```
files/git ──▼── index ──▶ MCP server ◀──── query
               (cache)      (HTTP)
```
## Quick Start
```sh
docent init # generate docent.toml config
docent index [path] # index .md files + git history
docent serve # start MCP server on port 7878
```
Open [http://localhost:7878](http://localhost:7878) for the built-in Web UI.
## Usage
| Command | Description |
|---|---|
| `docent init` | Generate a `docent.toml` config file |
| `docent index [dir]` | Index both file and git sources (default: current dir) |
| `docent index-file ` | Index specific files/directories |
| `docent index-git ` | Index git history from a repository |
| `docent serve` | Start the MCP server (streamable HTTP) |
| `docent list-models` | List supported embedding models |
Flags: `--config ` (default `./docent.toml`), `--rebuild` (full re-index), `--verbose`.
## How It Works
1. **Sources** — Reads markdown files and git commit history
2. **Section-aware chunking** — Splits documents into chunks, preserving heading structure
3. **Embedding** — Converts chunks to vectors via `fastembed` (configurable model)
4. **Index cache** — Persists vectors and metadata to disk
5. **Semantic + BM25 search** — Hybrid scoring with configurable algorithm
6. **MCP server** — Exposes `search_ddr` tool over streamable HTTP
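
Steps 2 and 5 above can be illustrated with a small, dependency-free sketch. It is not docent's actual implementation: the `Chunk` type and the `chunk_by_heading` and `rrf_fuse` functions are hypothetical names, the chunker only understands ATX headings (`#` to `######`), and Reciprocal Rank Fusion is used here as one common way to combine a semantic ranking with a BM25 ranking. The README only says the hybrid algorithm is configurable, so the real scoring may differ.

```rust
use std::collections::HashMap;

/// A piece of markdown plus the heading path it sits under (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub headings: Vec<String>,
    pub text: String,
}

/// Split markdown into one chunk per section, remembering the enclosing headings.
/// Simplification: ATX headings only; fenced code blocks are not special-cased.
pub fn chunk_by_heading(markdown: &str) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    let mut path: Vec<String> = Vec::new();
    let mut body = String::new();

    for line in markdown.lines() {
        // '#' is a single byte, so the char count doubles as a byte offset.
        let level = line.chars().take_while(|c| *c == '#').count();
        let is_heading = (1..=6).contains(&level) && line[level..].starts_with(' ');
        if is_heading {
            // Close the previous section before entering the new one.
            flush(&mut chunks, &path, &mut body);
            path.truncate(level - 1);
            path.push(line[level..].trim().to_string());
        } else {
            body.push_str(line);
            body.push('\n');
        }
    }
    flush(&mut chunks, &path, &mut body);
    chunks
}

/// Emit the accumulated body as a chunk (if non-empty) and reset the buffer.
fn flush(chunks: &mut Vec<Chunk>, path: &[String], body: &mut String) {
    let text = body.trim();
    if !text.is_empty() {
        chunks.push(Chunk { headings: path.to_vec(), text: text.to_string() });
    }
    body.clear();
}

/// Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank),
/// with rank starting at 1. Each ranking lists document ids, best first.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties are broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

In this sketch, a semantic ranking and a BM25 ranking would each be passed to `rrf_fuse` as a list of chunk ids, with `k = 60.0` being a conventional default. Rank-based fusion avoids having to normalise cosine similarities and BM25 scores onto a common scale, which is why it is a frequent choice for hybrid search.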
## Install
TBC
## Documentation
- [Development Guide](doc/Development.md)