https://github.com/danieljustus/symaira-seek
Local-first, CGO-free document retrieval for AI agents with hybrid BM25+vector search
ai-agents document-retrieval fts5 go mcp semantic-search sqlite
- Host: GitHub
- URL: https://github.com/danieljustus/symaira-seek
- Owner: danieljustus
- License: mit
- Created: 2026-06-02T19:44:46.000Z (23 days ago)
- Default Branch: main
- Last Pushed: 2026-06-16T14:41:04.000Z (9 days ago)
- Last Synced: 2026-06-16T15:20:33.695Z (9 days ago)
- Topics: ai-agents, document-retrieval, fts5, go, mcp, semantic-search, sqlite
- Language: Go
- Homepage: https://symaira.com
- Size: 16.7 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS
- Security: .github/SECURITY.md
- Support: .github/SUPPORT.md
- Agents: AGENTS.md
# Symaira-Seek
> Local-first, CGO-free document retrieval for AI agents with hybrid BM25+vector search.
Symaira-Seek is a local-first, CGO-free document retrieval tool designed for AI agents and developers. It provides hybrid search (BM25 keyword search combined with vector semantic search) and fuses results using Reciprocal Rank Fusion (RRF).
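
As an illustration of the fusion step, here is a minimal Go sketch of Reciprocal Rank Fusion. It assumes each search backend returns an ordered list of chunk IDs and that a list contributes `1/(k + rank)` per item with 1-based ranks. The function name `rrfFuse` and the chunk IDs are hypothetical and are not taken from the Symaira-Seek source.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges several ranked ID lists into one ranking.
// Each list adds 1/(k + rank) to an ID's score, with ranks starting at 1.
// Hypothetical helper for illustration only.
func rrfFuse(k float64, rankings ...[]string) []string {
	scores := map[string]float64{}
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// Sort by descending score; break ties by ID so the output is stable.
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	return ids
}

func main() {
	bm25 := []string{"a.md#0", "b.md#2", "c.md#1"}   // keyword ranking (made up)
	vector := []string{"b.md#2", "d.md#0", "a.md#0"} // semantic ranking (made up)
	fmt.Println(rrfFuse(60, bm25, vector))
	// prints: [b.md#2 a.md#0 d.md#0 c.md#1]
}
```

Items that rank well in both lists rise to the top, and because only ranks are used, BM25 scores and cosine similarities never need to be put on a common scale.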
## Why Symaira-Seek?
- **100% CGO-free**: Pure Go SQLite driver (`modernc.org/sqlite`) — cross-compile anywhere without C dependencies
- **Hybrid search**: Combines BM25 keyword matching with vector semantic search for better relevance
- **Dual embedding modes**: Local Ollama integration for quality, deterministic fallback for offline usage
- **Multiple interfaces**: CLI, MCP server for AI agents, and HTTP REST daemon
- **Local-first**: Your data stays on your machine — no cloud dependencies required
Symaira-Seek exposes three interfaces:
1. **Command Line Interface (CLI)**: A Unix-friendly command utility.
2. **Model Context Protocol (MCP)**: Native stdio-based tool integration for AI agents (Claude, Cursor, ChatGPT, etc.).
3. **HTTP REST Daemon**: A lightweight localhost API with search and index endpoints.
## Tech Stack & Architecture
- **Language**: Pure Go (1.26+)
- **Database**: SQLite (via pure-Go `modernc.org/sqlite` to maintain **100% CGO-free compilation**)
- **Keyword Search**: SQLite FTS5 with BM25 ranking
- **Vector Search**: Cosine similarity calculations on normalized 768-dimensional float32 arrays
- **Result Fusion**: Reciprocal Rank Fusion (RRF) with parameter $k=60$
- **Embeddings**: Dual-mode generation:
  - **Local Ollama Integration**: Uses the `nomic-embed-text` model.
  - **Deterministic Local Fallback**: Fallback word-hash vector generator to allow 100% offline usage.
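
To make the vector side concrete, the sketch below shows one way a deterministic word-hash embedding and cosine similarity over normalized 768-dimensional `float32` vectors could look. It is an assumption-laden illustration: the hashing scheme (FNV-1a bucket counts) and the names `hashEmbed` and `cosine` are hypothetical, and the project's actual fallback generator may work differently.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"strings"
)

const dims = 768 // vector width stated in the tech stack

// hashEmbed counts words into buckets chosen by an FNV-1a hash and then
// L2-normalizes the result. Assumed scheme, for illustration only.
func hashEmbed(text string) []float32 {
	vec := make([]float32, dims)
	for _, word := range strings.Fields(strings.ToLower(text)) {
		h := fnv.New32a()
		h.Write([]byte(word))
		vec[h.Sum32()%dims]++
	}
	var sumSq float64
	for _, v := range vec {
		sumSq += float64(v) * float64(v)
	}
	if sumSq == 0 {
		return vec // empty input stays a zero vector
	}
	norm := float32(math.Sqrt(sumSq))
	for i := range vec {
		vec[i] /= norm
	}
	return vec
}

// cosine returns the dot product, which equals cosine similarity
// when both inputs are already unit length.
func cosine(a, b []float32) float32 {
	var dot float32
	for i := range a {
		dot += a[i] * b[i]
	}
	return dot
}

func main() {
	query := hashEmbed("renewable energy optimization")
	doc := hashEmbed("optimization of renewable energy grids")
	fmt.Printf("similarity: %.3f\n", cosine(query, doc))
}
```

Because the vectors are normalized once at indexing time, each comparison at query time reduces to a single dot product.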
---
## Installation & Setup
### Homebrew (macOS/Linux)
Install via Homebrew tap:
```bash
brew tap danieljustus/tap
brew install symseek
```
### Pre-built Binaries (Recommended)
Download the latest release for your platform from [GitHub Releases](https://github.com/danieljustus/symaira-seek/releases):
- **Linux**: `symaira-seek_Linux_x86_64.tar.gz` or `symaira-seek_Linux_arm64.tar.gz`
- **macOS**: `symaira-seek_Darwin_x86_64.tar.gz` or `symaira-seek_Darwin_arm64.tar.gz`
- **Windows**: `symaira-seek_Windows_x86_64.zip` or `symaira-seek_Windows_arm64.zip`
Extract and install:
```bash
# Linux/macOS
tar -xzf symaira-seek_*.tar.gz
chmod +x symseek
sudo mv symseek /usr/local/bin/
# Windows
# Extract the .zip and add to PATH
```
Verify the installation:
```bash
symseek version
```
### Build from Source
Ensure you have [Go](https://go.dev/) 1.26 or later installed.
```bash
go build -o symseek cmd/symseek/main.go
```
To inject a version string at build time, set `main.version` via `-ldflags`. The CI workflow derives the value from the current git tag (or a `0.0.0-dev+` fallback) and passes it automatically:
```bash
VERSION="0.2.0"
go build -ldflags "-s -w -X main.version=${VERSION}" -o symseek cmd/symseek/main.go
./symseek version
```
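
For context, the linker's `-X` flag can only overwrite a package-level string variable, so `main.version` has to be declared along these lines. This is a minimal sketch of the pattern, not the project's actual `main.go`; the default value `"dev"` and the printed format are placeholders.

```go
package main

import "fmt"

// version is overwritten at link time by:
//   go build -ldflags "-X main.version=<value>"
// The default "dev" is a placeholder chosen for this sketch.
var version = "dev"

func main() {
	fmt.Println("symseek", version)
}
```

Built without `-ldflags`, the binary reports the default; built with the flag, it reports the injected string.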
### Run Tests
```bash
go test -v ./...
```
---
## CLI Usage
### Index a Directory
Crawl and index all Markdown, text, code, JSON, and YAML files inside a folder:
```bash
./symseek index /path/to/my-documents
```
#### Watch Daemon
Pass `--watch` to keep the indexer running and resynchronize file creations, modifications, and deletions every 5 seconds:
```bash
./symseek index /path/to/my-documents --watch
```
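
One common way to implement this kind of periodic synchronization is to compare directory snapshots between polls. The sketch below assumes change detection by modification time; the README does not say how `--watch` detects changes, so the `snapshot` type and the `scan` and `diff` functions are hypothetical.

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"sort"
)

// snapshot maps a file path to its modification time in Unix nanoseconds.
type snapshot map[string]int64

// scan walks root and records the modification time of every regular entry.
func scan(root string) (snapshot, error) {
	snap := snapshot{}
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		snap[path] = info.ModTime().UnixNano()
		return nil
	})
	return snap, err
}

// diff reports which paths were created, modified, or deleted between polls.
func diff(old, cur snapshot) (created, modified, deleted []string) {
	for path, mtime := range cur {
		prev, seen := old[path]
		switch {
		case !seen:
			created = append(created, path)
		case mtime != prev:
			modified = append(modified, path)
		}
	}
	for path := range old {
		if _, ok := cur[path]; !ok {
			deleted = append(deleted, path)
		}
	}
	sort.Strings(created)
	sort.Strings(modified)
	sort.Strings(deleted)
	return created, modified, deleted
}

func main() {
	// In-memory snapshots stand in for two scan() calls made 5 seconds apart.
	before := snapshot{"notes.md": 100, "old.txt": 100}
	after := snapshot{"notes.md": 250, "new.md": 300}
	created, modified, deleted := diff(before, after)
	fmt.Println(created, modified, deleted)
	// prints: [new.md] [notes.md] [old.txt]
}
```

A real loop would call `scan` on a 5-second ticker, re-index the created and modified paths, and drop the deleted ones from the index.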
### Search Documents
Perform a hybrid semantic and keyword search:
```bash
./symseek search "renewable energy optimization" --limit 5
```
Export structured search results directly to JSON:
```bash
./symseek search "renewable energy optimization" --json
```
### Get Database Stats
```bash
./symseek status
```
Export the same stats as JSON for monitoring pipelines:
```bash
./symseek status --json
```
### Configuration
`symseek` stores its configuration as TOML in `~/.config/symseek/config.toml` (overridable with `--config`). A legacy `config.json` from older versions is migrated to TOML automatically on first run (and via `./symseek migrate`).
View the active configuration (the path is printed to stderr):
```bash
./symseek config
```
Set a value without editing the file by hand:
```bash
./symseek config --set-key ollama_url --set-value http://localhost:11434/api/embeddings
./symseek config --set-key model --set-value mxbai-embed-large
```
The file is rewritten with mode `0600` on every write. Supported keys:
| Key | Description | Default |
| --- | --- | --- |
| `ollama_url` | Ollama embeddings endpoint URL | `http://localhost:11434/api/embeddings` |
| `model` | Embedding model name | `nomic-embed-text` |
| `timeout_seconds` | Per-request Ollama timeout (seconds) | `120` |
| `retry_count` | Number of Ollama retries on failure | `2` |
| `retry_backoff_ms` | Initial retry backoff (milliseconds) | `500` |
| `index_cooldown_seconds` | Cooldown between `/index` requests on the HTTP daemon | `5` |
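
The sketch below shows how these settings could drive a request to Ollama. Ollama's `/api/embeddings` endpoint accepts a JSON body with `model` and `prompt` fields and returns an `embedding` array. Everything else is an assumption made for illustration: the `Config` struct, the helper names, and in particular the doubling backoff policy, since the README only documents an initial backoff value.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// Config mirrors the documented keys; the struct itself is hypothetical.
type Config struct {
	OllamaURL      string
	Model          string
	TimeoutSeconds int
	RetryCount     int
	RetryBackoffMs int
}

// defaultConfig returns the defaults listed in the table above.
func defaultConfig() Config {
	return Config{
		OllamaURL:      "http://localhost:11434/api/embeddings",
		Model:          "nomic-embed-text",
		TimeoutSeconds: 120,
		RetryCount:     2,
		RetryBackoffMs: 500,
	}
}

// backoffMs lists the wait before each retry, doubling every time.
// The doubling is an assumed policy, not confirmed by the README.
func backoffMs(initialMs, retries int) []int {
	waits := make([]int, 0, retries)
	for i, w := 0, initialMs; i < retries; i, w = i+1, w*2 {
		waits = append(waits, w)
	}
	return waits
}

// postEmbedding sends one request and decodes the "embedding" field.
func postEmbedding(client *http.Client, url string, body []byte) ([]float32, error) {
	resp, err := client.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	var out struct {
		Embedding []float32 `json:"embedding"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Embedding, nil
}

// embed tries once, then retries up to cfg.RetryCount times with backoff.
func embed(cfg Config, text string) ([]float32, error) {
	body, err := json.Marshal(map[string]string{"model": cfg.Model, "prompt": text})
	if err != nil {
		return nil, err
	}
	client := &http.Client{Timeout: time.Duration(cfg.TimeoutSeconds) * time.Second}
	waits := backoffMs(cfg.RetryBackoffMs, cfg.RetryCount)
	var lastErr error
	for attempt := 0; attempt <= len(waits); attempt++ {
		if attempt > 0 {
			time.Sleep(time.Duration(waits[attempt-1]) * time.Millisecond)
		}
		vec, err := postEmbedding(client, cfg.OllamaURL, body)
		if err == nil {
			return vec, nil
		}
		lastErr = err
	}
	return nil, fmt.Errorf("embedding failed after %d attempts: %w", len(waits)+1, lastErr)
}

func main() {
	cfg := defaultConfig()
	// Print the retry schedule only; calling embed(cfg, "some text")
	// requires a running Ollama instance.
	fmt.Println("retry waits (ms):", backoffMs(cfg.RetryBackoffMs, cfg.RetryCount))
}
```

With the defaults, a failing request would be attempted three times in total, waiting 500 ms and then 1000 ms under the assumed doubling policy.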
---
## MCP Server Integration
To use Symaira-Seek as an MCP tool for AI clients (like Claude Desktop or Cursor), register it in your client's configuration file:
```json
{
"mcpServers": {
"symaira-seek": {
"command": "/absolute/path/to/symaira-seek/symseek",
"args": ["serve"]
}
}
}
```
### Exposed Tools
1. `search_documents(query, limit)`: Hybrid search over all indexed files.
2. `read_document(path)`: Retrieves the complete content of an indexed file.
3. `list_documents(folder)`: Lists indexed documents, optionally filtered by a folder prefix.
4. `get_context(topic, max_chars)`: Aggregates relevant context blocks from multiple documents.
5. `index_document(path)`: Manually indexes a local file or directory.
6. `index_url(url)`: Indexes content from a URL.
### Tool Examples
#### search_documents
```json
{
"query": "renewable energy optimization",
"limit": 5
}
```
Returns formatted search results with file paths, chunk indices, and RRF scores.
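
MCP clients normally build the protocol envelope for you, but it can help to see what goes over stdio. In MCP, a tool invocation is a JSON-RPC 2.0 request with method `tools/call`, carrying the tool name and the arguments object shown above. The Go sketch below builds such a message; the type and function names are hypothetical, and a real client must complete the MCP `initialize` handshake before sending it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolCall is the JSON-RPC 2.0 envelope for an MCP tools/call request.
type toolCall struct {
	JSONRPC string         `json:"jsonrpc"`
	ID      int            `json:"id"`
	Method  string         `json:"method"`
	Params  toolCallParams `json:"params"`
}

// toolCallParams names the tool and carries its arguments.
type toolCallParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// newToolCall serializes a tools/call request for the given tool.
// Hypothetical helper for illustration only.
func newToolCall(id int, name string, args map[string]any) ([]byte, error) {
	return json.Marshal(toolCall{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  toolCallParams{Name: name, Arguments: args},
	})
}

func main() {
	msg, err := newToolCall(1, "search_documents", map[string]any{
		"query": "renewable energy optimization",
		"limit": 5,
	})
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	// Over the stdio transport, each message is written as one line.
	fmt.Println(string(msg))
}
```

The same envelope works for the other tools by swapping the tool name and the arguments map.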
#### read_document
```json
{
"path": "/home/user/documents/report.md"
}
```
Returns the full text content of an indexed file.
#### list_documents
```json
{
"folder": "/home/user/documents"
}
```
Lists all indexed documents, optionally filtered by folder prefix.
#### get_context
```json
{
"topic": "machine learning",
"max_chars": 4000
}
```
Aggregates relevant context blocks from multiple documents for a given topic.
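
One plausible reading of `max_chars` is a budget that ranked chunks are packed into until it is exhausted. The sketch below illustrates that idea only: `buildContext` is a hypothetical function, the blank-line separator is an assumption, and length is measured in bytes for simplicity. The real tool may select and join chunks differently.

```go
package main

import (
	"fmt"
	"strings"
)

// buildContext joins chunks in ranked order, separated by blank lines,
// and stops before the first chunk that would exceed maxChars.
// Length is measured in bytes here, which is an approximation.
func buildContext(chunks []string, maxChars int) string {
	var b strings.Builder
	for _, chunk := range chunks {
		sep := 0
		if b.Len() > 0 {
			sep = 2 // length of the "\n\n" separator
		}
		if b.Len()+sep+len(chunk) > maxChars {
			break
		}
		if sep > 0 {
			b.WriteString("\n\n")
		}
		b.WriteString(chunk)
	}
	return b.String()
}

func main() {
	ranked := []string{
		"Gradient descent updates weights iteratively.",
		"Regularization reduces overfitting.",
		"A much longer passage that would not fit in a small budget.",
	}
	fmt.Println(buildContext(ranked, 90))
}
```

Stopping at the first chunk that does not fit keeps the output in strict relevance order; skipping it and trying smaller chunks would use more of the budget at the cost of that ordering.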
#### index_document
```json
{
"path": "/home/user/documents"
}
```
Indexes a local file or directory immediately.
#### index_url
```json
{
"url": "https://example.com/article"
}
```
Fetches and indexes content from a URL.
---
## HTTP REST Daemon
Start the REST API on port `8788`:
```bash
./symseek serve --port 8788
```
### Endpoints
- **GET** `/health`: Check status (`{"status": "ok"}`).
- **GET** `/status`: Returns document counts, chunk counts, and database file size.
- **GET** `/search?q=query&limit=5`: Query the hybrid search engine.
- **POST** `/index` with body `{"path": "/absolute/path"}`: Synchronously crawl and index a folder. Requests are subject to the `index_cooldown_seconds` cooldown (default: 5 seconds).
### Endpoint Examples
#### Health Check
```bash
curl http://localhost:8788/health
# Response: {"status": "ok"}
```
#### Get Status
```bash
curl http://localhost:8788/status
# Response: {"documents": 10, "chunks": 50, "db_size": "1.2 MB"}
```
#### Search Documents
```bash
curl "http://localhost:8788/search?q=machine+learning&limit=5"
# Response: Array of search results with file paths and scores
```
#### Index Documents
```bash
curl -X POST http://localhost:8788/index \
-H "Content-Type: application/json" \
-d '{"path": "/home/user/documents"}'
# Response: {"status": "indexed", "path": "/home/user/documents"}
```
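
The same endpoints can be called from Go. The sketch below queries `/search` on a daemon assumed to be listening on `localhost:8788`. Because the README does not document the result schema, each result is kept as raw JSON rather than decoded into a struct; the helper names `searchURL` and `search` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strconv"
	"time"
)

// searchURL builds the /search URL with properly escaped query parameters.
func searchURL(base, query string, limit int) string {
	params := url.Values{}
	params.Set("q", query)
	params.Set("limit", strconv.Itoa(limit))
	return base + "/search?" + params.Encode()
}

// search calls the daemon and returns each result as undecoded JSON,
// since the response fields are not specified in the README.
func search(base, query string, limit int) ([]json.RawMessage, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(searchURL(base, query, limit))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("search: unexpected status %s", resp.Status)
	}
	var results []json.RawMessage
	if err := json.NewDecoder(resp.Body).Decode(&results); err != nil {
		return nil, err
	}
	return results, nil
}

func main() {
	results, err := search("http://localhost:8788", "machine learning", 5)
	if err != nil {
		fmt.Println("search failed (is the daemon running?):", err)
		return
	}
	for _, r := range results {
		fmt.Println(string(r))
	}
}
```

Building the URL with `url.Values` avoids hand-escaping spaces and special characters in the query string.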
---
## License
MIT License. Part of the Symaira tool suite.