{"id":51025266,"url":"https://github.com/danieljustus/symaira-seek","last_synced_at":"2026-06-21T19:01:13.797Z","repository":{"id":365249441,"uuid":"1257556814","full_name":"danieljustus/symaira-seek","owner":"danieljustus","description":"Local-first, CGO-free document retrieval for AI agents with hybrid BM25+vector search","archived":false,"fork":false,"pushed_at":"2026-06-16T14:41:04.000Z","size":17521,"stargazers_count":0,"open_issues_count":3,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-16T15:20:33.695Z","etag":null,"topics":["ai-agents","document-retrieval","fts5","go","mcp","semantic-search","sqlite"],"latest_commit_sha":null,"homepage":"https://symaira.com","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/danieljustus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":".github/SECURITY.md","support":".github/SUPPORT.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-06-02T19:44:46.000Z","updated_at":"2026-06-16T14:49:48.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/danieljustus/symaira-seek","commit_stats":null,"previous_names":["danieljustus/symaira-seek"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/danieljustus/symaira-seek","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieljustus%2Fsymaira-seek","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dani
eljustus%2Fsymaira-seek/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieljustus%2Fsymaira-seek/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieljustus%2Fsymaira-seek/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/danieljustus","download_url":"https://codeload.github.com/danieljustus/symaira-seek/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieljustus%2Fsymaira-seek/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34622271,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-21T02:00:05.568Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","document-retrieval","fts5","go","mcp","semantic-search","sqlite"],"created_at":"2026-06-21T19:01:12.917Z","updated_at":"2026-06-21T19:01:13.792Z","avatar_url":"https://github.com/danieljustus.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Symaira-Seek\n\n\u003e Local-first, CGO-free document retrieval for AI agents with hybrid BM25+vector search.\n\n[![CI](https://github.com/danieljustus/symaira-seek/actions/workflows/ci.yml/badge.svg)](https://github.com/danieljustus/symaira-seek/actions/workflows/ci.yml)\n[![Go 
Version](https://img.shields.io/badge/go-1.26+-00ADD8?logo=go\u0026logoColor=white)](https://go.dev/)\n[![License: MIT](https://img.shields.io/badge/license-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\nSymaira-Seek is a local-first, CGO-free document retrieval tool designed for AI agents and developers. It provides hybrid search (BM25 keyword search combined with vector semantic search) and fuses results using Reciprocal Rank Fusion (RRF).\n\n## Why Symaira-Seek?\n\n- **100% CGO-free**: Pure Go SQLite driver (`modernc.org/sqlite`) — cross-compile anywhere without C dependencies\n- **Hybrid search**: Combines BM25 keyword matching with vector semantic search for better relevance\n- **Dual embedding modes**: Local Ollama integration for quality, deterministic fallback for offline usage\n- **Multiple interfaces**: CLI, MCP server for AI agents, and HTTP REST daemon\n- **Local-first**: Your data stays on your machine — no cloud dependencies required\n\nIt exposes multiple interfaces:\n1. **Command Line Interface (CLI)**: A Unix-friendly command utility.\n2. **Model Context Protocol (MCP)**: Native stdio-based tool integration for AI agents (Claude, Cursor, ChatGPT, etc.).\n3. 
**HTTP REST Daemon**: A lightweight localhost API with search and index endpoints.\n\n## Tech Stack \u0026 Architecture\n\n- **Language**: Pure Go (1.26+)\n- **Database**: SQLite (via pure-Go `modernc.org/sqlite` to maintain **100% CGO-free compilation**)\n- **Keyword Search**: SQLite FTS5 with BM25 ranking\n- **Vector Search**: Cosine similarity calculations on normalized 768-dimensional float32 arrays\n- **Result Fusion**: Reciprocal Rank Fusion (RRF) with parameter $k=60$\n- **Embeddings**: Dual-mode generation:\n  - **Local Ollama Integration**: Uses the `nomic-embed-text` model.\n  - **Deterministic Local Fallback**: Fallback word-hash vector generator to allow 100% offline usage.\n\n---\n\n## Installation \u0026 Setup\n\n### Homebrew (macOS/Linux)\n\nInstall via Homebrew tap:\n\n```bash\nbrew tap danieljustus/tap\nbrew install symseek\n```\n\n### Pre-built Binaries (Recommended)\n\nDownload the latest release for your platform from [GitHub Releases](https://github.com/danieljustus/symaira-seek/releases):\n\n- **Linux**: `symaira-seek_Linux_x86_64.tar.gz` or `symaira-seek_Linux_arm64.tar.gz`\n- **macOS**: `symaira-seek_Darwin_x86_64.tar.gz` or `symaira-seek_Darwin_arm64.tar.gz`\n- **Windows**: `symaira-seek_Windows_x86_64.zip` or `symaira-seek_Windows_arm64.zip`\n\nExtract and install:\n```bash\n# Linux/macOS\ntar -xzf symaira-seek_*.tar.gz\nchmod +x symseek\nsudo mv symseek /usr/local/bin/\n\n# Windows\n# Extract the .zip and add to PATH\n```\n\nVerify the installation:\n```bash\nsymseek version\n```\n\n### Build from Source\n\nEnsure you have [Go](https://go.dev/) installed.\n\n```bash\ngo build -o symseek cmd/symseek/main.go\n```\n\nTo inject a version string at build time, set `main.version` via `-ldflags`. 
The CI workflow derives the value from the current git tag (or a `0.0.0-dev+\u003cshort-sha\u003e` fallback) and passes it automatically:\n```bash\nVERSION=\"0.2.0\"\ngo build -ldflags \"-s -w -X main.version=${VERSION}\" -o symseek cmd/symseek/main.go\n./symseek version\n```\n\n### Run Tests\n```bash\ngo test -v ./...\n```\n\n---\n\n## CLI Usage\n\n### Index a Directory\nCrawl and index all Markdown, text, code, JSON, and YAML files inside a folder:\n```bash\n./symseek index /path/to/my-documents\n```\n\n#### Watch Daemon\nKeep the tool running in the background to automatically synchronize changes (creation, modification, and deletion of files) every 5 seconds:\n```bash\n./symseek index /path/to/my-documents --watch\n```\n\n### Search Documents\nPerform a hybrid semantic and keyword search:\n```bash\n./symseek search \"renewable energy optimization\" --limit 5\n```\n\nExport structured search results directly to JSON:\n```bash\n./symseek search \"renewable energy optimization\" --json\n```\n\n### Get Database Stats\n```bash\n./symseek status\n```\n\nExport the same stats as JSON for monitoring pipelines:\n```bash\n./symseek status --json\n```\n\n### Configuration\n\n`symseek` stores its configuration as TOML in `~/.config/symseek/config.toml` (overridable with `--config`). A legacy `config.json` from older versions is migrated to TOML automatically on first run (and via `./symseek migrate`).\n\nView the active configuration (the path is printed to stderr):\n```bash\n./symseek config\n```\n\nSet a value without editing the file by hand:\n```bash\n./symseek config --set-key ollama_url --set-value http://localhost:11434/api/embeddings\n./symseek config --set-key model --set-value mxbai-embed-large\n```\n\nThe file is rewritten with mode `0600` on every write. 
Supported keys:\n\n| Key | Description | Default |\n| --- | --- | --- |\n| `ollama_url` | Ollama embeddings endpoint URL | `http://localhost:11434/api/embeddings` |\n| `model` | Embedding model name | `nomic-embed-text` |\n| `timeout_seconds` | Per-request Ollama timeout (seconds) | `120` |\n| `retry_count` | Number of Ollama retries on failure | `2` |\n| `retry_backoff_ms` | Initial retry backoff (milliseconds) | `500` |\n| `index_cooldown_seconds` | Cooldown between `/index` requests on the HTTP daemon | `5` |\n\n---\n\n## MCP Server Integration\n\nTo use Symaira-Seek as an MCP tool for AI clients (like Claude Desktop or Cursor), register it in your client's configuration file:\n\n```json\n{\n  \"mcpServers\": {\n    \"symaira-seek\": {\n      \"command\": \"/absolute/path/to/symaira-seek/symseek\",\n      \"args\": [\"serve\"]\n    }\n  }\n}\n```\n\n### Exposed Tools\n1. `search_documents(query, limit)`: Hybrid search over all indexed files.\n2. `read_document(path)`: Retrieves the complete content of an indexed file.\n3. `list_documents(folder)`: Lists indexed documents, optionally filtered by folder prefix.\n4. `get_context(topic, max_chars)`: Aggregates relevant context blocks from multiple documents.\n5. `index_document(path)`: Manually indexes a local file or directory.\n6. 
`index_url(url)`: Indexes content from a URL.\n\n### Tool Examples\n\n#### search_documents\n```json\n{\n  \"query\": \"renewable energy optimization\",\n  \"limit\": 5\n}\n```\n\nReturns formatted search results with file paths, chunk indices, and RRF scores.\n\n#### read_document\n```json\n{\n  \"path\": \"/home/user/documents/report.md\"\n}\n```\n\nReturns the full text content of an indexed file.\n\n#### list_documents\n```json\n{\n  \"folder\": \"/home/user/documents\"\n}\n```\n\nLists all indexed documents, optionally filtered by folder prefix.\n\n#### get_context\n```json\n{\n  \"topic\": \"machine learning\",\n  \"max_chars\": 4000\n}\n```\n\nAggregates relevant context blocks from multiple documents for a given topic.\n\n#### index_document\n```json\n{\n  \"path\": \"/home/user/documents\"\n}\n```\n\nIndexes a local file or directory immediately.\n\n#### index_url\n```json\n{\n  \"url\": \"https://example.com/article\"\n}\n```\n\nFetches and indexes content from a URL.\n\n---\n\n## HTTP REST Daemon\n\nStart the REST API on port `8788`:\n```bash\n./symseek serve --port 8788\n```\n\n### Endpoints\n- **GET** `/health`: Check status (`{\"status\": \"ok\"}`).\n- **GET** `/status`: Returns document counts, chunk counts, and database file size.\n- **GET** `/search?q=query\u0026limit=5`: Query the hybrid search engine.\n- **POST** `/index` with body `{\"path\": \"/absolute/path\"}`: Synchronously crawl and index a folder.\n\n### Endpoint Examples\n\n#### Health Check\n```bash\ncurl http://localhost:8788/health\n# Response: {\"status\": \"ok\"}\n```\n\n#### Get Status\n```bash\ncurl http://localhost:8788/status\n# Response: {\"documents\": 10, \"chunks\": 50, \"db_size\": \"1.2 MB\"}\n```\n\n#### Search Documents\n```bash\ncurl \"http://localhost:8788/search?q=machine+learning\u0026limit=5\"\n# Response: Array of search results with file paths and scores\n```\n\n#### Index Documents\n```bash\ncurl -X POST http://localhost:8788/index \\\n  -H \"Content-Type: 
application/json\" \\\n  -d '{\"path\": \"/home/user/documents\"}'\n# Response: {\"status\": \"indexed\", \"path\": \"/home/user/documents\"}\n```\n\n---\n\n## License\n\nMIT License. Part of the Symaira tool suite.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanieljustus%2Fsymaira-seek","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdanieljustus%2Fsymaira-seek","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanieljustus%2Fsymaira-seek/lists"}