{"id":45339600,"url":"https://github.com/karmaniverous/jeeves-watcher","last_synced_at":"2026-04-15T07:08:12.911Z","repository":{"id":339604075,"uuid":"1162456995","full_name":"karmaniverous/jeeves-watcher","owner":"karmaniverous","description":"Filesystem watcher that keeps a Qdrant vector store in sync with document changes. Config-driven rules engine, semantic search API, and CLI.","archived":false,"fork":false,"pushed_at":"2026-02-26T16:21:37.000Z","size":3506,"stargazers_count":1,"open_issues_count":5,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-26T19:44:24.588Z","etag":null,"topics":["cli","document-indexing","embeddings","filesystem-watcher","gemini","langchain","qdrant","rag","semantic-search","typescript","vector-store"],"latest_commit_sha":null,"homepage":"https://docs.karmanivero.us/jeeves-watcher/","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/karmaniverous.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-20T09:35:53.000Z","updated_at":"2026-02-25T09:23:01.000Z","dependencies_parsed_at":"2026-02-24T13:00:35.662Z","dependency_job_id":null,"html_url":"https://github.com/karmaniverous/jeeves-watcher","commit_stats":null,"previous_names":["karmaniverous/jeeves-watcher"],"tags_count":33,"template":false,"template_full_name":"karmaniverous/npm-package-template-ts","purl":"pkg:github/karmaniverous/jeeves-watcher","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/karmaniverous%2Fjeeves-watcher","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/karmaniverous%2Fjeeves-watcher/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/karmaniverous%2Fjeeves-watcher/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/karmaniverous%2Fjeeves-watcher/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/karmaniverous","download_url":"https://codeload.github.com/karmaniverous/jeeves-watcher/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/karmaniverous%2Fjeeves-watcher/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29922734,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-27T19:37:42.220Z","status":"online","status_checked_at":"2026-02-28T02:00:07.010Z","response_time":90,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","document-indexing","embeddings","filesystem-watcher","gemini","langchain","qdrant","rag","semantic-search","typescript","vector-store"],"created_at":"2026-02-21T10:04:46.004Z","updated_at":"2026-04-01T21:52:02.011Z","avatar_url":"https://github.com/karmaniverous.png","language":"HTML","readme":"# Jeeves Watcher 🎩\n\nFilesystem watcher that keeps a Qdrant vector store in sync with document changes.\n\n## Overview\n\n`jeeves-watcher` monitors a configured set of directories for file changes, extracts text content, generates embeddings, and maintains a synchronized Qdrant vector store for semantic search. It automatically:\n\n- **Watches** directories for file additions, modifications, and deletions\n- **Extracts** text from various formats (Markdown, PDF, DOCX, HTML, JSON, plain text)\n- **Chunks** large documents for optimal embedding\n- **Embeds** content using configurable providers (Google Gemini, mock for testing)\n- **Syncs** to Qdrant for fast semantic search\n- **Enriches** metadata via rules and API endpoints\n\n### Architecture\n\n![System Architecture](packages/service/assets/system-architecture.png)\n\nFor detailed architecture documentation, see [packages/service/guides/architecture.md](packages/service/guides/architecture.md).\n\n## Quick Start\n\n### Installation\n\n```bash\nnpm install -g @karmaniverous/jeeves-watcher\n```\n\n### Initialize Configuration\n\nCreate a new configuration file in your project:\n\n```bash\njeeves-watcher init\n```\n\nThis generates a `jeeves-watcher.config.json` file with sensible defaults.\n\n### Configure\n\nEdit `jeeves-watcher.config.json` to specify:\n\n- **Watch paths**: Directories to monitor\n- **Embedding provider**: Google Gemini or mock (for testing)\n- **Qdrant connection**: URL and collection name\n- **Inference rules**: Automatic metadata enrichment based on file patterns\n\nExample minimal configuration:\n\n```json\n{\n  \"watch\": {\n    \"paths\": [\"./docs\"],\n    \"ignored\": [\"**/node_modules/**\", \"**/.git/**\"]\n  },\n  \"embedding\": {\n    \"provider\": \"gemini\",\n    \"model\": \"gemini-embedding-001\",\n    \"apiKey\": \"${GOOGLE_API_KEY}\"\n  },\n  \"vectorStore\": {\n    \"url\": \"http://localhost:6333\",\n    \"collectionName\": \"my_docs\"\n  }\n}\n```\n\n### Start Watching\n\n```bash\njeeves-watcher start\n```\n\nThe watcher will:\n\n1. Index all existing files in watched directories\n2. Monitor for changes\n3. Update Qdrant automatically\n\n## CLI Commands\n\n| Command | Description |\n| --- | --- |\n| `jeeves-watcher start` | Start the filesystem watcher (foreground) |\n| `jeeves-watcher init` | Initialize a new configuration file |\n| `jeeves-watcher status` | Show watcher status |\n| `jeeves-watcher reindex` | Reindex all watched files |\n| `jeeves-watcher rebuild-metadata` | Rebuild metadata files from Qdrant payloads |\n| `jeeves-watcher search \u003cquery\u003e` | Search the vector store |\n| `jeeves-watcher enrich \u003cpath\u003e` | Enrich document metadata with key-value pairs |\n| `jeeves-watcher validate` | Validate the configuration |\n| `jeeves-watcher service` | Manage the watcher as a system service |\n| `jeeves-watcher scan` | Scan the vector store with filter-only queries |\n| `jeeves-watcher config` | Query effective config via JSONPath |\n| `jeeves-watcher issues` | Show indexing issues and errors |\n| `jeeves-watcher helpers` | Show loaded map and template helpers |\n| `jeeves-watcher config-apply` | Validate, write, and reload configuration from file |\n\n## Configuration\n\n### Environment Variable Substitution\n\nConfig strings support `${VAR_NAME}` syntax for environment variable injection:\n\n```json\n{\n  \"embedding\": {\n    \"apiKey\": \"${GOOGLE_API_KEY}\"\n  }\n}\n```\n\nIf `GOOGLE_API_KEY` is set in the environment, the value is substituted at config load time. Set templates in inference rules use Handlebars `{{...}}` syntax (e.g. `{{frontmatter.title}}`), which is distinct from the `${...}` environment variable syntax used in config values like `embedding.apiKey`.\n\n### Watch Paths\n\n```json\n{\n  \"watch\": {\n    \"paths\": [\"./docs\", \"./notes\"],\n    \"ignored\": [\"**/node_modules/**\", \"**/*.tmp\"]\n  }\n}\n```\n\n- **`paths`**: Array of glob patterns or directories to watch\n- **`ignored`**: Array of patterns to exclude\n- **`respectGitignore`**: (default: `true`) Skip processing files ignored by `.gitignore` in git repositories. Nested `.gitignore` files are respected within their subtree.\n- **`moveDetection`**: (optional) Correlate unlink+add events as file moves to avoid re-embedding. `enabled` (default: `true`), `bufferMs` (default: `2000`) — how long to buffer unlink events before treating as deletes.\n\n### Embedding Provider\n\n#### Google Gemini\n\n```json\n{\n  \"embedding\": {\n    \"provider\": \"gemini\",\n    \"model\": \"gemini-embedding-001\",\n    \"apiKey\": \"${GOOGLE_API_KEY}\"\n  }\n}\n```\n\n### Vector Store\n\n```json\n{\n  \"vectorStore\": {\n    \"url\": \"http://localhost:6333\",\n    \"collectionName\": \"my_collection\"\n  }\n}\n```\n\n### Inference Rules\n\nAutomatically enrich metadata based on file patterns using declarative JSON Schemas:\n\n```json\n{\n  \"schemas\": {\n    \"base\": {\n      \"type\": \"object\",\n      \"properties\": {\n        \"domain\": {\n          \"type\": \"string\",\n          \"description\": \"Content domain\"\n        }\n      }\n    }\n  },\n  \"inferenceRules\": [\n    {\n      \"name\": \"meeting-classifier\",\n      \"description\": \"Classify files under meetings directory\",\n      \"match\": {\n        \"properties\": {\n          \"file\": {\n            \"type\": \"object\",\n            \"properties\": {\n              \"path\": { \"type\": \"string\", \"glob\": \"**/meetings/**\" }\n            }\n          }\n        }\n      },\n      \"schema\": [\n        \"base\",\n        {\n          \"properties\": {\n            \"domain\": { \"set\": \"meetings\" },\n            \"category\": { \"type\": \"string\", \"set\": \"notes\" }\n          }\n        }\n      ]\n    }\n  ]\n}\n```\n\n**New in v0.5.0:** Inference rules now use `schema` arrays that reference global named schemas. Type coercion automatically converts string interpolation results to declared types (integer, number, boolean, array, object). See [Inference Rules Guide](packages/service/guides/inference-rules.md) for details.\n\n### Chunking\n\nChunking settings are configured under `embedding`:\n\n```json\n{\n  \"embedding\": {\n    \"chunkSize\": 1000,\n    \"chunkOverlap\": 200\n  }\n}\n```\n\n### Enrichment Store\n\nEnrichment metadata (from `POST /metadata` or `watcher_enrich`) is stored in a SQLite database at `\u003cstateDir\u003e/enrichments.sqlite`. Enrichments survive full reindexes. Composable merge: scalar fields overwrite, array fields union+deduplicate with inference rule output.\n\n```json\n{\n  \"stateDir\": \".jeeves-metadata\"\n}\n```\n\n## API Endpoints\n\nThe watcher provides a REST API (default port: 1936):\n\n| Endpoint | Method | Description |\n| --- | --- | --- |\n| `/status` | GET | Health check, uptime, and collection stats |\n| `/search` | POST | Semantic search (`{ query: string, limit?: number, filter?: object }`) |\n| `/render` | POST | Render a file through inference rules (`{ path: string }`) (v0.8.0+) |\n| `/search/facets` | GET | Schema-derived search facet definitions with live values (v0.8.0+) |\n| `/metadata` | POST | Update document metadata with schema validation (`{ path: string, metadata: object }`) |\n| `/reindex` | POST | Scoped reindex with blast area plan (`issues`, `rules`, `full`, `path`, `prune` + `dryRun`). `path` accepts `string \\| string[]`. |\n| `/rebuild-metadata` | POST | Rebuild metadata files from Qdrant |\n| `/config` | GET | Full resolved effective config; optional `?path=\u003cjsonpath\u003e` filter. Rules include `source` attribution. |\n| `/config/schema` | GET | JSON Schema of merged virtual document (v0.5.0+) |\n| `/walk` | POST | Filesystem walk with glob intersection (`{ globs: string[] }`). Returns `{ paths, matchedCount, scannedRoots }`. |\n| `/config/match` | POST | Test paths against inference rules (`{ paths: string[] }`) (v0.5.0+) |\n| `/issues` | GET | Current embedding failures and processing errors (v0.5.0+) |\n| `/rules/register` | POST | Register virtual inference rules from an external source |\n| `/rules/unregister` | DELETE | Remove all virtual rules from a source (`{ source }`) |\n| `/rules/unregister/:source` | DELETE | Remove all virtual rules from a named source |\n| `/scan` | POST | Filter-only point query with cursor pagination (`{ filter, limit?, cursor?, fields?, countOnly? }`) |\n| `/config/validate` | POST | Validate a configuration without applying (`{ config?, testPaths? }`) |\n| `/config/apply` | POST | Validate, write, and reload configuration (`{ config }`) |\n| `/rules/reapply` | POST | Re-apply inference rules to files matching globs (`{ globs }`) |\n| `/points/delete` | POST | Delete points matching a Qdrant filter (`{ filter }`) |\n\n### Example: Search\n\n```bash\ncurl -X POST http://localhost:1936/search \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"query\": \"machine learning algorithms\", \"limit\": 5}'\n```\n\n### Example: Search With Filter\n\n```bash\ncurl -X POST http://localhost:1936/search \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"query\": \"error handling\",\n    \"limit\": 10,\n    \"filter\": {\n      \"must\": [{ \"key\": \"domain\", \"match\": { \"value\": \"backend\" } }]\n    }\n  }'\n```\n\n### Example: Update Metadata\n\n```bash\ncurl -X POST http://localhost:1936/metadata \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"path\": \"/path/to/document.md\",\n    \"metadata\": {\n      \"priority\": \"high\",\n      \"category\": \"research\"\n    }\n  }'\n```\n\n## OpenClaw Plugin\n\nThis repo includes an OpenClaw plugin (`packages/openclaw`) that exposes the jeeves-watcher API as native agent tools:\n\n| Tool                   | Description                                    |\n| ---------------------- | ---------------------------------------------- |\n| `watcher_status`       | Service health, uptime, and collection stats   |\n| `watcher_search`       | Semantic search across indexed documents       |\n| `watcher_enrich`       | Set or update document metadata                |\n| `watcher_config`       | Query the effective runtime config via JSONPath |\n| `watcher_walk`         | Walk watched filesystem paths with glob intersection |\n| `watcher_validate`     | Validate a watcher configuration               |\n| `watcher_config_apply` | Apply a new configuration                      |\n| `watcher_reindex`      | Trigger a scoped reindex with blast area plan   |\n| `watcher_scan`         | Filter-only point query with cursor pagination |\n| `watcher_issues`       | List indexing issues and errors                |\n\nThe plugin integrates with [`@karmaniverous/jeeves`](https://www.npmjs.com/package/@karmaniverous/jeeves) core to manage workspace content (TOOLS.md, SOUL.md, AGENTS.md) via a `ComponentWriter` that refreshes every 71 seconds. See the [OpenClaw Integration Guide](packages/openclaw/guides/openclaw-integration.md) for details.\n\nPlugin configuration supports `apiUrl` (defaults to `http://127.0.0.1:1936`) and `configRoot` (defaults to `j:/config`).\n\n## Supported File Formats\n\n- **Markdown** (`.md`, `.markdown`) — with YAML frontmatter support\n- **PDF** (`.pdf`) — text extraction\n- **DOCX** (`.docx`) — Microsoft Word documents\n- **HTML** (`.html`, `.htm`) — content extraction (scripts/styles removed)\n- **JSON** (`.json`) — with smart text field detection\n- **Plain Text** (`.txt`, `.text`)\n\n## License\n\nBSD-3-Clause\n\n---\n\nBuilt for you with ❤️ on Bali by [Jason Williscroft](https://github.com/karmaniverous) \u0026 [Jeeves](https://github.com/jgs-jeeves).\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkarmaniverous%2Fjeeves-watcher","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkarmaniverous%2Fjeeves-watcher","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkarmaniverous%2Fjeeves-watcher/lists"}