{"id":45903410,"url":"https://github.com/danieliser/tessera","last_synced_at":"2026-03-08T09:04:07.389Z","repository":{"id":340128658,"uuid":"1164610725","full_name":"danieliser/tessera","owner":"danieliser","description":"Hierarchical, scope-gated codebase indexing and persistent memory system for AI agents","archived":false,"fork":false,"pushed_at":"2026-02-27T22:49:41.000Z","size":629,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-28T04:08:02.513Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/danieliser.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-23T09:32:58.000Z","updated_at":"2026-02-27T22:49:44.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/danieliser/tessera","commit_stats":null,"previous_names":["danieliser/tessera"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/danieliser/tessera","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieliser%2Ftessera","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieliser%2Ftessera/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieliser%2Ftessera/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieliser%2Ftessera/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/danieliser","download_url":"https://codeload.github.com/danieliser/tessera/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieliser%2Ftessera/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30068040,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-04T01:03:42.280Z","status":"ssl_error","status_checked_at":"2026-03-04T01:03:23.410Z","response_time":61,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-28T00:55:45.269Z","updated_at":"2026-03-04T01:17:51.790Z","avatar_url":"https://github.com/danieliser.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tessera\n\nPersistent codebase intelligence for autonomous AI agents. Tessera gives agents bottom-up file access and top-down code understanding — across every project they're authorized to touch, with security from the ground up.\n\n## The Problem\n\nPersistent AI agents — orchestrators like AutoJack, task agents like OpenClaw — need to understand codebases the way a senior developer does. Not just \"find this string in a file,\" but \"what calls this function, across which projects, and what breaks if I change it?\"\n\nToday's agents burn context window and wall-clock time on repeated `grep` / `find` / `cat` cycles. They lose track of project structure between conversations. They can't safely delegate to sub-agents without leaking access to projects those agents shouldn't see. And they can't search documentation, config files, or assets alongside code.\n\n## What Tessera Does\n\nTessera indexes everything — code, documents, config files, media assets, binary files — into a structured, chunked, searchable database. It exposes that through 18 MCP tools that any agent can call. Responses come back in milliseconds, not seconds.\n\n**For orchestrator agents:** Full system visibility. Register projects, group them into collections, search across all of them. Understand cross-project dependencies. Delegate scoped access to sub-agents via session tokens.\n\n**For task agents:** Deep code intelligence within their authorized scope. Symbol lookup, reference tracing, impact analysis, document search — everything an IDE provides, but through tool calls.\n\n**For security:** Deny-by-default scope gating. Sub-agents only see what the orchestrator explicitly grants. Credentials and secrets are blocked from indexing by un-negatable security patterns. No ambient access, no scope creep.\n\n### Code Intelligence\n- **Symbol search** — Functions, classes, methods, hooks by name or pattern\n- **Reference tracing** — Call graphs, imports, inheritance chains\n- **Impact analysis** — \"What breaks if I change this?\" — traced N levels deep\n- **File context** — Complete structural overview of any file in one call\n- **Cross-project references** — Track where project A's exports are used in project B\n\n### Document \u0026 Text Search\n- **Chunked indexing** — Files are split into focused, searchable chunks with metadata (by header, key path, or line group) — not stored as monolithic blobs\n- **Code + docs unified** — Query across everything, or filter by source type (`code`, `asset`, `document`)\n- **Structural formats** — PDF, Markdown (break-point scoring with distance decay), YAML/JSON (key-path chunking)\n- **Markup** — HTML/XML with tag stripping\n- **Plaintext** — `.txt`, `.rst`, `.csv`, `.log`, `.ini`, `.cfg`, `.toml`, config files, dotfiles\n\n### Media \u0026 Binary File Indexing\n- **Asset discovery** — Images, videos, audio, fonts, and archives are automatically discovered and indexed\n- **Metadata extraction** — Filename, path, MIME type, file size, and image dimensions (PNG, JPEG, GIF, BMP) — zero external dependencies\n- **FTS5 searchable** — Search for assets by name, category, format, or path components\n- **Source type filtering** — Filter search results to `asset`, `code`, or `document` via the `source_type` parameter\n- **SVG dual-indexing** — SVGs indexed as both searchable XML documents and image assets\n\n### Multi-Project Federation\n- **Project collections** — Group related projects (e.g., a plugin ecosystem) and query across them\n- **Scope-gated access** — Session tokens control what each agent can see. Orchestrators create scoped tokens for sub-agents.\n- **Search-time federation** — Data stays at project level, merged at query time. No duplication.\n\n### Security\n- **Deny-by-default** — No access without a valid session token\n- **`.tesseraignore`** — Per-project ignore config with `.gitignore` syntax\n- **Two-tier ignore system** — Security-critical patterns (`.env*`, `*.pem`, `*credentials*`) are locked and cannot be overridden by project config\n- **`trusted` field** — Search results from code are marked trusted; document content is marked untrusted so agents can handle prompt injection risk\n\n### Infrastructure\n- **Fully embedded** — SQLite + FAISS. No Docker, no daemons, no external servers\n- **Incremental indexing** — Git-aware, only re-indexes changed files\n- **Schema migration** — Versioned database schema with automatic upgrades\n- **Drift adapter** — Switch embedding models without re-indexing (Orthogonal Procrustes)\n\n## Supported Languages\n\nPHP, TypeScript, JavaScript, Python, Swift — via tree-sitter grammars.\n\n## MCP Tools (18)\n\n### Search \u0026 Navigation\n| Tool | Purpose |\n|------|---------|\n| `search` | Hybrid keyword + semantic search across code, documents, and assets (filterable by `source_type`) |\n| `doc_search_tool` | Document-only search (filterable by format or `source_type`) |\n| `symbols` | Look up functions, classes, methods by name/pattern/kind |\n| `references` | Find all references to a symbol (calls, imports, extends) |\n| `file_context` | Complete context for a file (symbols, refs, structure) |\n| `impact` | Trace downstream impact of changing a symbol |\n| `cross_refs` | Cross-project references to a symbol |\n| `collection_map` | Overview of projects in a collection with stats |\n\n### Administration\n| Tool | Purpose |\n|------|---------|\n| `register_project` | Register a project for indexing |\n| `reindex` | Trigger full or incremental re-index |\n| `status` | Project indexing status and health |\n| `drift_train` | Train embedding drift adapter for model migration |\n\n### Access Control\n| Tool | Purpose |\n|------|---------|\n| `create_scope_tool` | Create scoped session tokens for sub-agents |\n| `revoke_scope_tool` | Revoke agent session tokens |\n| `create_collection_tool` | Create a project collection |\n| `add_to_collection_tool` | Add a project to a collection |\n| `list_collections_tool` | List all collections |\n| `delete_collection_tool` | Delete a collection |\n\n## Quick Start\n\n### Requirements\n- Python 3.11+\n- [uv](https://docs.astral.sh/uv/) (recommended) or pip\n\n### Install\n```bash\ngit clone https://github.com/danieliser/tessera.git\ncd tessera\nuv sync\n```\n\n### Run as MCP Server\n\nAdd to your `.mcp.json`:\n```json\n{\n  \"mcpServers\": {\n    \"tessera\": {\n      \"command\": \"uv\",\n      \"args\": [\n        \"--directory\", \"/path/to/tessera\",\n        \"run\", \"python\", \"-m\", \"tessera\", \"serve\"\n      ]\n    }\n  }\n}\n```\n\nLock to a specific project (single-project mode):\n```bash\nuv run python -m tessera serve --project /path/to/your/project\n```\n\n### Embedding Setup (Optional)\n\nTessera works without embeddings (keyword search only via FTS5). For semantic search, point it at any local OpenAI-compatible embedding endpoint. The embedding dimension is auto-detected — no configuration needed.\n\nRecommended: [LM Studio](https://lmstudio.ai) with `nomic-embed-text` or any embedding model serving on `/v1/embeddings`.\n\n### Run Tests\n```bash\nuv run pytest tests/ -v\n```\n\n## Architecture\n\n```\nMCP Server (stdio)\n├── Scope Validator (session-based, deny-by-default)\n├── Query Router (project / collection / global)\n│   ├── Search (FTS5 keyword + FAISS semantic + RRF merge)\n│   ├── Symbols / References / Impact (SQLite graph)\n│   └── Document Search (source_type filtering)\n├── Per-Project Indexes\n│   ├── SQLite (symbols, references, edges, files, chunk_meta)\n│   └── FAISS (vector embeddings)\n├── Global SQLite (~/.tessera/global.db)\n│   ├── projects, collections, sessions\n│   └── indexing_jobs\n└── Indexer Pipeline\n    ├── Tree-sitter parser (PHP, TS, JS, Python, Swift)\n    ├── AST-aware code chunking\n    ├── Document extraction (PDF, MD, YAML, JSON, HTML, XML, plaintext)\n    ├── Asset metadata extraction (images, video, audio, fonts, archives)\n    └── Ignore filter (.tesseraignore, two-tier security)\n```\n\n### Design Principles\n\n- **No external dependencies at runtime** — SQLite + FAISS, fully embedded\n- **Tree-sitter for deterministic parsing** — no LLM-extracted graphs, no hallucinated edges\n- **Chunked everything** — every file is split into focused, searchable units with structural metadata\n- **Security-first scope model** — deny-by-default, session-scoped, un-negatable credential protection\n- **Federation over duplication** — data stays at project level, merged at query time\n\n## Project Status\n\n**v0.7.0** — Break-point markdown chunker, PyPI packaging (`pip install tessera-idx`), hybrid search with semantic snippet scoring, PPR graph ranking.\n\n| Phase | Status | What |\n|-------|--------|------|\n| 1 | Done | Single-project indexer + scoped MCP server |\n| 2 | Done | Incremental indexing + persistence |\n| 3 | Done | Collection federation + cross-project refs |\n| 4 | Done | Document indexing + drift adapter + ignore config + text formats |\n| 4.5 | Done | Media/binary file metadata catalog |\n| 5 | Done | PPR graph ranking + semantic snippet scoring |\n| 6 | Planned | Always-on file watcher |\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanieliser%2Ftessera","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdanieliser%2Ftessera","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanieliser%2Ftessera/lists"}