{"id":50323823,"url":"https://github.com/queelius/mail-memex","last_synced_at":"2026-05-29T04:30:59.141Z","repository":{"id":341490621,"uuid":"1127775995","full_name":"queelius/mail-memex","owner":"queelius","description":"Personal email archive with SQLite+FTS5 full-text search and MCP server for LLM access. Part of the *-memex ecosystem.","archived":false,"fork":false,"pushed_at":"2026-04-24T06:17:09.000Z","size":881,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-24T06:26:36.757Z","etag":null,"topics":["archive","cli","email","mail","notmuch","personal-data","privacy","python","semantic-search"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/queelius.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-04T15:18:31.000Z","updated_at":"2026-04-24T06:17:12.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/queelius/mail-memex","commit_stats":null,"previous_names":["queelius/mtk","queelius/mail-memex"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/queelius/mail-memex","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/queelius%2Fmail-memex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/queelius%2Fmail-memex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/queelius%2Fmail-memex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/queelius%2Fmail-memex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/queelius","download_url":"https://codeload.github.com/queelius/mail-memex/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/queelius%2Fmail-memex/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33637485,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-29T02:00:06.066Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["archive","cli","email","mail","notmuch","personal-data","privacy","python","semantic-search"],"created_at":"2026-05-29T04:30:58.578Z","updated_at":"2026-05-29T04:30:59.130Z","avatar_url":"https://github.com/queelius.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# mail-memex\n\nPersonal email archive with SQLite+FTS5 full-text search and an MCP server for LLM access.\n\nPart of the [`*-memex` ecosystem](../memex) of co-located personal archives: `llm-memex` (AI conversations), `bookmark-memex`, `photo-memex`, `book-memex`, `hugo-memex`, and `health-memex`.\n\n## What It Does\n\n- **Ingests** email from Gmail Takeout, mbox files, `.eml` files, or IMAP (Gmail OAuth2 or any IMAP 
server).\n- **Indexes** every message with SQLite FTS5 for ranked full-text search (Porter stemming, BM25 weighting).\n- **Rebuilds threads** from `In-Reply-To` headers.\n- **Exposes** the archive as an MCP server so LLMs can query it with SQL or Gmail-style operators.\n- **Exports** to JSON, mbox, markdown, a single-file HTML SPA (embedded SQLite via sql.js), or arkiv JSONL for cross-archive ingestion.\n- **Marginalia:** attach free-form notes to any email, thread, or other marginalia note, addressable by durable URIs that survive re-imports.\n- **Soft-deletes:** archived emails and threads stay queryable but are filtered from default results, so MCP trails and marginalia references don't break.\n\nmail-memex is **not an email client**. It does not send or reply to mail. It's a long-term archive and an LLM-queryable surface on top of your mail history.\n\n## Install\n\n```bash\npip install -e \".[mcp,imap,imap-oauth]\"\n```\n\nOptional extras: `mcp` (MCP server), `imap` (IMAP pull with keyring), `imap-oauth` (Gmail OAuth2).\n\n## Quick Start\n\n```bash\n# 1. Initialize\nmail-memex init\n\n# 2. Import from Gmail Takeout, mbox, or eml\nmail-memex import gmail ~/takeout/\"All mail Including Spam and Trash.mbox\"\nmail-memex import mbox archive.mbox\nmail-memex import eml ~/Mail/\n\n# 3. Search\nmail-memex search \"project proposal\"\nmail-memex search \"from:alice after:2024-01-01 has:attachment\"\n\n# 4. 
Start MCP server (LLM-facing)\nmail-memex mcp\n```\n\n## CLI Commands\n\n| Command | Purpose |\n|---------|---------|\n| `mail-memex init` | Create the database |\n| `mail-memex import {mbox,eml,gmail}` | Import from a source |\n| `mail-memex search QUERY` | Search (FTS5 ranked, with Gmail-style operators) |\n| `mail-memex tag {add,remove,list,batch}` | Manage tags |\n| `mail-memex rebuild {index,threads}` | Rebuild FTS index or threads |\n| `mail-memex export {json,mbox,markdown,html,arkiv}` | Export the archive |\n| `mail-memex imap {accounts,sync,folders,test}` | IMAP incremental sync |\n| `mail-memex mcp` | Start the stdio MCP server |\n\nAll commands accept `--json` for machine-readable output.\n\n### Search Operators\n\nGmail-style query language:\n\n```\nfrom:alice              # sender address contains \"alice\"\nto:bob@example.com      # any recipient field contains \"bob@...\"\nsubject:proposal        # subject contains \"proposal\"\nafter:2024-01-01        # date on or after\nbefore:2024-12-31       # date on or before\ntag:work                # has the tag \"work\"\n-tag:archive            # does NOT have the tag \"archive\"\nhas:attachment          # has at least one attachment\nthread:\u003cthread-id\u003e      # in a specific thread\n```\n\nFree-text terms are matched against subject and body via FTS5.\n\n## MCP Server\n\nThe MCP server is the primary interface for LLM access. Configure it in `~/.claude.json` (or any MCP client config):\n\n```json\n{\n  \"mcpServers\": {\n    \"mail-memex\": {\n      \"type\": \"stdio\",\n      \"command\": \"/path/to/venv/bin/python\",\n      \"args\": [\"-m\", \"mail_memex.mcp\"]\n    }\n  }\n}\n```\n\n### Exposed Tools\n\n**Contract tools** (shared across the `*-memex` ecosystem):\n\n| Tool | Purpose |\n|------|---------|\n| `get_schema` | Return DDL, column metadata, descriptions, and query tips |\n| `execute_sql(sql, readonly=true)` | Run SQL. DDL always blocked; writes blocked by default. 
|\n| `get_record(kind, record_id)` | Resolve a `mail-memex://` URI. `kind` is one of `email`, `thread`, `marginalia`. Returns soft-deleted records too, so trail steps don't break. |\n\n**Domain tools:**\n\n| Tool | Purpose |\n|------|---------|\n| `search_emails(query, limit)` | Gmail-style search, BM25 ranked |\n| `create_marginalia(target_uris, content, category?, color?, pinned?)` | Attach a note to one or more URIs |\n| `list_marginalia(target_uri?, include_archived?, limit?)` | List notes |\n| `get_marginalia(uuid)` | Fetch a note by UUID |\n| `update_marginalia(uuid, ...)` | Update fields |\n| `delete_marginalia(uuid, hard?)` | Soft delete by default; `hard=true` for permanent removal |\n| `restore_marginalia(uuid)` | Undo a soft delete |\n\n## Architecture\n\n```\n┌─────────────────────────────────────────────────────────────┐\n│     Sources: mbox, eml, Gmail Takeout, IMAP (incl. OAuth2)  │\n└────────────────────────────┬────────────────────────────────┘\n                             ▼\n              ┌──────────────────────────────┐\n              │   Importers (idempotent      │\n              │   dedup by Message-ID)       │\n              └──────────────┬───────────────┘\n                             ▼\n┌────────────────────────────────────────────────────────────┐\n│                     SQLite + FTS5                           │\n│                                                             │\n│  emails, threads, tags, attachments, imap_sync_state,       │\n│  marginalia, marginalia_targets, emails_fts (Porter+BM25)  │\n│                                                             │\n│  Soft delete via archived_at on emails, threads, marginalia│\n└──────┬────────────────────────────────────┬─────────────────┘\n       │                                    │\n       ▼                                    ▼\n┌──────────────────────┐         ┌─────────────────────────┐\n│  MCP Server          │         │   Exporters             │\n│  (FastMCP, stdio)    │        
 │                         │\n│                      │         │   JSON, mbox, markdown, │\n│  execute_sql         │         │   HTML SPA (sql.js),    │\n│  get_schema          │         │   arkiv JSONL+schema    │\n│  get_record          │         │                         │\n│  search_emails       │         │   Cross-archive URIs:   │\n│  marginalia CRUD     │         │   mail-memex://...      │\n└──────────────────────┘         └─────────────────────────┘\n```\n\n### URI Scheme\n\nRecords are addressable by URI, making cross-archive references durable:\n\n```\nmail-memex://email/\u003cmessage_id\u003e\nmail-memex://thread/\u003cthread_id\u003e\nmail-memex://marginalia/\u003cuuid\u003e\n```\n\nA trail in `memex` (or a note anywhere else) can reference these as plain strings. The archive resolves them via `get_record`.\n\n### Database ID Scheme\n\n- **`message_id`** (RFC 2822 Message-ID header) is the durable, external identifier used for deduplication and URIs. If a message arrives without one, mail-memex generates `generated-{sha256[:32]}@mail-memex.local` deterministically.\n- **`id`** (auto-increment integer) is the internal primary key used for FK relationships and FTS5 joins. Do not expose this outside the database.\n\n### FTS5 Details\n\n- Tokenizer: `porter unicode61` (English stemming + Unicode segmentation).\n- BM25 column weights: `subject=10.0, body_text=1.0, from_addr=5.0, from_name=5.0`.\n- Triggers keep `emails_fts` in sync with `emails` automatically on INSERT/UPDATE/DELETE.\n- Falls back to SQL `LIKE` matching if FTS5 is unavailable.\n\n### Thread Reconstruction\n\nThreads are rebuilt from `In-Reply-To` headers (only). The algorithm:\n\n1. Find emails with `thread_id IS NULL` that have `in_reply_to`.\n2. Look up the parent by `message_id`.\n3. If the parent has a thread, join it. Otherwise, create `thread-{parent.message_id}` and assign both.\n4. 
Loop until no new threads are created (handles deep chains).\n\nRuns automatically after every import, and can be re-run with `mail-memex rebuild threads`.\n\n## Paths\n\n- **Config:** `~/.config/mail-memex/config.yaml`\n- **Database:** `~/.local/share/mail-memex/mail-memex.db`\n- **Env var:** `MAIL_MEMEX_DATABASE_PATH` overrides the database path.\n\n## Data Model\n\nCore tables (`emails`, `threads`, and `marginalia` carry `archived_at TIMESTAMP NULL` for soft delete):\n\n- **emails**: headers (to/cc/bcc as comma-separated strings), body text/html, preview, thread_id, raw headers in `metadata_json` for custom field queries.\n- **threads**: thread_id, subject, first/last date, email count.\n- **tags**: name (unique), source (`mail-memex` or `imap`).\n- **attachments**: filename, content type, size. Content is not stored (retrieve from the source file).\n- **marginalia**: uuid, content, category, color, pinned. Free-form notes.\n- **marginalia_targets**: many-to-many join from marginalia to target URIs (strings, no FK).\n- **imap_sync_state**: per-account, per-folder UIDVALIDITY and last_uid for incremental sync.\n- **emails_fts**: FTS5 virtual table.\n\n## Design Principles\n\n- **Contract compliance.** Satisfies the `*-memex` archive contract: SQLite+FTS5, MCP server with `execute_sql`/`get_schema`/`get_record`, arkiv export, soft delete, marginalia, durable URIs.\n- **Thin admin CLI.** Use the CLI for import, export, and housekeeping. Use the MCP server for interactive queries. Marginalia is MCP-only.\n- **No embeddings here.** Archives stay narrow. The federation layer (`memex`) computes embeddings and maintains cross-archive trails.\n- **Re-importable.** Dedup by Message-ID, so running the same import twice is a no-op.\n\n## Status\n\nv0.6.0 (alpha). Active development. 
The archive shape is stable but the CLI surface may still evolve.\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqueelius%2Fmail-memex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqueelius%2Fmail-memex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqueelius%2Fmail-memex/lists"}