{"id":44730324,"url":"https://github.com/keithah/browser-history","last_synced_at":"2026-02-15T18:12:43.578Z","repository":{"id":333807007,"uuid":"1138720301","full_name":"keithah/browser-history","owner":"keithah","description":"Local CLI + MCP server to ingest and query browser history (Safari/Chrome/Firefox/Edge/Brave) with auto-labeling.","archived":false,"fork":false,"pushed_at":"2026-01-21T15:13:45.000Z","size":33,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-01T10:47:05.372Z","etag":null,"topics":["browser-history","cli","history","mcp","python","sqlite"],"latest_commit_sha":null,"homepage":"https://github.com/keithah/browser-history","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/keithah.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-21T03:09:59.000Z","updated_at":"2026-01-21T03:15:34.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/keithah/browser-history","commit_stats":null,"previous_names":["keithah/browser-history"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/keithah/browser-history","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keithah%2Fbrowser-history","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keithah%2Fbrowser-history/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keithah%2Fbrowser-history/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keithah%2Fbrowser-history/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/keithah","download_url":"https://codeload.github.com/keithah/browser-history/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/keithah%2Fbrowser-history/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29486104,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-15T15:33:17.885Z","status":"ssl_error","status_checked_at":"2026-02-15T15:32:53.698Z","response_time":118,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["browser-history","cli","history","mcp","python","sqlite"],"created_at":"2026-02-15T18:12:43.047Z","updated_at":"2026-02-15T18:12:43.571Z","avatar_url":"https://github.com/keithah.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Browser History Query\n\nLocal CLI + MCP server to ingest Safari/Chrome/Firefox/Edge/Brave history, auto-label visits, and answer quick questions (e.g., “list all the GitHub projects I visited”, “what AI sites have I hit recently”) without sending data anywhere.\n\n[![CI](https://github.com/keithah/browser-history/actions/workflows/ci.yml/badge.svg)](https://github.com/keithah/browser-history/actions/workflows/ci.yml)\n\n## Features\n- Copies browser history DBs (Safari, Chrome, Firefox, Edge, Brave), converts timestamps, and stores into `data/history_store.db`.\n- Deduping: by default, ingestion dedupes globally on `url+timestamp` (enforced with a unique index) and also tracks per-source occurrences; you can switch to per-source dedupe with `--dedupe-scope source`.\n- Automatic labeling heuristics: `github_project`, `ai`, `personal_site`, `docs`, `news`, `social`, `shopping`, `video`, `other`.\n- Summaries derived from title/path (e.g., GitHub repos become `owner/repo`), optional page metadata fetch to improve titles/descriptions, plus full-text search over URL/title/summary/domain/path/labels.\n- Natural-language-ish `ask` command routes questions to filters; `reclassify` lets you refresh labels after tweaking heuristics.\n- MCP server exposes tools/resources so agents can call the same ingestion/query/ask logic.\n\n## Setup\n```bash\npython3.11 -m venv .venv\n. .venv/bin/activate\npip install -e .\n# for tests/dev extras\npip install -e .[dev]\n```\n\n## CLI\n- Ingest new visits (deduped): `browser-history ingest --browser safari|chrome|firefox|edge|brave|all` (auto-detects Chrome/Edge/Brave profiles and Firefox release/beta/dev/ESR profiles; add `--limit 200` to sample; add `--fetch-metadata` to pull page titles/descriptions over the network; use `--metadata-max` to cap how many pages are fetched per run, default 500; `--progress-every` to log progress; `--state-interval` to flush ingest_state more frequently; `--dedupe-scope global|source` to choose global dedupe (default) or per-browser/profile; `--analyze` to run SQLite ANALYZE after ingest; `--vacuum` for an offline VACUUM). Warnings are printed if a requested browser DB cannot be found.\n- Query by filters: `browser-history query --category ai --since 30d --limit 20 [--source safari|chrome:Profile|firefox:Profile]`\n- Full-text search: `--search foo` uses SQLite FTS5 over URL/title/summary/domain/path/labels (falls back to LIKE if FTS5 unavailable).\n- Ask in plain-ish text: `browser-history ask \"list all the github projects\"`\n- Re-run classifiers on stored rows (after heuristic changes): `browser-history reclassify`\n- Backfill legacy rows into `visit_occurrences`: `browser-history backfill-occurrences [--source foo] [--limit 1000]`\n- Run maintenance: `browser-history maintain` (defaults to ANALYZE + VACUUM; pass `--analyze` or `--vacuum` to limit)\n- Stats: `browser-history stats [--since 30d] [--source ...]` shows coverage, counts by category, counts by source, and per-source ingest_state (including profile names).\n- Daemon: `browser-history daemon --browser all --fetch-metadata --metadata-max 200 --interval 1800 --progress-every 1000` runs continuous ingest on a timer; press Ctrl+C to stop.\n- Common `--since` formats: `7d`, `12h`, `2024-12-01`.\n\nData lives in `data/history_store.db`; snapshots of browser DBs are kept in `data/*_snapshot.db` for ingestion safety.\n\n## MCP server\n- Run: `browser-history-mcp` (stdio transport). Requires the `mcp` Python package (installed via `pip install -e .`).\n- Tools: `ingest_history(browser?, limit?, fetch_metadata=false, metadata_max=500?, dedupe_scope=global|source)` (browsers include safari/chrome/firefox/edge/brave/all), `query_history(category?, search?, domain?, source?, since?, limit)`, `ask_history(question, limit)`, `stats_history(since?, source?)`, `backfill_occurrences(source?, limit?)`.\n- Resource template: `resource://browser/history/recent/{limit}` returns recent visits as JSON.\n- Any MCP-capable client can connect over stdio; you don’t need extra configuration beyond pointing the client to the `browser-history-mcp` command.\n\n## Notes\n- Classification is heuristic-only (no network/LLM calls). Add more patterns in `browser_history/classifier.py` if you want finer buckets.\n- To start fresh, remove `data/history_store.db` and re-run `ingest`.\n- Ingest state (progress, last seen timestamps, metadata count) is stored in `ingest_state` inside `data/history_store.db`; `browser-history stats` prints it per source so you can resume/monitor long runs. Each visit also stores occurrences (sources + raw timestamps) so you can see all browsers/profiles that hit the same page when using global dedupe.\n- Releasing/publishing: tags `v*` trigger a build; set `PYPI_API_TOKEN` in repo secrets to publish via the provided GitHub Actions workflow. CI runs tests on pushes/PRs across Python 3.9–3.12.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkeithah%2Fbrowser-history","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkeithah%2Fbrowser-history","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkeithah%2Fbrowser-history/lists"}