{"id":31592627,"url":"https://github.com/mearman/mcp-wayback-machine","last_synced_at":"2026-05-31T13:00:51.632Z","repository":{"id":297877128,"uuid":"996523578","full_name":"Mearman/mcp-wayback-machine","owner":"Mearman","description":"MCP server and CLI tool for interacting with the Internet Archive's Wayback Machine","archived":false,"fork":false,"pushed_at":"2026-05-30T22:02:28.000Z","size":462,"stargazers_count":29,"open_issues_count":2,"forks_count":7,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-30T23:20:38.771Z","etag":null,"topics":["archival","cli","internet-archive","mcp","mcp-server","model-context-protocol","wayback-machine","web-archive"],"latest_commit_sha":null,"homepage":"https://www.npmjs.com/package/mcp-wayback-machine","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Mearman.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-06-05T04:32:10.000Z","updated_at":"2026-05-30T22:00:01.000Z","dependencies_parsed_at":"2025-06-08T04:10:50.939Z","dependency_job_id":"6c6316c0-e008-4317-9674-346b4a810601","html_url":"https://github.com/Mearman/mcp-wayback-machine","commit_stats":null,"previous_names":["mearman/mcp-wayback-machine"],"tags_count":23,"template":false,"template_full_name":"Mearman/mcp-template","purl":"pkg:github/Mearman/mcp-wayback-machine","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mearman%2Fmcp-wayback-machine","ta
gs_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mearman%2Fmcp-wayback-machine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mearman%2Fmcp-wayback-machine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mearman%2Fmcp-wayback-machine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Mearman","download_url":"https://codeload.github.com/Mearman/mcp-wayback-machine/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Mearman%2Fmcp-wayback-machine/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33731998,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-31T02:00:06.040Z","response_time":95,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["archival","cli","internet-archive","mcp","mcp-server","model-context-protocol","wayback-machine","web-archive"],"created_at":"2025-10-06T03:11:47.565Z","updated_at":"2026-05-31T13:00:51.578Z","avatar_url":"https://github.com/Mearman.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MCP Wayback Machine Server\n\n[![npm version](https://img.shields.io/npm/v/mcp-wayback-machine.svg)](https://www.npmjs.com/package/mcp-wayback-machine)\n[![License: CC BY-NC-SA 
4.0](https://img.shields.io/badge/License-CC_BY--NC--SA_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/)\n[![GitHub Workflow Status](https://img.shields.io/github/actions/workflow/status/Mearman/mcp-wayback-machine/ci.yml?branch=main)](https://github.com/Mearman/mcp-wayback-machine/actions)\n\n\u003e MCP server and CLI tool for interacting with the Internet Archive's Wayback Machine. Supports full CDX search, snapshot content retrieval, screenshot listing, snapshot comparison, and optional authentication for higher SPN2 rate limits.\n\n**Stack:** TypeScript · Node.js 22+ · ES Modules · pnpm · Turbo · Zod\n\n## Getting started\n\nRequires Node.js 22+ and [pnpm](https://pnpm.io).\n\n```bash\npnpm install\n```\n\nOptional credentials (anonymous access works, but authenticated requests get higher SPN2 rate limits):\n\n```bash\nexport WAYBACK_ACCESS_KEY=\"your-access-key\"\nexport WAYBACK_SECRET_KEY=\"your-secret-key\"\n```\n\nObtain credentials at [archive.org/account/s3.php](https://archive.org/account/s3.php).\n\n## Build, test, and lint\n\n```bash\npnpm validate          # typecheck + lint + build + test + untested-files check (the full CI gate)\npnpm check             # typecheck + lint + build only\npnpm build             # compile TypeScript to dist/\npnpm test              # run unit and integration tests\npnpm test:coverage     # run tests with coverage (80% line/branch/function threshold)\npnpm lint              # lint with ESLint\npnpm lint:fix          # auto-fix lint issues\n```\n\nTo run a single test file:\n\n```bash\nnode --test tests/tools/save.unit.test.ts\n```\n\nEnd-to-end tests hit the live Wayback Machine API and are opt-in:\n\n```bash\npnpm test:e2e          # sets WAYBACK_LIVE_TESTS=1 internally via turbo\n```\n\n`pnpm validate` is the gate that must pass before a release. `prepublishOnly` runs it automatically.\n\n## Architecture\n\n`src/bin.ts` is the entry point. 
It detects whether it is invoked as a CLI or loaded as an MCP server and routes accordingly.\n\n```\nsrc/\n  bin.ts          — entry point; dispatches to CLI or MCP server\n  cli.ts          — Commander-based CLI implementation\n  server.ts       — MCP server wiring (ListTools + CallTool handlers)\n  contexts.ts     — shared context (rate limiter, cache, credentials)\n  schemas.ts      — Zod schemas for all tool inputs; single source of truth\n  tools/\n    save.ts       — save_url tool (SPN2 API)\n    retrieve.ts   — get_archived_url tool\n    search.ts     — search_archives tool (CDX API)\n    status.ts     — check_archive_status tool (sparkline API)\n    screenshots.ts — list_screenshots tool\n    compare.ts    — compare_snapshots tool\n    cache.ts      — clear_cache tool\n    context.ts    — injects shared context into tool handlers\n  utils/\n    http.ts       — fetch wrapper with rate limiting and Retry-After handling\n    cache.ts      — in-memory + disk cache with per-endpoint TTLs\n    rate-limit.ts — 15 req/min token bucket\n    validation.ts — shared Zod validation helpers\n```\n\nEach tool module exports a schema (consumed by `ListToolsRequestSchema`) and an execution function (consumed by `CallToolRequestSchema`). New tools need both registrations in `server.ts`.\n\nCaching TTLs are intentional — do not normalise them:\n\n| Resource | TTL | Reason |\n|---|---|---|\n| Snapshot content | 24 h | Immutable once captured |\n| Availability, CDX, sparkline | 1 h | Grows but never mutates |\n| Save operations | 30 min | Idempotent per URL |\n| Save status polling | 30 s | Changes during active jobs |\n\n## Conventions\n\n- **TypeScript strict mode** with `noUncheckedIndexedAccess` and `exactOptionalPropertyTypes` — no `any`, no `as` assertions.\n- **ES Modules throughout** — `\"type\": \"module\"` in `package.json`. 
Always use `.ts` extensions in relative imports (rewritten to `.js` at build time via `rewriteRelativeImportExtensions`).\n- **Zod is the single source of truth** for all tool input shapes. `schemas.ts` defines them; `zodToJsonSchema` derives the MCP-compatible JSON Schema.\n- **Prettier** formats all TypeScript: 4-space indent, double quotes, trailing commas (`es5`), 80-char print width, LF line endings.\n- **Conventional commits** are enforced by commitlint. Allowed scopes: `retrieve`, `save`, `search`, `status`, `fetch`, `http`, `validation`, `cli`, `build`, `release`, `ci`, `deps`. Commit messages must use British English.\n- **Test colocation**: unit tests in `tests/tools/*.unit.test.ts` and `tests/utils/*.unit.test.ts`; integration tests in `tests/*.integration.test.ts`. Use the Node.js built-in test runner — no Jest or Vitest.\n- **`erasableSyntaxOnly: true`** — no TypeScript syntax that cannot be stripped without transformation (no `enum`, no decorators, no `namespace`).\n\n## Gotchas\n\n- **`pnpm validate` before pushing.** CI runs `check + test + coverage + untested-files`. `pnpm validate` replicates this locally via Turbo.\n- **Turbo caches aggressively.** If you change a config file that Turbo doesn't track as an input, cached task results may be stale. Clear with `pnpm turbo run \u003ctask\u003e --force` if results look wrong.\n- **`WAYBACK_LIVE_TESTS` must be set** to run `test:e2e`. The Turbo config passes it through via `globalPassThroughEnv`; don't set it in `.env` files — export it in your shell before running.\n- **Coverage excludes** `src/contexts.ts`, `src/cli.ts`, `src/bin.ts`, and `src/tools/context.ts` — these are wiring/entry-point files. The 80% threshold applies to the remaining surface.\n- **Rate limiting is 15 req/min** across all Wayback Machine API calls, with automatic Retry-After handling for 429 responses. 
Tests that mock HTTP must respect this — don't call real endpoints from unit tests.\n- **`noUncheckedIndexedAccess`** means `Record\u003cstring, T\u003e` lookups return `T | undefined`. Never fall back to `?? default` — narrow explicitly or restructure to a concrete type.\n- **Node version** is pinned in `.tool-versions`. CI tests against Node 22, 24, and 26. Do not use Node APIs that aren't available in 22.\n\n## Contributing\n\nCommits follow [Conventional Commits](https://www.conventionalcommits.org/) and are lint-checked by commitlint on PRs. PRs target `main`; CI must pass (`check`, `test`, `coverage`). Releases are fully automated via semantic-release on push to `main`.\n\nAfter release, alias packages (`wayback-machine-mcp`, `mcp-internet-archive`, `internet-archive-mcp`, `@mearman/mcp-wayback-machine`) are published automatically by CI — do not publish these manually.\n\n## Installation\n\n### As an MCP server\n\n#### CLI shorthand\n\n**Claude Code (MCP):**\n\n```bash\nclaude mcp add wayback-machine -- npx -y mcp-wayback-machine\n```\n\n**Claude Code (plugin marketplace):**\n\n```bash\n/plugin marketplace add https://github.com/Mearman/mcp-wayback-machine.git\n/plugin install mcp-wayback-machine@mcp-wayback-machine\n```\n\n**OpenAI Codex:**\n\n```bash\ncodex mcp add wayback-machine -- npx -y mcp-wayback-machine\n```\n\nTo include optional credentials:\n\n```bash\nclaude mcp add wayback-machine --env WAYBACK_ACCESS_KEY=xxx --env WAYBACK_SECRET_KEY=xxx -- npx -y mcp-wayback-machine\n```\n\n#### Manual configuration\n\nAdd to the appropriate config file:\n\n```json\n{\n  \"wayback-machine\": {\n    \"command\": \"npx\",\n    \"args\": [\"-y\", \"mcp-wayback-machine\"],\n    \"env\": {\n      \"WAYBACK_ACCESS_KEY\": \"your-access-key\",\n      \"WAYBACK_SECRET_KEY\": \"your-secret-key\"\n    }\n  }\n}\n```\n\n| Harness | Config file | Config key |\n|---|---|---|\n| Claude Code | `.mcp.json` (project) / `~/.claude.json` (user) | `mcpServers` |\n| Codex | 
`~/.codex/config.toml` | `[mcp_servers.wayback-machine]` |\n| Gemini CLI | `~/.gemini/settings.json` | `mcpServers` |\n| Crush | `.crush.json` / `~/.config/crush/crush.json` | `mcp` |\n| Cline | `.cline/mcp.json` | `mcpServers` |\n| Cursor | `.cursor/mcp.json` | `mcpServers` |\n| Zed | `~/.config/zed/settings.json` | `context_servers` |\n| Claude Desktop | `~/Library/Application Support/Claude/claude_desktop_config.json` | `mcpServers` |\n\nThe `env` block is optional — the server works anonymously without credentials.\n\n### As a CLI tool\n\n```bash\nnpx mcp-wayback-machine save https://example.com\n```\n\nOr install globally:\n\n```bash\nnpm install -g mcp-wayback-machine\nwayback save https://example.com\n```\n\n### As a Cloudflare Worker\n\nDeploy the MCP server as a stateless Cloudflare Worker. Runs on the **free tier** with no paid bindings — all persistent state uses the [Cache API](https://developers.cloudflare.com/workers/runtime-apis/cache/) which has no published daily limits.\n\n```bash\npnpm add -D wrangler\nwrangler deploy\n```\n\nThe Worker uses the SDK's `StreamableHTTPServerTransport` in stateless mode (no session IDs), so each request is independent and cold starts are handled gracefully.\n\n**Environment variables** (set via `wrangler secret put`):\n\n| Variable | Required | Purpose |\n|---|---|---|\n| `WAYBACK_ACCESS_KEY` | No | Fallback IA S3 credentials for higher SPN2 rate limits |\n| `WAYBACK_SECRET_KEY` | No | Fallback IA S3 credentials |\n| `MCP_AUTH_TOKEN` | No | Bearer token for client authentication |\n\nWhen `MCP_AUTH_TOKEN` is set, clients must send `Authorization: Bearer \u003ctoken\u003e`. 
When absent, the Worker accepts unauthenticated requests.\n\n**Per-request credentials.** Clients can pass their own IA S3 credentials on each request via HTTP headers, overriding the server's environment variables:\n\n```\nX-Archive-Access-Key: \u003cyour-access-key\u003e\nX-Archive-Secret-Key: \u003cyour-secret-key\u003e\n```\n\nThis lets multiple users share a single Worker deployment while each using their own credentials for higher SPN2 rate limits. When both headers are present, they take precedence over `WAYBACK_ACCESS_KEY`/`WAYBACK_SECRET_KEY`. When absent, the Worker falls back to its environment variables.\n\n**Worker-specific files** are excluded from the main `tsconfig.json` and type-checked separately via `tsconfig.worker.json` (which adds `@cloudflare/workers-types`).\n\n**Architecture.** The stdio and Worker deployments share the same tool logic through pluggable interfaces:\n\n| Component | Stdio | Worker |\n|---|---|---|\n| Caching | `DiskCacheBackend` (OS cache dir) | `CacheApiBackend` (`caches.open()`) |\n| Rate limiting | `InMemoryRateLimiter` | `CacheApiRateLimiter` |\n| Auth | None | `StaticTokenAuthProvider` (optional) |\n| Credentials | Environment variables | Request headers, then environment variables |\n\n## Quick examples\n\n```\nArchive https://example.com to the Wayback Machine\nFind all archived snapshots of https://example.com from 2023\nWhat's the earliest archived version of https://example.com?\nCompare the oldest and newest snapshots of https://example.com\nCheck how many times https://example.com has been archived\n```\n\n## Tools\n\n### `save_url`\n\nArchive a URL to the Wayback Machine using the SPN2 API.\n\n\u003cdetails\u003e\n\u003csummary\u003eParameters\u003c/summary\u003e\n\n| Parameter | Required | Description |\n|---|---|---|\n| `url` | Yes | The URL to archive |\n| `captureScreenshot` | No | Capture a screenshot as a PNG image |\n| `captureOutlinks` | No | Also archive up to 100 outlinked pages |\n| 
`ifNotArchivedWithin` | No | Skip if archived within timeframe, e.g. `\"30d\"` |\n| `jsBehaviorTimeout` | No | Run JavaScript for N seconds before capturing (max 30) |\n| `forceGet` | No | Use simple HTTP GET instead of browser rendering |\n| `delayWbAvailability` | No | Delay indexing ~12 hours to reduce server load |\n\n\u003c/details\u003e\n\n### `get_archived_url`\n\nRetrieve an archived snapshot's content and metadata.\n\n\u003cdetails\u003e\n\u003csummary\u003eParameters\u003c/summary\u003e\n\n| Parameter | Required | Description |\n|---|---|---|\n| `url` | Yes | The URL to retrieve |\n| `timestamp` | No | Specific timestamp (`YYYYMMDDhhmmss`) or `\"latest\"` |\n| `modifier` | No | URL modifier: `id_` (raw), `im_` (screenshot), `js_` (JS), `cs_` (CSS) |\n\n\u003c/details\u003e\n\n### `search_archives`\n\nSearch the CDX API for archived versions of a URL.\n\n\u003cdetails\u003e\n\u003csummary\u003eParameters\u003c/summary\u003e\n\n| Parameter | Required | Description |\n|---|---|---|\n| `url` | Yes | The URL pattern to search for |\n| `matchType` | No | `exact`, `prefix`, `host`, or `domain` |\n| `from` | No | Start date (`YYYYMMDD` or `YYYY-MM-DD`) |\n| `to` | No | End date (`YYYYMMDD` or `YYYY-MM-DD`) |\n| `limit` | No | Maximum results (default 10) |\n| `offset` | No | Skip the first N results |\n| `collapse` | No | Collapse duplicates, e.g. `\"timestamp:8\"` (per day), `\"digest\"` |\n| `filter` | No | Filter by field regex, e.g. 
`[\"statuscode:200\", \"!mimetype:image.*\"]` |\n| `resolveRevisits` | No | Resolve warc/revisit entries to original metadata |\n| `showDupeCount` | No | Show duplicate count per capture |\n| `page` | No | Page number for pagination |\n| `pageSize` | No | Results per page |\n\n\u003c/details\u003e\n\n### `check_archive_status`\n\nCheck archival statistics for a URL — capture counts, yearly breakdowns, and first/last capture dates.\n\n\u003cdetails\u003e\n\u003csummary\u003eParameters\u003c/summary\u003e\n\n| Parameter | Required | Description |\n|---|---|---|\n| `url` | Yes | The URL to check |\n\n\u003c/details\u003e\n\n### `list_screenshots`\n\nList available screenshots for a URL.\n\n\u003cdetails\u003e\n\u003csummary\u003eParameters\u003c/summary\u003e\n\n| Parameter | Required | Description |\n|---|---|---|\n| `url` | Yes | The URL to find screenshots for |\n| `limit` | No | Maximum results (default 10) |\n\n\u003c/details\u003e\n\n### `compare_snapshots`\n\nCompare two archived snapshots of a URL. Fetches the raw content of both and provides a visual diff URL.\n\n\u003cdetails\u003e\n\u003csummary\u003eParameters\u003c/summary\u003e\n\n| Parameter | Required | Description |\n|---|---|---|\n| `url` | Yes | The URL to compare snapshots for |\n| `timestampA` | No | First timestamp. Defaults to oldest available. |\n| `timestampB` | No | Second timestamp. Defaults to newest available. |\n\n\u003c/details\u003e\n\n### `clear_cache`\n\nClear all cached API responses. 
Use when fresh data is needed or after saving a new URL.\n\n## CLI usage\n\n```bash\nwayback save https://example.com\nwayback get https://example.com\nwayback get https://example.com --timestamp 20231225120000\nwayback search https://example.com --from 2023-01-01 --to 2023-12-31 --limit 20\nwayback status https://example.com\nwayback screenshots https://example.com\nwayback compare https://example.com\nwayback compare https://example.com --timestamp-a 20230101000000 --timestamp-b 20240101000000\n```\n\n## References\n\n- [Internet Archive Developer Portal](https://archive.org/developers/)\n- [CDX Server Documentation](https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server)\n- [Save Page Now 2 (SPN2) API](https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/)\n- [Bots, LLMs, and Automated Access](https://archive.org/developers/bots.html)\n- [internet-archive-skills](https://github.com/internetarchive/internet-archive-skills) — Official Claude Code skill for the `ia` Python CLI; complements this server.\n\n## License\n\n[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International](http://creativecommons.org/licenses/by-nc-sa/4.0/).\n\n[![CC BY-NC-SA 4.0](https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png)](http://creativecommons.org/licenses/by-nc-sa/4.0/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmearman%2Fmcp-wayback-machine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmearman%2Fmcp-wayback-machine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmearman%2Fmcp-wayback-machine/lists"}