{"id":48670312,"url":"https://github.com/aytzey/paper-pilot","last_synced_at":"2026-06-01T13:00:35.602Z","repository":{"id":350263771,"uuid":"1206066974","full_name":"aytzey/paper-pilot","owner":"aytzey","description":"Your AI's research copilot. Searches 6 academic databases, downloads real PDFs, reads them cover to cover, extracts evidence, renders figures. MCP server for Claude, Codex \u0026 any AI agent. Free \u0026 open source.","archived":false,"fork":false,"pushed_at":"2026-04-10T10:33:43.000Z","size":419,"stargazers_count":1,"open_issues_count":3,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-10T12:11:40.539Z","etag":null,"topics":["academic-search","ai-agent","ai-research","arxiv","claude","codex","deep-research","literature-review","llm-tools","mcp","mcp-server","open-access","openalex","paper-download","paper-reading","pdf-reader","research","research-agent","semantic-scholar","zotero"],"latest_commit_sha":null,"homepage":"https://github.com/aytzey/Zotero-Researcher","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aytzey.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-04-09T14:42:33.000Z","updated_at":"2026-04-10T10:56:55.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/aytzey/paper-pilot","commit_stats":null,"previous_names":["aytzey/zotero-researcher","aytzey/academic-research-mcp","aytzey/paper-pilot"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/aytzey/paper-pilot","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aytzey%2Fpaper-pilot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aytzey%2Fpaper-pilot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aytzey%2Fpaper-pilot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aytzey%2Fpaper-pilot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aytzey","download_url":"https://codeload.github.com/aytzey/paper-pilot/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aytzey%2Fpaper-pilot/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33775864,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-01T02:00:06.963Z","response_time":115,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["academic-search","ai-agent","ai-research","arxiv","claude","codex","deep-research","literature-review","llm-tools","mcp","mcp-server","open-access","openalex","paper-download","paper-reading","pdf-reader","research","research-agent","semantic-scholar","zotero"],"created_at":"2026-04-10T12:05:58.629Z","updated_at":"2026-06-01T13:00:35.597Z","avatar_url":"https://github.com/aytzey.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- mcp-name: io.github.aytzey/paper-pilot --\u003e\n![Paper Pilot](docs/hero.svg)\n\n# Paper Pilot\n\n**Your AI's research copilot.**\n\n*An MCP server that gives Claude, Codex, and any AI agent real academic research: 6 databases, full-text PDFs, evidence with citations, figure rendering, and Zotero sync.*\n\nYour AI Googles when you say \"research.\" Paper Pilot searches real academic databases, downloads the PDFs, reads them cover to cover, renders the figures, gives you evidence with citations, and files it all in your Zotero library.\n\n[![CI](https://github.com/aytzey/paper-pilot/actions/workflows/ci.yml/badge.svg)](https://github.com/aytzey/paper-pilot/actions/workflows/ci.yml)\n[![PyPI](https://img.shields.io/pypi/v/paper-pilot)](https://pypi.org/project/paper-pilot/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)\n[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](pyproject.toml)\n[![GitHub stars](https://img.shields.io/github/stars/aytzey/paper-pilot?style=social)](https://github.com/aytzey/paper-pilot/stargazers)\n\n---\n\n![Paper Pilot in action](docs/demo.gif)\n\n---\n\n## Quick start\n\n**Try it in 30 seconds. No MCP client, no config:**\n\n```bash\n# straight from GitHub (works today):\nuvx --from git+https://github.com/aytzey/paper-pilot paper-pilot demo \"retrieval augmented generation\"\n\n# once published to PyPI:\nuvx paper-pilot demo \"retrieval augmented generation\"\n```\n\nThis searches 6 academic databases, downloads the open-access PDFs, reads them, writes a structured report, and opens an **interactive citation graph** in your browser.\n\n👉 **See a real run, no install needed:** [sample report](examples/sample-report.md) · [interactive citation graph](examples/sample-citation-graph.html)\n\n### Then plug it into your AI agent\n\nWire it into your MCP client ([setup below](#mcp-client-setup)), set a free `OPENALEX_EMAIL`, and ask:\n\n\u003e *Research retrieval-augmented generation, deep-read the top papers, and compare the methods.*\n\n---\n\n## How it works\n\n```mermaid\ngraph LR\n    A[Prompt] --\u003e B[Search 6 databases]\n    B --\u003e C[Resolve OA PDFs]\n    C --\u003e D[Download \u0026 read]\n    D --\u003e E[Extract evidence]\n    E --\u003e F[Render figures]\n    F --\u003e G[Markdown report]\n    G --\u003e H[Zotero sync]\n```\n\nOne prompt searches six academic databases, downloads the real PDFs, and returns real citations.\n\n```\nResearch retrieval-augmented generation, deep-read the top papers, and compare the methods.\n```\n\nYour AI will:\n\n1. Search **Semantic Scholar**, **OpenAlex**, **arXiv**, **Crossref**, **Europe PMC**, and **DOAJ**\n2. Find the open-access PDFs, not abstracts\n3. Download and read them cover to cover\n4. Extract evidence chunks with source attribution\n5. Give the model every PDF's local path to open on demand, and render pages as images or embed the PDF when you ask for it\n6. Write a structured Markdown report\n7. Save everything into your **Zotero** library\n\n---\n\n## vs. alternatives\n\n| | ChatGPT Deep Research | Gemini Deep Research | Perplexity Pro | **Paper Pilot** |\n|---|---|---|---|---|\n| Reads actual PDFs | Web summaries | Web summaries | Web summaries | **Full text extraction** |\n| Figures and tables | Text only | Text only | Text only | **Page rendering to PNG** |\n| Your library | Locked in their UI | Locked in Google | Locked in Perplexity | **Syncs to Zotero** |\n| Sources | Generic web search | Generic web search | Web search | **6 academic databases** |\n| Cost | $200/month | $20/month | $20/month | **Free, MIT licensed** |\n| Your data | Their cloud | Their cloud | Their cloud | **Your machine** |\n| Open source | No | No | No | **Yes** |\n\n---\n\n## MCP client setup\n\nWorks on Claude Desktop, Cursor, Claude Code, and Codex, across Windows, macOS, and Linux. Full per-OS config-file locations, the Windows `spawn uv ENOENT` fix, and a per-client capability matrix are in [docs/CLIENTS.md](docs/CLIENTS.md).\n\n### Claude Desktop\n\nAdd to `claude_desktop_config.json` (macOS: `~/Library/Application Support/Claude/`, Windows: `%APPDATA%\\Claude\\`; Claude Desktop has no Linux build, so use Claude Code on Linux):\n\n```json\n{\n  \"mcpServers\": {\n    \"paper-pilot\": {\n      \"command\": \"uv\",\n      \"args\": [\"--directory\", \"/path/to/paper-pilot\", \"run\", \"paper-pilot\"],\n      \"env\": {\n        \"OPENALEX_EMAIL\": \"you@example.com\",\n        \"UNPAYWALL_EMAIL\": \"you@example.com\",\n        \"ZOTERO_LOCAL\": \"true\",\n        \"SCIHUB_ENABLED\": \"false\"\n      }\n    }\n  }\n}\n```\n\n### Claude Code\n\n```bash\nclaude mcp add --scope user paper-pilot -- uv --directory /path/to/paper-pilot run paper-pilot\n```\n\n### Codex\n\nAdd to `~/.codex/config.toml`:\n\n```toml\n[mcp_servers.paper_pilot]\ncommand = \"uv\"\nargs = [\"--directory\", \"/path/to/paper-pilot\", \"run\", \"paper-pilot\"]\n\n[mcp_servers.paper_pilot.env]\nOPENALEX_EMAIL = \"you@example.com\"\nZOTERO_LOCAL = \"true\"\n```\n\n### Cursor\n\nPut this at `.cursor/mcp.json` (this repo) or `~/.cursor/mcp.json` (global), then enable it in Settings (`Cmd/Ctrl+Shift+J`) under Model Context Protocol. See [examples/cursor.mcp.json](examples/cursor.mcp.json).\n\n```json\n{\n  \"mcpServers\": {\n    \"paper-pilot\": {\n      \"command\": \"uv\",\n      \"args\": [\"--directory\", \"/path/to/paper-pilot\", \"run\", \"paper-pilot\"],\n      \"env\": { \"OPENALEX_EMAIL\": \"you@example.com\", \"UNPAYWALL_EMAIL\": \"you@example.com\", \"ZOTERO_LOCAL\": \"true\" }\n    }\n  }\n}\n```\n\n### Windows note\n\nClaude Desktop and Cursor spawn the command without a shell, so a bare `uv`/`uvx` can fail with `spawn uv ENOENT`. Wrap it (`\"command\": \"cmd\", \"args\": [\"/c\", \"uv\", \"--directory\", \"C:\\\\path\\\\to\\\\paper-pilot\", \"run\", \"paper-pilot\"]`) or use the full path from `where uv`.\n\n### Streamable HTTP mode\n\n```bash\npaper-pilot --transport streamable-http --host 127.0.0.1 --port 8000\n```\n\n---\n\n## Tools\n\n| Tool | What it does |\n|---|---|\n| `research_topic` | Full pipeline: search, download, report, optional citation graph + Zotero sync |\n| `deep_read_topic` | Everything above + full-text extraction with evidence chunks |\n| `graph_topic` | Render an interactive citation / relatedness graph (HTML) for a topic |\n| `render_pdf_pages` | Render PDF pages as images the model can see (figures, tables, layout) |\n| `read_pdf_document` | Return a downloaded PDF's local path and resource link (embed base64 only on request) |\n| `get_pdf_page_text` | Exact text of specific PDF pages as JSON, for fine-grained lookups (no base64) |\n| `search_literature` | Fine-grained multi-source academic search (6 databases) |\n| `find_similar_papers` | Related work expansion from a seed paper |\n| `inspect_open_access_pdf` | OA availability check and PDF preview |\n| `extract_local_pdf_text` | Text extraction from any local PDF |\n| `list_zotero_collections` | List collections in your local or web Zotero library |\n| `search_scihub` | Search Sci-Hub by DOI, title, or keyword (opt-in) |\n| `download_scihub_paper` | Download a paper via Sci-Hub by DOI (opt-in) |\n| `search_libgen` | Supplementary shadow library search (opt-in) |\n| `inspect_libgen_item` | Resolve a LibGen mirror item and preview its PDF (opt-in) |\n| `healthcheck` | Verify all connections are up |\n\n\u003e Prefer the CLI? `paper-pilot demo \"\u003ctopic\u003e\"` runs the whole pipeline and opens the citation graph. No MCP client required.\n\n---\n\n## Sci-Hub integration (opt-in)\n\nSci-Hub access is **disabled by default**. To opt in:\n\n```bash\nSCIHUB_ENABLED=true\n```\n\nOnce enabled, use `search_scihub` and `download_scihub_paper` directly, or pass `include_scihub=True` to `research_topic` / `deep_read_topic` for automatic fallback.\n\n\u003e **Disclaimer:** Sci-Hub integration is provided strictly for educational and research purposes. Users are solely responsible for compliance with applicable laws and institutional policies.\n\n---\n\n## Who uses this\n\n**PhD students** that don't want to spend a week on a literature review. Point it at your thesis topic, get back a structured comparison with real citations and the PDFs already in Zotero.\n\n**Research labs** that want to scan preprints weekly and auto-file them. Run `research_topic` on a schedule and keep your group library current.\n\n**AI builders** that need their agents to work with real academic papers instead of web scraping snippets.\n\n---\n\n## Configuration\n\n```bash\nOPENALEX_EMAIL=you@example.com        # Required for polite API access\nUNPAYWALL_EMAIL=you@example.com       # Required for OA resolution\nSEMANTIC_SCHOLAR_API_KEY=             # Optional, higher rate limits\n\n# Local Zotero\nZOTERO_LOCAL=true\nZOTERO_LIBRARY_TYPE=user\nZOTERO_DATA_DIR=                       # optional: relocated/sandboxed Zotero data dir (default ~/Zotero)\n\n# Web Zotero API (alternative)\nZOTERO_LIBRARY_ID=\nZOTERO_API_KEY=\n\n# Sci-Hub (disabled by default)\nSCIHUB_ENABLED=false\nINSECURE_SHADOW_TLS=false              # opt in to skip TLS verification for Sci-Hub/LibGen mirrors\n\n# Storage\nPAPER_PILOT_DATA_DIR=./data\nMAX_DOWNLOAD_MB=75                     # per-PDF download size cap\nPAPER_PILOT_ALLOW_EXTERNAL_PDF=true   # read PDFs outside the data dir (set false on networked transports)\nPDF_EMBED_MAX_MB=5                     # size cap for an embedded PDF resource\nPDF_EMBED_MAX_PAGES=60                 # page cap for an embedded PDF resource\n\n# Institutional networks\nHTTP_PROXY=\nHTTPS_PROXY=\nSSL_CERT_FILE=\n```\n\n---\n\n## Project structure\n\n```\nsrc/paper_pilot/\n  server.py              MCP tools and pipeline orchestration\n  cli.py                 Server entry point + `demo` subcommand\n  demo.py                Zero-config one-command demo runner\n  config.py              Environment and settings\n  services/\n    academic.py          Multi-source scholarly search (6 databases)\n    open_access.py       OA resolution and PDF downloads\n    scihub.py            Sci-Hub paper resolution (opt-in)\n    deep_read.py         Full-text extraction and page rendering\n    zotero.py            Local and web Zotero integration\n    reporting.py         Markdown report + synthesis comparison tables\n    graphing.py          Interactive citation-graph HTML export\n    content.py           PDF/image MCP content blocks (pages as images, embedded PDF)\n    libgen.py            Supplementary LibGen support\n    net.py               SSRF guard + size-capped downloads\n```\n\nArchitecture details: [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md)\n\n---\n\n## For AI agents\n\n- [AGENTS.md](AGENTS.md): shared operating guide\n- [CLAUDE.md](CLAUDE.md): Claude Desktop and Claude Code setup\n- [CODEX.md](CODEX.md): Codex setup\n- [docs/CLIENTS.md](docs/CLIENTS.md): side-by-side client comparison\n\n---\n\n## Contributing\n\nPRs welcome. The most impactful areas:\n\n- New scholarly source adapters\n- Better OA resolution logic\n- PDF parsing improvements\n- More MCP client configs\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md).\n\n---\n\n## Disclaimer\n\nThis tool is designed for academic research and educational purposes only. Open-access features use only legal, publicly available sources. Sci-Hub and LibGen integrations are disabled by default and provided as opt-in features.\n\n---\n\n## License\n\nMIT. Do whatever you want with it.\n\nIf this helps your research, [star the repo](https://github.com/aytzey/paper-pilot) and tell a colleague.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faytzey%2Fpaper-pilot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faytzey%2Fpaper-pilot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faytzey%2Fpaper-pilot/lists"}