https://github.com/agents365-ai/paper-fetch
Legal open-access PDF downloader by DOI — Unpaywall, arXiv, PMC, bioRxiv. Multi-platform Agent Skill.
agent-skills claude-code claude-code-skill claude-skills doi open-access openclaw openclaw-skills pdf-downloader skill-md skillsmp unpaywall
Last synced: 12 days ago
- Host: GitHub
- URL: https://github.com/agents365-ai/paper-fetch
- Owner: Agents365-ai
- Created: 2026-04-08T09:14:24.000Z (19 days ago)
- Default Branch: main
- Last Pushed: 2026-04-11T03:08:05.000Z (16 days ago)
- Last Synced: 2026-04-11T05:35:37.929Z (16 days ago)
- Topics: agent-skills, claude-code, claude-code-skill, claude-skills, doi, open-access, openclaw, openclaw-skills, pdf-downloader, skill-md, skillsmp, unpaywall
- Language: Python
- Homepage: https://agents365-ai.github.io/paper-fetch/
- Size: 44.9 KB
- Stars: 11
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# paper-fetch — Legal Open-Access PDF Downloader
[中文文档](README_CN.md)
## What it does
- Downloads paper PDFs from a **DOI** (or batch file of DOIs) via legal open-access sources
- **5-source fallback chain**: Unpaywall → Semantic Scholar `openAccessPdf` → arXiv → PubMed Central OA → bioRxiv/medRxiv
- **Zero dependencies** — pure Python standard library, no `pip install` needed
- **Auto-named output** — `{first_author}_{year}_{short_title}.pdf`
- **Batch mode** — pass a file of DOIs with `--batch`
- **Never touches Sci-Hub or any paywall-bypass service** — if no OA copy exists, it reports failure along with the paper's metadata so you can request it through interlibrary loan (ILL)
- **Self-updating** — when installed via `git clone`, each invocation spawns a detached background `git pull --ff-only` (throttled to once per 24h). Zero user action required. Disable with `export PAPER_FETCH_NO_AUTO_UPDATE=1`.
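
The 24-hour throttle on the self-update can be pictured as a timestamp check. The following is a minimal sketch under stated assumptions, not the project's actual code: the stamp file `.last_update_check` and both function names are hypothetical, and treating any non-empty value of the variable as "disabled" is a guess. Only `PAPER_FETCH_NO_AUTO_UPDATE` and `git pull --ff-only` come from this README.

```python
import os
import subprocess
import time
from pathlib import Path
from typing import Mapping

THROTTLE_SECONDS = 24 * 60 * 60  # at most one update attempt per 24 hours


def should_auto_update(stamp: Path, now: float, env: Mapping[str, str]) -> bool:
    """Return True when an update attempt is due (hypothetical helper)."""
    # Assumption: any non-empty value disables auto-update.
    if env.get("PAPER_FETCH_NO_AUTO_UPDATE"):
        return False
    if not stamp.exists():
        return True  # never checked before
    return now - stamp.stat().st_mtime >= THROTTLE_SECONDS


def maybe_auto_update(repo_dir: Path) -> None:
    """Spawn a detached `git pull --ff-only` if the throttle allows it."""
    stamp = repo_dir / ".last_update_check"  # hypothetical stamp file
    if not should_auto_update(stamp, time.time(), os.environ):
        return
    stamp.touch()  # record the attempt before spawning the child
    subprocess.Popen(
        ["git", "-C", str(repo_dir), "pull", "--ff-only"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # detach from the caller (POSIX only)
    )
```

Recording the attempt before spawning means a failed pull is not retried until the next 24-hour window, which keeps the check cheap on every invocation.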
## Discipline Coverage
**The skill is discipline-agnostic** — it works for any field, not just life sciences or computer science. Coverage depends on whether the paper has a legal OA version, not on its subject area.
| Source | Discipline scope |
|---|---|
| **Unpaywall** | ✅ All disciplines (covers every Crossref DOI — humanities, social sciences, physics, chemistry, economics, etc.) |
| **Semantic Scholar** | ✅ All disciplines (cross-domain academic graph) |
| **arXiv** | Physics, math, CS, statistics, quantitative finance, economics, EE |
| **PubMed Central** | Biomedical only |
| **bioRxiv / medRxiv** | Biology / medicine preprints only |
In practice, **Unpaywall + Semantic Scholar alone cover OA papers in chemistry, materials, economics, psychology, humanities, and every other field** via institutional repositories, SSRN, RePEc, and publisher-hosted OA copies. arXiv/PMC/bioRxiv are additional fallbacks for their specific domains. If no legal OA copy exists anywhere, the skill reports failure honestly — it will **never** bypass paywalls regardless of discipline.
## Multi-Platform Support
Works with all major AI coding agents that support the Agent Skills format:
| Platform | Status | Details |
|----------|--------|---------|
| **Claude Code** | ✅ Full support | Native SKILL.md format |
| **OpenClaw / ClawHub** | ✅ Full support | `metadata.openclaw` namespace |
| **Hermes Agent** | ✅ Full support | Installable under research category |
| **[pi-mono](https://github.com/badlogic/pi-mono)** | ✅ Full support | `metadata.pimo` namespace |
| **OpenAI Codex** | ✅ Full support | `agents/openai.yaml` sidecar |
| **SkillsMP** | ✅ Indexed | GitHub topics configured |
## Comparison
### vs No Skill (native agent)
| Feature | Native agent | This skill |
|---------|-------------|------------|
| Resolve DOI to PDF | Ad-hoc web search | Deterministic 5-source chain |
| Unpaywall integration | No | Yes — highest OA coverage |
| arXiv / PMC / bioRxiv fallback | Manual | Automatic |
| Batch download | No | Yes — `--batch dois.txt` |
| Consistent filenames | No | Yes — `author_year_title.pdf` |
| Legal-only guarantee | None | Hard refuses paywall bypass |
| Dependencies | Varies | Python stdlib only |
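
The consistent-filename row refers to the `{first_author}_{year}_{short_title}.pdf` pattern. One way such a name could be built is sketched below. The word limit, the hyphen separator inside the title, the ASCII folding, and the helper names `slugify` and `build_filename` are all assumptions for illustration; the script's real rules may differ.

```python
import re
import unicodedata


def slugify(text: str, max_words: int = 5) -> str:
    """ASCII-fold the text and keep the first few words, joined by hyphens."""
    folded = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    words = re.findall(r"[A-Za-z0-9]+", folded)
    return "-".join(words[:max_words])


def build_filename(first_author: str, year: int, title: str) -> str:
    """Build `{first_author}_{year}_{short_title}.pdf` from paper metadata."""
    # Assumption: the family name is the last whitespace-separated token.
    tokens = first_author.split()
    family = slugify(tokens[-1], 1) if tokens else ""
    return f"{family or 'unknown'}_{year}_{slugify(title) or 'untitled'}.pdf"


# Sample metadata, for illustration only.
print(build_filename(
    "John Jumper", 2021,
    "Highly accurate protein structure prediction with AlphaFold",
))
# prints: Jumper_2021_Highly-accurate-protein-structure-prediction.pdf
```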
## Prerequisites
- **Python 3.8+** (standard library only, no extra packages)
- **Unpaywall contact email** (optional but recommended) — set once:
```bash
export UNPAYWALL_EMAIL=you@example.com
```
Add it to `~/.zshrc` / `~/.bashrc` to persist. Without it, Unpaywall is skipped and the remaining 4 sources (Semantic Scholar, arXiv, PMC, bioRxiv/medRxiv) are still tried.
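
To illustrate why the email matters: the public Unpaywall v2 REST API expects an `email` query parameter on each request. The sketch below shows how a lookup URL could be built and how the source could be skipped when the variable is unset. The helper names `unpaywall_url` and `pick_pdf_url` are hypothetical and are not taken from `scripts/fetch.py`.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote, urlencode

UNPAYWALL_API = "https://api.unpaywall.org/v2/"


def unpaywall_url(doi: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the Unpaywall lookup URL, or None when no contact email is set."""
    env = os.environ if env is None else env
    email = env.get("UNPAYWALL_EMAIL", "").strip()
    if not email:
        return None  # caller skips Unpaywall and moves to the next source
    return UNPAYWALL_API + quote(doi, safe="/") + "?" + urlencode({"email": email})


def pick_pdf_url(payload: dict) -> Optional[str]:
    """Read the PDF link from a decoded Unpaywall JSON response, if present."""
    best = payload.get("best_oa_location") or {}
    return best.get("url_for_pdf")
```

For example, `unpaywall_url("10.1038/s41586-021-03819-2", {"UNPAYWALL_EMAIL": "you@example.com"})` yields `https://api.unpaywall.org/v2/10.1038/s41586-021-03819-2?email=you%40example.com`.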
## Skill Installation
### Claude Code
```bash
# Global install
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.claude/skills/paper-fetch
# Project-level install
git clone https://github.com/Agents365-ai/paper-fetch.git .claude/skills/paper-fetch
```
### OpenClaw / ClawHub
```bash
clawhub install paper-fetch
# Or manual
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.openclaw/skills/paper-fetch
```
### Hermes Agent
```bash
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.hermes/skills/research/paper-fetch
```
Or add to `~/.hermes/config.yaml`:
```yaml
skills:
  external_dirs:
    - ~/myskills/paper-fetch
```
### pi-mono
```bash
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.pimo/skills/paper-fetch
```
### OpenAI Codex
```bash
# User-level
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.agents/skills/paper-fetch
# Project-level
git clone https://github.com/Agents365-ai/paper-fetch.git .agents/skills/paper-fetch
```
### SkillsMP
```bash
skills install paper-fetch
```
### Installation paths summary
| Platform | Global path | Project path |
|----------|-------------|--------------|
| Claude Code | `~/.claude/skills/paper-fetch/` | `.claude/skills/paper-fetch/` |
| OpenClaw | `~/.openclaw/skills/paper-fetch/` | `skills/paper-fetch/` |
| Hermes Agent | `~/.hermes/skills/research/paper-fetch/` | Via `external_dirs` |
| pi-mono | `~/.pimo/skills/paper-fetch/` | — |
| OpenAI Codex | `~/.agents/skills/paper-fetch/` | `.agents/skills/paper-fetch/` |
| SkillsMP | N/A (installed via CLI) | N/A |
## Usage
Single DOI:
```bash
python scripts/fetch.py 10.1038/s41586-021-03819-2
```
Custom output directory:
```bash
python scripts/fetch.py 10.1038/s41586-021-03819-2 --out ~/papers
```
Batch mode:
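```bash
# Assumed file format: one DOI per line
cat > dois.txt <<'EOF'
10.1038/s41586-021-03819-2
10.1126/science.abj8754
EOF
python scripts/fetch.py --batch dois.txt
```
Example prompts for the agent:
> Download the AlphaFold2 paper PDF to my `~/papers` folder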
> Fetch the PDF for DOI 10.1038/s41586-020-2649-2
> Download these three papers: 10.1038/s41586-021-03819-2, 10.1126/science.abj8754, 10.1101/2023.01.01.522400
> Check if this paper has an open-access PDF available: 10.1038/s41586-020-2649-2
> Batch download all DOIs from my dois.txt file into ~/papers
## Resolution Order
1. **Unpaywall** — best OA location across all publishers (highest hit rate)
2. **Semantic Scholar** — `openAccessPdf` field + `externalIds` lookup
3. **arXiv** — if the paper has an arXiv ID
4. **PubMed Central OA subset** — if the paper has a PMCID
5. **bioRxiv / medRxiv** — DOI prefix `10.1101/`
6. Otherwise → report failure with metadata (title/authors) for ILL
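
The resolution order above amounts to a first-hit-wins fallback chain. The sketch below shows that control flow only. The resolvers are stand-in stubs rather than real API clients, and `resolve_pdf`, the source labels, and the `example.org` URL are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# A resolver maps a DOI to a PDF URL, or returns None on a miss.
Resolver = Callable[[str], Optional[str]]


def resolve_pdf(
    doi: str, chain: List[Tuple[str, Resolver]]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each source in order and return (source_name, pdf_url) for the first hit."""
    for name, resolver in chain:
        try:
            url = resolver(doi)
        except Exception:
            url = None  # one failing source should not abort the whole chain
        if url:
            return name, url
    return None, None  # no legal OA copy found; the caller reports failure


def miss(doi: str) -> Optional[str]:
    """Stub resolver that never finds anything."""
    return None


def biorxiv_stub(doi: str) -> Optional[str]:
    """Stub for the last step: only DOIs with the 10.1101/ prefix qualify."""
    return "https://example.org/preprint.pdf" if doi.startswith("10.1101/") else None


chain = [
    ("unpaywall", miss),
    ("semantic_scholar", miss),
    ("arxiv", miss),
    ("pmc", miss),
    ("biorxiv", biorxiv_stub),
]
print(resolve_pdf("10.1101/2023.01.01.522400", chain))
# prints: ('biorxiv', 'https://example.org/preprint.pdf')
```

Keeping the chain as an ordered list of named callables makes the order explicit and lets a source be skipped (for instance, Unpaywall without an email) by leaving it out of the list.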
## Files
- `SKILL.md` — **the only required file**. Loaded by all platforms.
- `scripts/fetch.py` — the downloader (pure stdlib Python)
- `agents/openai.yaml` — OpenAI Codex sidecar configuration
- `README.md` — this file
- `README_CN.md` — Chinese documentation
## Known Limitations
- **Coverage depends on OA availability** — if a paper has no legal OA copy, this skill cannot get it. That is a feature, not a bug.
- **Some publisher redirects** return an HTML landing page instead of a PDF; the script validates the `%PDF` header and fails cleanly in that case
- **No authentication** — institutional proxies (EZproxy / OpenAthens) are not supported in this version
- **Host allowlist** — downloads are restricted to known OA provider domains; PDFs from unlisted hosts are blocked
- **50 MB size limit** — per-PDF download cap to prevent runaway downloads
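
The three safety checks listed above (host allowlist, size cap, `%PDF` header) could be combined roughly as follows. This is a sketch under stated assumptions: the hosts in `ALLOWED_HOSTS` are placeholders and not the script's actual allowlist, reading 50 MB as 50 × 1024 × 1024 bytes is a guess, and the function names are hypothetical.

```python
from urllib.parse import urlparse

MAX_PDF_BYTES = 50 * 1024 * 1024  # 50 MB cap (exact byte count is assumed)

# Placeholder allowlist for illustration; the real list lives in the script.
ALLOWED_HOSTS = frozenset({"arxiv.org", "europepmc.org", "www.biorxiv.org"})


def host_allowed(url: str, allowed=ALLOWED_HOSTS) -> bool:
    """Accept a listed host or any subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in allowed)


def looks_like_pdf(head: bytes) -> bool:
    """A PDF starts with the magic bytes %PDF; an HTML landing page does not."""
    return head.startswith(b"%PDF")


def check_download(url: str, body: bytes) -> str:
    """Return 'ok' or a short reason why the download is refused."""
    if not host_allowed(url):
        return "blocked: host not on allowlist"
    if len(body) > MAX_PDF_BYTES:
        return "rejected: exceeds size cap"
    if not looks_like_pdf(body):
        return "rejected: not a PDF"
    return "ok"
```

A real downloader would more likely enforce the cap while streaming, stopping the read once the limit is crossed, rather than checking the length after the whole body is in memory.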
## License
MIT
## Support
If this skill helps your work, consider supporting the author:
- WeChat Pay
- Alipay
- Buy Me a Coffee
## Author
**Agents365-ai**
- Bilibili: https://space.bilibili.com/441831884
- GitHub: https://github.com/Agents365-ai