https://github.com/agents365-ai/paper-fetch

Legal open-access PDF downloader by DOI — Unpaywall, arXiv, PMC, bioRxiv. Multi-platform Agent Skill.
https://github.com/agents365-ai/paper-fetch

agent-skills claude-code claude-code-skill claude-skills doi open-access openclaw openclaw-skills pdf-downloader skill-md skillsmp unpaywall

Last synced: 3 months ago
JSON representation

Legal open-access PDF downloader by DOI — Unpaywall, arXiv, PMC, bioRxiv. Multi-platform Agent Skill.

Host: GitHub
URL: https://github.com/agents365-ai/paper-fetch
Owner: Agents365-ai
Created: 2026-04-08T09:14:24.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-04-11T03:08:05.000Z (3 months ago)
Last Synced: 2026-04-11T05:35:37.929Z (3 months ago)
Topics: agent-skills, claude-code, claude-code-skill, claude-skills, doi, open-access, openclaw, openclaw-skills, pdf-downloader, skill-md, skillsmp, unpaywall
Language: Python
Homepage: https://agents365-ai.github.io/paper-fetch/
Size: 44.9 KB
Stars: 11
Watchers: 0
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome-academic-skills - paper-fetch - by [agents365-ai](https://github.com/agents365-ai) · `MIT` · `net`.<br>Resolves a DOI (or batch) to a downloadable PDF via a 7-source fallback chain (Unpaywall, Semantic Scholar, arXiv, PMC, bioRxiv, publisher, then Sci-Hub) with per-source reporting. Clean and zero-deps; the Sci-Hub fallback is a caveat. (Discover & Collect / Literature Search & Discovery)

README

          # paper-fetch — Legal Open-Access PDF Downloader

[中文文档](README_CN.md)

## What it does

- Downloads paper PDFs from a **DOI** (or batch file of DOIs) via legal open-access sources

- **5-source fallback chain**: Unpaywall → Semantic Scholar `openAccessPdf` → arXiv → PubMed Central OA → bioRxiv/medRxiv

- **Zero dependencies** — pure Python standard library, no `pip install` needed

- **Auto-named output** — `{first_author}_{year}_{short_title}.pdf`

- **Batch mode** — pass a file of DOIs with `--batch`

- **Never touches Sci-Hub or any paywall-bypass service** — if no OA copy exists, reports failure with metadata so you can go through ILL

- **Self-updating** — when installed via `git clone`, each invocation spawns a detached background `git pull --ff-only` (throttled to once per 24h). Zero user action required. Disable with `export PAPER_FETCH_NO_AUTO_UPDATE=1`.

## Discipline Coverage

**The skill is discipline-agnostic** — it works for any field, not just life sciences or computer science. Coverage depends on whether the paper has a legal OA version, not on its subject area.

| Source | Discipline scope |

|---|---|

| **Unpaywall** | ✅ All disciplines (covers every Crossref DOI — humanities, social sciences, physics, chemistry, economics, etc.) |

| **Semantic Scholar** | ✅ All disciplines (cross-domain academic graph) |

| **arXiv** | Physics, math, CS, statistics, quantitative finance, economics, EE |

| **PubMed Central** | Biomedical only |

| **bioRxiv / medRxiv** | Biology / medicine preprints only |

In practice, **Unpaywall + Semantic Scholar alone cover OA papers in chemistry, materials, economics, psychology, humanities, and every other field** via institutional repositories, SSRN, RePEc, and publisher-hosted OA copies. arXiv/PMC/bioRxiv are additional fallbacks for their specific domains. If no legal OA copy exists anywhere, the skill reports failure honestly — it will **never** bypass paywalls regardless of discipline.

## Multi-Platform Support

Works with all major AI coding agents that support the Agent Skills format:

| Platform | Status | Details |

|----------|--------|---------|

| **Claude Code** | ✅ Full support | Native SKILL.md format |

| **OpenClaw / ClawHub** | ✅ Full support | `metadata.openclaw` namespace |

| **Hermes Agent** | ✅ Full support | Installable under research category |

| **[pi-mono](https://github.com/badlogic/pi-mono)** | ✅ Full support | `metadata.pimo` namespace |

| **OpenAI Codex** | ✅ Full support | `agents/openai.yaml` sidecar |

| **SkillsMP** | ✅ Indexed | GitHub topics configured |

## Comparison

### vs No Skill (native agent)

| Feature | Native agent | This skill |

|---------|-------------|------------|

| Resolve DOI to PDF | Ad-hoc web search | Deterministic 5-source chain |

| Unpaywall integration | No | Yes — highest OA coverage |

| arXiv / PMC / bioRxiv fallback | Manual | Automatic |

| Batch download | No | Yes — `--batch dois.txt` |

| Consistent filenames | No | Yes — `author_year_title.pdf` |

| Legal-only guarantee | None | Hard refuses paywall bypass |

| Dependencies | Varies | Python stdlib only |

## Prerequisites

- **Python 3.8+** (standard library only, no extra packages)

- **Unpaywall contact email** (optional but recommended) — set once:

```bash

export UNPAYWALL_EMAIL=you@example.com

```

Add it to `~/.zshrc` / `~/.bashrc` to persist. Without it, Unpaywall is skipped and the remaining 4 sources (Semantic Scholar, arXiv, PMC, bioRxiv/medRxiv) are still tried.

## Skill Installation

### Claude Code

```bash

# Global install

git clone https://github.com/Agents365-ai/paper-fetch.git ~/.claude/skills/paper-fetch

# Project-level install

git clone https://github.com/Agents365-ai/paper-fetch.git .claude/skills/paper-fetch

```

### OpenClaw / ClawHub

```bash

clawhub install paper-fetch

# Or manual

git clone https://github.com/Agents365-ai/paper-fetch.git ~/.openclaw/skills/paper-fetch

```

### Hermes Agent

```bash

git clone https://github.com/Agents365-ai/paper-fetch.git ~/.hermes/skills/research/paper-fetch

```

Or add to `~/.hermes/config.yaml`:

```yaml

skills:

  external_dirs:

    - ~/myskills/paper-fetch

```

### pi-mono

```bash

git clone https://github.com/Agents365-ai/paper-fetch.git ~/.pimo/skills/paper-fetch

```

### OpenAI Codex

```bash

# User-level

git clone https://github.com/Agents365-ai/paper-fetch.git ~/.agents/skills/paper-fetch

# Project-level

git clone https://github.com/Agents365-ai/paper-fetch.git .agents/skills/paper-fetch

```

### SkillsMP

```bash

skills install paper-fetch

```

### Installation paths summary

| Platform | Global path | Project path |

|----------|-------------|--------------|

| Claude Code | `~/.claude/skills/paper-fetch/` | `.claude/skills/paper-fetch/` |

| OpenClaw | `~/.openclaw/skills/paper-fetch/` | `skills/paper-fetch/` |

| Hermes Agent | `~/.hermes/skills/research/paper-fetch/` | Via `external_dirs` |

| pi-mono | `~/.pimo/skills/paper-fetch/` | — |

| OpenAI Codex | `~/.agents/skills/paper-fetch/` | `.agents/skills/paper-fetch/` |

| SkillsMP | N/A (installed via CLI) | N/A |

## Usage

Single DOI:

```bash

python scripts/fetch.py 10.1038/s41586-021-03819-2

```

Custom output directory:

```bash

python scripts/fetch.py 10.1038/s41586-021-03819-2 --out ~/papers

```

Batch mode:

```bash

cat > dois.txt < Download the AlphaFold2 paper PDF to my `~/papers` folder

> Fetch the PDF for DOI 10.1038/s41586-020-2649-2

> Download these three papers: 10.1038/s41586-021-03819-2, 10.1126/science.abj8754, 10.1101/2023.01.01.522400

> Check if this paper has an open-access PDF available: 10.1038/s41586-020-2649-2

> Batch download all DOIs from my dois.txt file into ~/papers

## Resolution Order

1. **Unpaywall** — best OA location across all publishers (highest hit rate)

2. **Semantic Scholar** — `openAccessPdf` field + `externalIds` lookup

3. **arXiv** — if the paper has an arXiv ID

4. **PubMed Central OA subset** — if the paper has a PMCID

5. **bioRxiv / medRxiv** — DOI prefix `10.1101/`

6. Otherwise → report failure with metadata (title/authors) for ILL

## Files

- `SKILL.md` — **the only required file**. Loaded by all platforms.

- `scripts/fetch.py` — the downloader (pure stdlib Python)

- `agents/openai.yaml` — OpenAI Codex sidecar configuration

- `README.md` — this file

- `README_CN.md` — Chinese documentation

## Known Limitations

- **Coverage depends on OA availability** — if a paper has no legal OA copy, this skill cannot get it. That is a feature, not a bug.

- **Some publisher redirects** return an HTML landing page instead of a PDF; the script validates the `%PDF` header and fails cleanly in that case

- **No authentication** — institutional proxies (EZproxy / OpenAthens) are not supported in this version

- **Host allowlist** — downloads are restricted to known OA provider domains; PDFs from unlisted hosts are blocked

- **50 MB size limit** — per-PDF download cap to prevent runaway downloads

## License

MIT

## Support

If this skill helps your work, consider supporting the author:

  

    

      

      


      WeChat Pay

    

    

      

      


      Alipay

    

    

      

      


      Buy Me a Coffee

    

  

## Author

**Agents365-ai**

- Bilibili: https://space.bilibili.com/441831884

- GitHub: https://github.com/Agents365-ai

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/agents365-ai/paper-fetch

Awesome Lists containing this project

README