https://github.com/chad-loder/pyhaul

Resumable, cursor-based, CDN-safe HTTP downloads for Python

download http httpx python requests resumable

Last synced: 3 months ago

Host: GitHub
URL: https://github.com/chad-loder/pyhaul
Owner: chad-loder
License: mit
Created: 2026-04-26T01:48:34.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-04-26T02:04:06.000Z (3 months ago)
Last Synced: 2026-04-26T03:32:58.149Z (3 months ago)
Topics: download, http, httpx, python, requests, resumable
Language: Python
Size: 142 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md

# pyhaul

[![CI](https://github.com/chad-loder/pyhaul/actions/workflows/ci.yml/badge.svg?event=push)](https://github.com/chad-loder/pyhaul/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/chad-loder/pyhaul/graph/badge.svg)](https://codecov.io/gh/chad-loder/pyhaul)
[![PyPI](https://img.shields.io/pypi/v/pyhaul.svg)](https://pypi.org/project/pyhaul/)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)
[![Docs](https://img.shields.io/badge/docs-properdocs-blue.svg)](https://chad-loder.github.io/pyhaul/)

Resumable HTTP downloads for Python. **Bring your own client:** pyhaul borrows your existing
session and handles byte-range negotiation, crash-safe checkpointing, and validation.

[![httpx](https://img.shields.io/badge/httpx-async%2Bsync-6B46C1.svg)](https://www.python-httpx.org/)
[![niquests](https://img.shields.io/badge/niquests-async%2Bsync-6B46C1.svg)](https://niquests.readthedocs.io/)
[![aiohttp](https://img.shields.io/badge/aiohttp-async-2563EB.svg)](https://docs.aiohttp.org/)
[![requests](https://img.shields.io/badge/requests-sync-059669.svg)](https://requests.readthedocs.io/)
[![urllib3](https://img.shields.io/badge/urllib3-sync-059669.svg)](https://urllib3.readthedocs.io/)

```bash
# Quote the extra so shells such as zsh do not expand the brackets.
pip install "pyhaul[httpx]"   # or: niquests, requests, urllib3, aiohttp
```

```python
from pathlib import Path

import httpx

from pyhaul import haul, PartialHaulError

dest = Path("big.zip")

with httpx.Client() as client:
    for _ in range(10):
        try:
            result = haul("https://example.com/big.zip", client, dest=dest)
            break
        except PartialHaulError:
            pass  # only retryable error; others propagate
    else:
        # Every attempt was interrupted, so dest does not exist yet.
        raise SystemExit("gave up after 10 attempts")

print(f"done: {dest.stat().st_size:,} bytes")
```

---

## What is it?

A small, pure-Python library that makes HTTP downloads **resumable**.

To download a file, call `haul()` with a URL, your existing HTTP
client, and a destination path. **pyhaul** handles byte-range
negotiation for resume, ETag validation, crash-safe
checkpointing, and atomic file completion. It supports both sync and
async use across multiple HTTP client libraries.

Each call to `haul()` upholds these guarantees:

- **One `haul()` makes one request.** You are responsible for
  retry loops, but retrying just means calling `haul()` again.
- **The destination file will not exist until the download is complete.**
  There is no state where a partially written file sits at the final
  path. Incomplete data lives in a temporary `.part` file; on completion
  it is atomically moved into place.
- **Interrupted downloads resume when possible.** Checkpoint state
  lives on disk, not in memory. Kill the process, lose the network,
  or get a 503: the next `haul()` picks up from the last durable
  byte. No data is re-downloaded if the resource hasn't changed.
- **If the remote resource changes, retrying will not corrupt the file.** If
  the remote file changes between attempts, `pyhaul` detects the
  mismatch via ETag (a server-side fingerprint) and starts over
  cleanly instead of gluing mismatched halves together.
- **Your HTTP client is borrowed, not owned.** `pyhaul` sets
  per-request headers and returns your session untouched. It never
  creates, configures, or closes sessions.
- **Transport errors pass through unwrapped.** `httpx.ReadTimeout`
  stays `httpx.ReadTimeout`. You catch the types you already know.
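
The "destination file will not exist until the download is complete" guarantee rests on a common write-then-rename pattern. The sketch below is a generic illustration of that pattern, not pyhaul's actual code: the helper name `write_then_publish` and the `<name>.part` naming are assumptions made for this example.

```python
import os
import tempfile
from pathlib import Path


def write_then_publish(chunks, dest: Path) -> Path:
    """Append chunks to '<dest>.part', flush them to disk, then rename to dest."""
    part = dest.with_name(dest.name + ".part")  # hypothetical naming for this sketch
    with open(part, "ab") as fh:  # append mode keeps bytes from an earlier attempt
        for chunk in chunks:
            fh.write(chunk)
        fh.flush()
        os.fsync(fh.fileno())  # make the bytes durable before publishing
    os.replace(part, dest)  # atomic when both paths are on the same filesystem
    return dest


with tempfile.TemporaryDirectory() as tmp:
    final = Path(tmp) / "big.zip"
    write_then_publish([b"hello ", b"world"], final)
    print(final.read_bytes())
```

Because the rename is the last step, a crash at any earlier point leaves only the `.part` file behind, and readers of the final path never observe a half-written file.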

## How it fits into your code

One `haul()` = one HTTP request. It either succeeds and returns
`CompleteHaul`, or it raises, possibly after saving progress
to a `.part` file that allows the next call to resume. `pyhaul` never
creates sessions, connections, or clients. Your HTTP library's native
exceptions propagate through unwrapped, so you can drop `haul()`
into existing code without changing your error handling. Retries are
your call: a for-loop, `tenacity`, or nothing. Concurrency limiting
(e.g. `asyncio.Semaphore`) is also yours, because `pyhaul` downloads one
file per call and doesn't manage parallelism.

```python
def haul(url, client, *, dest) -> CompleteHaul: ...

async def haul_async(url, client, *, dest) -> CompleteHaul: ...
```

An optional `HaulState` (a progress bag, updated in place) and other keyword-only
options (extra headers, progress hooks, buffer sizing) are documented on the
documentation site. See [docs/DESIGN.md](docs/DESIGN.md) for the exception
hierarchy, transport adapters, and download lifecycle.

## Documentation

**[Full documentation →](https://chad-loder.github.io/pyhaul/)**

- **[docs/DESIGN.md](docs/DESIGN.md)**: Transport adapters, checkpoint state, and the download lifecycle.
- **[docs/WHY.md](docs/WHY.md)**: Silent failure modes in HTTP range/resume, and how pyhaul compares
  to `curl`, `wget`, and `aria2c`.
- **[docs/SPEC.md](docs/SPEC.md)**: Control file and checkpoint format (implementers / compatible tools).

## Examples

Sync with retries (httpx)

```python
import time
from pathlib import Path

import httpx

from pyhaul import PartialHaulError, HaulState, haul

url = "https://example.com/big.iso"
dest = Path("big.iso")
state = HaulState()  # optional; tracks byte-level progress

with httpx.Client() as client:
    for attempt in range(1, 11):
        try:
            result = haul(url, client, dest=dest, state=state)
            print(f"done: {state.valid_length:,} bytes, sha256={result.sha256[:16]}…")
            break
        except PartialHaulError as exc:
            print(f"attempt {attempt}: {exc.reason} ({state.valid_length:,} bytes so far)")
            time.sleep(min(2**attempt, 30))  # exponential backoff, capped at 30 s
    else:
        # All attempts were interrupted; progress stays on disk for a later run.
        raise SystemExit("gave up after 10 attempts")
```

Async concurrent downloads (httpx + tenacity)

```python
import asyncio
from pathlib import Path

import httpx
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential_jitter

from pyhaul import PartialHaulError, haul_async

URLS = [
    ("https://example.com/data/shard-001.bin", Path("downloads/shard-001.bin")),
    ("https://example.com/data/shard-002.bin", Path("downloads/shard-002.bin")),
    ("https://example.com/data/shard-003.bin", Path("downloads/shard-003.bin")),
]


@retry(
    retry=retry_if_exception_type(PartialHaulError),
    wait=wait_exponential_jitter(initial=2, max=30),
    stop=stop_after_attempt(10),
)
async def download_one(client: httpx.AsyncClient, url: str, dest: Path) -> None:
    await haul_async(url, client, dest=dest)


async def main() -> None:
    Path("downloads").mkdir(exist_ok=True)
    async with httpx.AsyncClient() as client, asyncio.TaskGroup() as tg:
        for url, dest in URLS:
            tg.create_task(download_one(client, url, dest))


asyncio.run(main())
```

Each `haul_async` call manages its own checkpoint independently.
A crash partway through leaves each file in a separately resumable
state.

## Why this exists

You probably already know that resuming an HTTP download isn't just
`Range: bytes=N-`. The full list of silent failure modes is longer
than most people expect: servers that return 200 instead of 206,
resources that change between retries (`curl -C -` and `aria2c` both
miss this), compression that corrupts resumed streams, and ordering
guarantees needed for crash safety. See [docs/WHY.md](docs/WHY.md) for the
deep-dive and a comparison with `curl`, `wget`, and `aria2c`.
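
To make the first three failure modes concrete, here is a rough sketch of the checks a resuming client has to perform. It is written for illustration under stated assumptions and is not pyhaul's implementation: the function names `resume_headers` and `write_offset` are hypothetical, and requesting `Accept-Encoding: identity` is just one possible way to keep byte offsets aligned with the bytes on disk.

```python
import re
from typing import Optional

_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def resume_headers(offset: int, etag: Optional[str]) -> dict[str, str]:
    """Request headers for the next attempt, given bytes already on disk."""
    headers = {"Accept-Encoding": "identity"}  # assumption: avoid compressed bodies
    # If-Range must carry a strong validator, so weak ETags ("W/...") are not usable.
    strong = etag is not None and not etag.startswith("W/")
    if offset > 0 and strong:
        headers["Range"] = f"bytes={offset}-"
        headers["If-Range"] = etag  # server sends the full body if the ETag changed
    return headers  # without a strong ETag, fall back to a fresh download


def write_offset(status: int, offset: int, content_range: Optional[str]) -> int:
    """Return the file offset where the response body belongs, or raise ValueError."""
    if status == 200:
        return 0  # full body: the server ignored Range or the resource changed
    if status == 206:
        match = _CONTENT_RANGE.fullmatch(content_range or "")
        if match and int(match.group(1)) == offset:
            return offset  # safe to append to the partial file
        raise ValueError(f"206 does not start at byte {offset}: {content_range!r}")
    raise ValueError(f"unexpected status {status}")


print(resume_headers(1024, '"v1"'))
print(write_offset(206, 1024, "bytes 1024-2047/2048"))
```

The important case is the `200` branch: appending a full-body response to an existing partial file is exactly the kind of silent corruption described above, so the body has to be written from offset 0 instead.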

## Install

Extras match the badges above. In full:

```bash
# Quote the extras so shells such as zsh do not expand the brackets.
pip install "pyhaul[httpx]"      # httpx (sync + async)
pip install "pyhaul[niquests]"   # niquests (HTTP/2+3, sync + async)
pip install "pyhaul[requests]"   # if you already use requests
pip install "pyhaul[urllib3]"    # raw urllib3
pip install "pyhaul[aiohttp]"    # aiohttp (async)
```

No hard dependency on any HTTP library. Pick one (or several) as extras.

---

## Development

See [CONTRIBUTING.md](CONTRIBUTING.md) for branches, commit style, and full tooling.

```bash
git clone https://github.com/chad-loder/pyhaul.git && cd pyhaul
uv sync --all-groups
uv run pytest
just lint        # ruff + mypy + pyright + rumdl
```

## License

MIT. See the `LICENSE` file for details.
