{"id":49296328,"url":"https://github.com/chad-loder/pyhaul","last_synced_at":"2026-05-02T12:05:53.480Z","repository":{"id":353880968,"uuid":"1221282637","full_name":"chad-loder/pyhaul","owner":"chad-loder","description":"Resumable, cursor-based, CDN-safe HTTP downloads for Python","archived":false,"fork":false,"pushed_at":"2026-04-26T02:04:06.000Z","size":145,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-26T03:32:58.149Z","etag":null,"topics":["download","http","httpx","python","requests","resumable"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chad-loder.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"chad-loder"}},"created_at":"2026-04-26T01:48:34.000Z","updated_at":"2026-04-26T02:04:02.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/chad-loder/pyhaul","commit_stats":null,"previous_names":["chad-loder/pyhaul"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/chad-loder/pyhaul","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chad-loder%2Fpyhaul","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chad-loder%2Fpyhaul/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ch
ad-loder%2Fpyhaul/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chad-loder%2Fpyhaul/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chad-loder","download_url":"https://codeload.github.com/chad-loder/pyhaul/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chad-loder%2Fpyhaul/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32285283,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-25T18:29:39.964Z","status":"online","status_checked_at":"2026-04-26T02:00:05.962Z","response_time":129,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["download","http","httpx","python","requests","resumable"],"created_at":"2026-04-26T04:00:23.677Z","updated_at":"2026-05-02T12:05:53.461Z","avatar_url":"https://github.com/chad-loder.png","language":"Python","funding_links":["https://github.com/sponsors/chad-loder"],"categories":[],"sub_categories":[],"readme":"# pyhaul\n\n[![CI](https://github.com/chad-loder/pyhaul/actions/workflows/ci.yml/badge.svg?event=push)](https://github.com/chad-loder/pyhaul/actions/workflows/ci.yml)\n[![codecov](https://codecov.io/gh/chad-loder/pyhaul/graph/badge.svg)](https://codecov.io/gh/chad-loder/pyhaul)\n[![PyPI](https://img.shields.io/pypi/v/pyhaul.svg)](https://pypi.org/project/pyhaul/)\n[![License: 
MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)\n[![Docs](https://img.shields.io/badge/docs-properdocs-blue.svg)](https://chad-loder.github.io/pyhaul/)\n\nResumable HTTP downloads for Python. **Bring your own client:** pyhaul borrows your existing\nsession and handles byte-range negotiation, crash-safe checkpointing, and validation.\n\n[![httpx](https://img.shields.io/badge/httpx-async%2Bsync-6B46C1.svg)](https://www.python-httpx.org/)\n[![niquests](https://img.shields.io/badge/niquests-async%2Bsync-6B46C1.svg)](https://niquests.readthedocs.io/)\n[![aiohttp](https://img.shields.io/badge/aiohttp-async-2563EB.svg)](https://docs.aiohttp.org/)\n[![requests](https://img.shields.io/badge/requests-sync-059669.svg)](https://requests.readthedocs.io/)\n[![urllib3](https://img.shields.io/badge/urllib3-sync-059669.svg)](https://urllib3.readthedocs.io/)\n\n```bash\npip install pyhaul[httpx]   # or: niquests, requests, urllib3, aiohttp\n```\n\n```python\nimport httpx\nfrom pathlib import Path\nfrom pyhaul import haul, PartialHaulError\n\ndest = Path(\"big.zip\")\nwith httpx.Client() as client:\n    for _ in range(10):\n        try:\n            result = haul(\"https://example.com/big.zip\", client, dest=dest)\n            break\n        except PartialHaulError:\n            pass  # only retryable error; others propagate\n\nprint(f\"done: {dest.stat().st_size:,} bytes\")\n```\n\n---\n\n## What is it?\n\nA small, pure-Python library that makes HTTP downloads **resumable**.\nTo download a file, call `haul()` with a URL, your existing HTTP\nclient, and a destination path. **pyhaul** handles byte-range\nnegotiation for resume, ETag validation, crash-safe\ncheckpointing, and atomic file completion. Supports both sync and\nasync across multiple HTTP client libraries.\n\nEach call to `haul()` upholds these guarantees:\n\n- **One `haul()` makes one request**. 
You are responsible for\n  retry loops, but retrying just means calling `haul()` again.\n- **The destination file will not exist until download is complete.**\n  There is no state where a partially-written file sits at the final\n  path. Incomplete data lives in a temporary `.part` file; on completion\n  it is atomically moved into place.\n- **Interrupted downloads resume when possible.** Checkpoint state\n  lives on disk, not in memory. Kill the process, lose the network,\n  get a 503 — the next `haul()` picks up from the last durable\n  byte. Zero re-downloaded data if the resource hasn't changed.\n- **If the remote resource changes, a retry will not corrupt the file.** If\n  the remote file changes between attempts, `pyhaul` detects the\n  mismatch via ETag (a server-side fingerprint) and starts over\n  cleanly instead of gluing mismatched halves together.\n- **Your HTTP client is borrowed, not owned.** `pyhaul` sets\n  per-request headers and returns your session untouched. It never\n  creates, configures, or closes sessions.\n- **Transport errors pass through unwrapped.** `httpx.ReadTimeout`\n  stays `httpx.ReadTimeout`. You catch the types you already know.\n\n## How it fits into your code\n\nOne `haul()` = one HTTP request. It either succeeds and returns\n`CompleteHaul`, or it raises — possibly after saving progress\nto a `.part` file that allows the next call to resume. `pyhaul` never\ncreates sessions, connections, or clients. Your HTTP library's native\nexceptions propagate through unwrapped, so you can drop `haul()`\ninto existing code without changing your error handling. Retries are\nyour call — a for-loop, `tenacity`, or nothing. Concurrency limiting\n(e.g. 
`asyncio.Semaphore`) is also yours — `pyhaul` downloads one\nfile per call and doesn't manage parallelism.\n\n```python\ndef haul(url, client, *, dest) -\u003e CompleteHaul: ...\nasync def haul_async(url, client, *, dest) -\u003e CompleteHaul: ...\n```\n\nOptional `HaulState` (progress bag, updated in-place) and other keyword-only\noptions (extra headers, progress hooks, buffer sizing) are documented on the\nsite. See\n[docs/DESIGN.md](docs/DESIGN.md) for the exception hierarchy, transport\nadapters, and download lifecycle.\n\n## Documentation\n\n**[Full documentation →](https://chad-loder.github.io/pyhaul/)**\n\n- **[docs/DESIGN.md](docs/DESIGN.md)** — Transport adapters, checkpoint state, and the download lifecycle.\n- **[docs/WHY.md](docs/WHY.md)** — Silent failure modes in HTTP range/resume, and how pyhaul compares\n  to `curl`, `wget`, and `aria2c`.\n- **[docs/SPEC.md](docs/SPEC.md)** — Control file and checkpoint format (implementers / compatible tools).\n\n\u003c!-- pypi-end --\u003e\n\n## Examples\n\n\u003c!-- source: examples/example_httpx_sync.py --\u003e\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eSync with retries (httpx)\u003c/strong\u003e\u003c/summary\u003e\n\n```python\nimport time\nfrom pathlib import Path\n\nimport httpx\n\nfrom pyhaul import PartialHaulError, HaulState, haul\n\nurl = \"https://example.com/big.iso\"\ndest = Path(\"big.iso\")\nstate = HaulState()  # optional — tracks byte-level progress\n\nwith httpx.Client() as client:\n    for attempt in range(1, 11):\n        try:\n            result = haul(url, client, dest=dest, state=state)\n            print(f\"done: {state.valid_length:,} bytes, sha256={result.sha256[:16]}…\")\n            break\n        except PartialHaulError as exc:\n            print(f\"attempt {attempt}: {exc.reason} ({state.valid_length:,} bytes so far)\")\n            time.sleep(min(2**attempt, 30))\n```\n\n\u003c/details\u003e\n\n\u003c!-- source: examples/example_httpx_async.py 
--\u003e\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eAsync concurrent downloads (httpx + tenacity)\u003c/strong\u003e\u003c/summary\u003e\n\n```python\nimport asyncio\nfrom pathlib import Path\n\nimport httpx\nfrom tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential_jitter\n\nfrom pyhaul import PartialHaulError, haul_async\n\nURLS = [\n    (\"https://example.com/data/shard-001.bin\", Path(\"downloads/shard-001.bin\")),\n    (\"https://example.com/data/shard-002.bin\", Path(\"downloads/shard-002.bin\")),\n    (\"https://example.com/data/shard-003.bin\", Path(\"downloads/shard-003.bin\")),\n]\n\n\n@retry(\n    retry=retry_if_exception_type(PartialHaulError),\n    wait=wait_exponential_jitter(initial=2, max=30),\n    stop=stop_after_attempt(10),\n)\nasync def download_one(client: httpx.AsyncClient, url: str, dest: Path) -\u003e None:\n    await haul_async(url, client, dest=dest)\n\n\nasync def main() -\u003e None:\n    Path(\"downloads\").mkdir(exist_ok=True)\n    async with httpx.AsyncClient() as client, asyncio.TaskGroup() as tg:\n        for url, dest in URLS:\n            tg.create_task(download_one(client, url, dest))\n\n\nasyncio.run(main())\n```\n\nEach `haul_async` call manages its own checkpoint independently.\nA crash partway through leaves each file in a separately resumable\nstate.\n\n\u003c/details\u003e\n\n\u003c!-- See doc_todo.md for future README section ideas. --\u003e\n\n## Why this exists\n\nYou probably already know that resuming an HTTP download isn't just\n`Range: bytes=N-`. The full list of silent failure modes is longer\nthan most people expect — servers that return 200 instead of 206,\nresources that change between retries (`curl -C -` and `aria2c` both\nmiss this), compression that corrupts resumed streams, and ordering\nguarantees needed for crash safety. 
See [docs/WHY.md](docs/WHY.md) for the\ndeep-dive and a comparison with `curl`, `wget`, and `aria2c`.\n\n## Install\n\nExtras match the badges above. In full:\n\n```bash\npip install pyhaul[httpx]      # httpx (sync + async)\npip install pyhaul[niquests]   # niquests (HTTP/2+3, async)\npip install pyhaul[requests]   # if you already use requests\npip install pyhaul[urllib3]    # raw urllib3\npip install pyhaul[aiohttp]    # aiohttp (async)\n```\n\nNo hard dependency on any HTTP library. Pick one (or several) as extras.\n\n---\n\n## Development\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for branches, commit style, and full tooling.\n\n```bash\ngit clone https://github.com/chad-loder/pyhaul.git \u0026\u0026 cd pyhaul\nuv sync --all-groups\nuv run pytest\njust lint        # ruff + mypy + pyright + rumdl\n```\n\n## License\n\nMIT. See the `LICENSE` file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchad-loder%2Fpyhaul","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchad-loder%2Fpyhaul","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchad-loder%2Fpyhaul/lists"}