{"id":46597457,"url":"https://github.com/christianhelle/argiope","last_synced_at":"2026-03-07T15:01:45.044Z","repository":{"id":341600380,"uuid":"1170776570","full_name":"christianhelle/argiope","owner":"christianhelle","description":"A web crawler for broken-link detection and image downloading","archived":false,"fork":false,"pushed_at":"2026-03-02T14:27:59.000Z","size":73,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-02T17:48:42.962Z","etag":null,"topics":["web-crawler","zig"],"latest_commit_sha":null,"homepage":"","language":"Zig","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/christianhelle.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-02T14:05:36.000Z","updated_at":"2026-03-02T14:28:03.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/christianhelle/argiope","commit_stats":null,"previous_names":["christianhelle/argiope"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/christianhelle/argiope","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/christianhelle%2Fargiope","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/christianhelle%2Fargiope/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/christianhelle%2Fargiope/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/christianhelle%2Fargiope/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/christianhelle","download_url":"https://codeload.github.com/christianhelle/argiope/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/christianhelle%2Fargiope/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30219257,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-07T14:02:48.375Z","status":"ssl_error","status_checked_at":"2026-03-07T14:02:43.192Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["web-crawler","zig"],"created_at":"2026-03-07T15:01:44.292Z","updated_at":"2026-03-07T15:01:45.036Z","avatar_url":"https://github.com/christianhelle.png","language":"Zig","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![CI](https://github.com/christianhelle/argiope/actions/workflows/ci.yml/badge.svg)](https://github.com/christianhelle/argiope/actions/workflows/ci.yml)\n[![Release](https://github.com/christianhelle/argiope/actions/workflows/release.yml/badge.svg)](https://github.com/christianhelle/argiope/actions/workflows/release.yml)\n\n# argiope\n\nA web crawler for broken-link detection and image downloading, written in [Zig](https://ziglang.org/).\n\n## Features\n\n- Crawl websites and detect broken links (4xx/5xx/timeout)\n- Generate reports in text, Markdown, or HTML format\n- Download images from web pages to organized directories\n- Generate portable HTML browsing pages for downloaded image trees (`library.html`, nested `index.html`, and `reader.html`)\n- Download manga chapters from [MangaFox (fanfox.net)](https://fanfox.net) by title, with optional chapter range filtering\n- BFS traversal with configurable depth, timeouts, and rate limiting\n- Domain-restricted crawling with same-origin checks\n- Lightweight HTML scanner for link and image extraction\n- URL normalization and relative-to-absolute resolution\n- Zero external dependencies — uses only `std`\n- Single static binary — no runtime needed\n- Cross-platform: Linux, macOS, Windows\n\n## Installation\n\n### Snap\n\n```sh\nsudo snap install argiope\n```\n\n### Download from GitHub Releases\n\nPre-built binaries for Linux (x86_64, aarch64), macOS (x86_64, aarch64), and Windows (x86_64) are available on the [Releases](https://github.com/christianhelle/argiope/releases) page.\n\n### Build from source\n\nRequires [Zig 0.15.2+](https://ziglang.org/download/):\n\n```sh\nzig build -Doptimize=ReleaseFast\n```\n\nThe binary is at `zig-out/bin/argiope`.\n\nRelease automation keeps `snapcraft.yaml` and `src/cli.zig` aligned so tagged release builds publish matching package and CLI versions.\n\n## Usage\n\n### Check for broken links\n\n```sh\nargiope check https://example.com\nargiope check https://example.com --depth 5 --timeout 15\n```\n\nOutput includes a list of broken links with status codes and a summary:\n\n```text\nCrawling https://example.com (depth=3, timeout=10s)...\n\n----------------------------------------------------------------------------------------\nStatus   Type       Time(ms)   URL\n----------------------------------------------------------------------------------------\n404      internal   45         https://example.com/missing-page\ntimeout  external   10001      https://dead-link.example.org/page\n----------------------------------------------------------------------------------------\n\nSummary:\n  Total URLs checked: 42\n  OK:                 40\n  Broken:             1\n  Errors:             1\n  Internal:           30\n  External:           12\n\nTiming:\n  Total crawl time:   523ms\n  Avg response time:  12ms\n  Min response time:  5ms\n  Max response time:  10001ms\n```\n\n### Download images\n\n```sh\nargiope images https://example.com/gallery -o ./images\nargiope images https://manga-site.com/title --depth 2 -o ./manga\n```\n\nImages are saved to `output_dir/page_N/image_N.ext` where the extension is derived from the source URL. After downloads finish, argiope also generates a portable HTML browser rooted at `output_dir/library.html`, plus nested `index.html` and per-folder `reader.html` pages for thumbnails and ordered reading.\n\nThe generated browser works for both generic downloads and MangaFox chapter trees, keeps links relative and percent-encodes folder/file names for local file browsing, and includes light / dark / system theme controls with a `localStorage`-backed preference (default: system).\n\n### Download manga from MangaFox\n\nPass a [fanfox.net](https://fanfox.net) manga URL to the `images` command. Chapter pages are downloaded automatically and saved as `[output_dir]/[manga-title]/[chapter]/[page].jpg`, and the same HTML browser is generated across the manga folder tree for scalable chapter navigation.\n\n```sh\n# Download all chapters\nargiope images https://fanfox.net/manga/naruto -o ./manga\n\n# Download a specific range of chapters\nargiope images https://fanfox.net/manga/naruto --chapters 1-10 -o ./manga\n```\n\n**Chapter detection:** The tool fetches the manga's RSS feed (`https://fanfox.net/rss/{slug}.xml`) as the primary chapter source. This reliably detects all chapters, including those on manga titles where the chapter list is loaded dynamically via JavaScript (which static HTML parsing cannot see). If the RSS feed is unavailable or empty, the tool automatically falls back to HTML parsing.\n\n**Chapter ordering:** Chapters are always downloaded in numeric order (1, 2, 10, 11, 100), not alphabetic order. Decimal chapter numbers (e.g., 5.5, 100.1) are fully supported and sorted correctly between their integer neighbors.\n\n**Troubleshooting:** If chapters are missing or not detected, use `--verbose` to see detailed chapter discovery information:\n\n```sh\nargiope images https://fanfox.net/manga/title --verbose\n```\n\nThis will show all chapters found and the order they will be downloaded in.\n\n### Browse downloaded images in HTML\n\nOpen the generated root landing page after an `images` run:\n\n```sh\nxdg-open ./images/library.html\n```\n\nEach folder with downloaded images gets:\n\n- `index.html` for nested navigation and thumbnail overviews\n- `reader.html` for ordered prev/next viewing inside that folder\n- theme controls for **System**, **Light**, and **Dark**, persisted in `localStorage`\n\nThis scales from the generic `page_N/` layout to deep MangaFox trees such as `slug/chapter/page.jpg`.\n\n### Regenerate HTML browser for existing images\n\nIf you already have a directory of downloaded images and want to regenerate the HTML browser:\n\n```sh\nargiope library ./images\n```\n\nThis is useful if you've manually moved or reorganized images, or want to update the browser UI after an upgrade.\n\n### Verbose Mode\n\nFor detailed progress output while crawling:\n\n```sh\nargiope check https://example.com --verbose\n```\n\nEach URL will be printed as it is checked, showing the crawling progress in real-time.\n\n### Parallel Crawling\n\nFor faster crawling on sites with many links, enable parallel crawling:\n\n```sh\nargiope check https://example.com --parallel\n```\n\nThis crawls multiple URLs concurrently for improved performance.\n\n### Generate reports\n\nWrite the results to a file instead of printing to the terminal. In report mode all console output is suppressed, making it suitable for CI pipelines and LLM-based workflows.\n\n```sh\n# Text report (default)\nargiope check https://example.com --report report.txt\n\n# Markdown report\nargiope check https://example.com --report report.md --report-format markdown\n\n# HTML report (self-contained, no external dependencies)\nargiope check https://example.com --report report.html --report-format html\n```\n\nBy default only broken links appear in the report. Add `--include-positives` to include all successfully resolved links as well:\n\n```sh\nargiope check https://example.com --report report.md --report-format markdown --include-positives\n```\n\n#### Report formats\n\n| Format | Description |\n|--------|-------------|\n| `text` | Plain-text list with indented type/timing detail per entry (default) |\n| `markdown` | GitHub-Flavored Markdown bullet list — suitable for PR comments or wikis |\n| `html` | Self-contained HTML file with inline CSS, card layout, and pill badges |\n\n### Options\n\n```text\nUsage: argiope \u003ccommand\u003e [options]\n\nCommands:\n  check \u003curl\u003e           Crawl a website and report broken links\n  images \u003curl\u003e          Download images from a website\n  library \u003cdir\u003e         Generate HTML browser for an existing directory\n\nOptions:\n  --depth N             Maximum crawl depth (default: 3)\n  --timeout N           Request timeout in seconds (default: 10)\n  --delay N             Delay between requests in ms (default: 100)\n  -o, --output DIR      Output directory for downloads (default: ./download)\n  --chapters N-M        Chapter range to download, e.g. --chapters 1-10 (fanfox.net only)\n  --verbose             Print progress for each URL as it is crawled\n  --parallel            Crawl URLs in parallel for better performance\n  --report \u003cfile\u003e       Write a report to \u003cfile\u003e\n  --report-format \u003cfmt\u003e Report format: text (default), markdown, html\n  --include-positives   Include successful links in the report\n  -h, --help            Show help\n  -v, --version         Show version\n```\n\n## Development\n\n```sh\n# Build\nzig build\n\n# Run tests\nzig build test\n\n# Build release\nzig build -Doptimize=ReleaseFast\n\n# Or use Make\nmake build\nmake test\nmake clean\n```\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchristianhelle%2Fargiope","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchristianhelle%2Fargiope","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchristianhelle%2Fargiope/lists"}