{"id":27607914,"url":"https://github.com/barttc/siteprobe","last_synced_at":"2026-01-20T00:01:26.161Z","repository":{"id":281830824,"uuid":"946455577","full_name":"bartTC/siteprobe","owner":"bartTC","description":"Sitemap Validation and Performance Analyzer","archived":false,"fork":false,"pushed_at":"2026-01-01T17:32:47.000Z","size":1700,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-07T00:07:36.050Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://barttc.github.io/siteprobe/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bartTC.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-03-11T07:00:15.000Z","updated_at":"2026-01-01T17:32:42.000Z","dependencies_parsed_at":"2025-07-04T17:25:56.354Z","dependency_job_id":"06df984d-eef8-4486-b9d2-f669e3639eb5","html_url":"https://github.com/bartTC/siteprobe","commit_stats":null,"previous_names":["barttc/siteprobe"],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/bartTC/siteprobe","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bartTC%2Fsiteprobe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bartTC%2Fsiteprobe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bartTC%2Fsiteprobe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/bartTC%2Fsiteprobe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bartTC","download_url":"https://codeload.github.com/bartTC/siteprobe/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bartTC%2Fsiteprobe/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28590676,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-19T23:59:00.777Z","status":"ssl_error","status_checked_at":"2026-01-19T23:58:54.030Z","response_time":67,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-22T22:17:35.702Z","updated_at":"2026-01-20T00:01:26.144Z","avatar_url":"https://github.com/bartTC.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Siteprobe\n\nSiteprobe is a Rust-based CLI tool that fetches all URLs from a given `sitemap.xml`\nurl, checks their existence, and generates a performance report. 
It supports various\nfeatures such as authentication, concurrency control, caching bypass, and more.\n\n![Screenshot of Siteprobe statistics](https://github.com/bartTC/siteprobe/blob/main/docs/screenshot.png?raw=true)\n\n## Features\n\n- Fetch and parse sitemap.xml to extract URLs, including nested Sitemap Index files\n  recursively.\n- Check the existence and response times of each URL.\n- Generate a detailed performance CSV report.\n- Support for Basic Authentication.\n- Adjustable concurrency limits for request handling.\n- Configurable request timeout settings.\n- Support for configuring rate limits, such as 300 requests per 5-minute interval.\n- Redirect handling with security precautions.\n- Filtering and reporting slow URLs based on a threshold.\n- Custom User-Agent header support.\n- Option to append random timestamps to URLs to bypass caching mechanisms.\n- Save downloaded documents for further inspection or use as a static site mirror.\n\n## Installation\n\nYou can install Siteprobe using Cargo:\n\n```sh\ncargo install siteprobe\n```\n\nAlternatively, build from source:\n\n```sh\ngit clone https://github.com/bartTC/siteprobe.git\ncd siteprobe\ncargo build --release\n```\n\n## Usage\n\n```sh\nsiteprobe \u003csitemap_url\u003e [OPTIONS]\n```\n\n### Arguments\n\n- `\u003csitemap_url\u003e` - The URL of the sitemap to be fetched and processed.\n\n### Options\n\n```\nUsage: siteprobe [OPTIONS] \u003cSITEMAP_URL\u003e\n\nArguments:\n  \u003cSITEMAP_URL\u003e  The URL of the sitemap to be fetched and processed.\n\nOptions:\n      --basic-auth \u003cBASIC_AUTH\u003e\n          Basic authentication credentials in the format `username:password`\n  -c, --concurrency-limit \u003cCONCURRENCY_LIMIT\u003e\n          Maximum number of concurrent requests allowed [default: 4]\n  -l, --rate-limit \u003cRATE_LIMIT\u003e\n          The rate limit for all requests in the format 'requests/time[unit]',\n          where unit can be seconds (`s`), minutes (`m`), or hours (`h`). 
E.g.\n          '-l 300/5m' for 300 requests per 5 minutes, or '-l 100/1h' for 100\n          requests per hour.\n  -o, --output-dir \u003cOUTPUT_DIR\u003e\n          Directory where all downloaded documents will be saved\n  -a, --append-timestamp\n          Append a random timestamp to each URL to bypass caching mechanisms\n  -r, --report-path \u003cREPORT_PATH\u003e\n          File path for storing the generated `report.csv`\n  -j, --report-path-json \u003cREPORT_PATH_JSON\u003e\n          File path for storing the generated `report.json`\n  -t, --request-timeout \u003cREQUEST_TIMEOUT\u003e\n          Default timeout (in seconds) for each request [default: 10]\n      --user-agent \u003cUSER_AGENT\u003e\n          Custom User-Agent header to be used in requests [default: \"Mozilla/5.0\n          (compatible; Siteprobe/0.5.0)\"]\n      --slow-num \u003cSLOW_NUM\u003e\n          Limit the number of slow documents displayed in the report. [default:\n          100]\n  -s, --slow-threshold \u003cSLOW_THRESHOLD\u003e\n          Show slow responses. The value is the threshold (in seconds) for\n          considering a document as 'slow'. E.g. '-s 3' for 3 seconds or '-s\n          0.05' for 50ms.\n  -f, --follow-redirects\n          Controls automatic redirects. When enabled, the client will follow\n          HTTP redirects (up to 10 by default). 
Note that for security, Basic\n          Authentication credentials are intentionally not forwarded during\n          redirects to prevent unintended credential exposure.\n  -h, --help\n          Print help\n```\n\n### Example Usage\n\n```sh\n# Fetch and analyze a sitemap with default settings\nsiteprobe https://example.com/sitemap.xml\n\n# Save the report to a specific file and the downloaded documents to a directory\nsiteprobe https://example.com/sitemap.xml --report-path ./results/report.csv --output-dir ./example.com\n\n# Set concurrency limit to 10 and timeout to 5 seconds\nsiteprobe https://example.com/sitemap.xml --concurrency-limit 10 --request-timeout 5\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbarttc%2Fsiteprobe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbarttc%2Fsiteprobe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbarttc%2Fsiteprobe/lists"}