{"id":18613628,"url":"https://github.com/dylancl/sitemap-crawler","last_synced_at":"2025-10-04T08:39:05.619Z","repository":{"id":241263603,"uuid":"804725962","full_name":"dylancl/sitemap-crawler","owner":"dylancl","description":"Verify the status of each url in a (hosted) sitemap XML file.","archived":false,"fork":false,"pushed_at":"2024-05-26T20:06:18.000Z","size":36,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-27T02:25:14.546Z","etag":null,"topics":["crawler","parser","scraper","sitemap","xml"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dylancl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-23T06:35:32.000Z","updated_at":"2024-05-26T20:06:21.000Z","dependencies_parsed_at":"2024-05-23T07:40:58.818Z","dependency_job_id":"83540e9d-222c-4e05-bfe5-114af6a56518","html_url":"https://github.com/dylancl/sitemap-crawler","commit_stats":null,"previous_names":["dylancl/sitemap-scraper","dylancl/sitemap-crawler"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dylancl%2Fsitemap-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dylancl%2Fsitemap-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dylancl%2Fsitemap-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dylancl%2Fsitemap-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dylancl","download_url":"https://codeload.github.com/dylancl/sitemap-crawler/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239406442,"owners_count":19633024,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","parser","scraper","sitemap","xml"],"created_at":"2024-11-07T03:23:05.669Z","updated_at":"2025-10-04T08:39:05.523Z","avatar_url":"https://github.com/dylancl.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# sitemap-crawler\n\nVerify the status of each URL in a (hosted) sitemap XML file by crawling through the XML and fetching every listed URL to check whether it returns a 200 OK. A free alternative to Screaming Frog SEO Spider's paid sitemap crawler feature.\n\nhttps://github.com/dylancl/sitemap-scraper/assets/14956708/d15b02a0-351a-43fd-a91e-90c042603075\n\n# Installation\n\n1. Clone the repository and enter its directory\n\n   ```bash\n   git clone https://github.com/dylancl/sitemap-crawler.git\n   cd sitemap-crawler\n   ```\n\n2. Install the dependencies\n\n   ```bash\n   pnpm install\n   ```\n\n3. Run the script\n\n   ```bash\n   pnpm start\n   ```\n\n# Usage\n\n1. Enter the URL of the sitemap XML file you want to check.\n2. The script will ask you for configuration options:\n   - **Concurrency limit**: The maximum number of requests in flight at the same time. Default is 5. Must be between 1 and 15.\n   - **Request delay**: The delay between requests. Default is 1000. Must be at least 250.\n   - **Traversal order**: The order in which the URLs are checked. Default is `sequential`. Options are `sequential` and `random`.\n3. The script checks the URLs and displays its progress in the console.\n4. When it finishes, the script asks whether you want to save the results (OK and not-OK URLs) to a file.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdylancl%2Fsitemap-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdylancl%2Fsitemap-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdylancl%2Fsitemap-crawler/lists"}