{"id":13464658,"url":"https://github.com/TurnerSoftware/InfinityCrawler","last_synced_at":"2025-03-25T11:31:58.655Z","repository":{"id":33934371,"uuid":"163394468","full_name":"TurnerSoftware/InfinityCrawler","owner":"TurnerSoftware","description":"A simple but powerful web crawler library for .NET","archived":false,"fork":false,"pushed_at":"2023-12-15T05:08:18.000Z","size":334,"stargazers_count":249,"open_issues_count":13,"forks_count":35,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-03-22T08:03:08.402Z","etag":null,"topics":["crawler","robots-txt","spider","web-crawler","web-crawling"],"latest_commit_sha":null,"homepage":"","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TurnerSoftware.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"License.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null},"funding":{"github":"Turnerj"}},"created_at":"2018-12-28T09:49:57.000Z","updated_at":"2025-03-13T02:52:21.000Z","dependencies_parsed_at":"2023-12-15T06:26:04.396Z","dependency_job_id":"0bf33d94-19c9-48be-842a-21008067f474","html_url":"https://github.com/TurnerSoftware/InfinityCrawler","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TurnerSoftware%2FInfinityCrawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TurnerSoftware%2FInfinityCrawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TurnerSoftware%2FInfinityCrawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TurnerSoftware%2FInfinityCrawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TurnerSoftware","download_url":"https://codeload.github.com/TurnerSoftware/InfinityCrawler/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245454075,"owners_count":20617972,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","robots-txt","spider","web-crawler","web-crawling"],"created_at":"2024-07-31T14:00:48.167Z","updated_at":"2025-03-25T11:31:58.628Z","avatar_url":"https://github.com/TurnerSoftware.png","language":"C#","readme":"\u003cdiv align=\"center\"\u003e\n\n![Icon](images/icon.png)\n# Infinity Crawler\nA simple but powerful web crawler library for .NET \n\n![Build](https://img.shields.io/github/workflow/status/TurnerSoftware/infinitycrawler/Build)\n[![Codecov](https://img.shields.io/codecov/c/github/turnersoftware/infinitycrawler/main.svg)](https://codecov.io/gh/TurnerSoftware/infinitycrawler)\n[![NuGet](https://img.shields.io/nuget/v/InfinityCrawler.svg)](https://www.nuget.org/packages/InfinityCrawler)\n\u003c/div\u003e\n\n## Features\n- Obeys robots.txt (crawl delay \u0026 allow/disallow)\n- Obeys in-page robots rules (`X-Robots-Tag` header and `\u003cmeta name=\"robots\" /\u003e` tag)\n- Uses sitemap.xml to seed the initial crawl of the site\n- Built around a parallel task `async`/`await` system\n- Swappable request and content processors, allowing greater customisation\n- Auto-throttling (see below)\n\n## Licensing and Support\n\nInfinity Crawler is licensed under the MIT license. 
It is free to use in personal and commercial projects.\n\nThere are [support plans](https://turnersoftware.com.au/support-plans) available that cover all active [Turner Software OSS projects](https://github.com/TurnerSoftware).\nSupport plans provide private email support, expert usage advice for our projects, priority bug fixes and more.\nThese support plans help fund our OSS commitments to provide better software for everyone.\n\n## Polite Crawling\nThe crawler is built around fast but \"polite\" crawling of websites.\nThis is achieved through settings that adjust request delays and throttling.\n\nYou can control:\n- Number of simultaneous requests\n- The delay between the start of each request (Note: if robots.txt defines a `crawl-delay` for the user agent, that value is the minimum)\n- Artificial \"jitter\" in request delays (so requests seem less \"robotic\")\n- How long a request can take before throttling is applied to new requests\n- Throttling backoff: the amount of time added to the request delay while throttling (this is cumulative)\n- Minimum number of requests completing under the throttle timeout before the throttle is gradually removed\n\n## Other Settings\n- The `UserAgent` used by the crawler\n- Additional host aliases the crawler should follow (for example, subdomains)\n- The maximum number of retries for a specific URI\n- The maximum number of redirects to follow\n- The maximum number of pages to crawl\n\n## Example Usage\n```csharp\nusing System;\nusing InfinityCrawler;\n\nvar crawler = new Crawler();\nvar result = await crawler.Crawl(new Uri(\"http://example.org/\"), new CrawlSettings {\n\tUserAgent = \"MyVeryOwnWebCrawler/1.0\",\n\tRequestProcessorOptions = new RequestProcessorOptions\n\t{\n\t\tMaxNumberOfSimultaneousRequests = 5\n\t}\n});\n```","funding_links":["https://github.com/sponsors/Turnerj"],"categories":["All","others","Libraries, Frameworks and Tools","Misc","C#"],"sub_categories":["Misc"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTurnerSoftware%2FInfinityCrawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FTurnerSoftware%2FInfinityCrawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTurnerSoftware%2FInfinityCrawler/lists"}