{"id":13464548,"url":"https://github.com/spatie/crawler","last_synced_at":"2026-04-15T13:01:14.928Z","repository":{"id":1961166,"uuid":"45406338","full_name":"spatie/crawler","owner":"spatie","description":"https://spatie.be/docs/crawler","archived":false,"fork":false,"pushed_at":"2026-03-20T08:54:37.000Z","size":710,"stargazers_count":2802,"open_issues_count":1,"forks_count":368,"subscribers_count":65,"default_branch":"main","last_synced_at":"2026-04-02T02:44:05.996Z","etag":null,"topics":["concurrency","crawler","guzzle","php"],"latest_commit_sha":null,"homepage":"https://freek.dev/308-building-a-crawler-in-php","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/spatie.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":"docs/support-us.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"spatie","custom":"https://spatie.be/open-source/support-us"}},"created_at":"2015-11-02T16:22:09.000Z","updated_at":"2026-03-31T04:54:08.000Z","dependencies_parsed_at":"2023-02-13T19:00:47.000Z","dependency_job_id":"e974e89d-42f0-4980-8496-9f53e2de55bc","html_url":"https://github.com/spatie/crawler","commit_stats":{"total_commits":440,"total_committers":76,"mean_commits":"5.7894736842105265","dds":"0.40909090909090906","last_synced_commit":"dbc8070e5c94cd7b511a26ab25e2b14e1c39bade"},"previous_names":[],"tags_count":123,"template":false,"template_full_name":null,"purl":"pkg:github/spatie/crawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spatie%2Fcrawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spatie%2Fcrawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spatie%2Fcrawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spatie%2Fcrawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/spatie","download_url":"https://codeload.github.com/spatie/crawler/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spatie%2Fcrawler/sbom","scorecard":{"id":840556,"data":{"date":"2025-08-11","repo":{"name":"github.com/spatie/crawler","commit":"46f6c122b37168378aee06e65493ce2b477d7c26"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.8,"checks":[{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Code-Review","score":3,"reason":"Found 10/30 approved changesets -- score normalized to 3","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":3,"reason":"4 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 3","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE.md:0","Info: FSF or OSI recognized license: MIT License: LICENSE.md:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'main'","Warn: branch protection not enabled for branch 'v4'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 10 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-23T20:23:28.871Z","repository_id":1961166,"created_at":"2025-08-23T20:23:28.871Z","updated_at":"2025-08-23T20:23:28.871Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31842193,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-15T11:29:19.690Z","status":"ssl_error","status_checked_at":"2026-04-15T11:29:19.171Z","response_time":63,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["concurrency","crawler","guzzle","php"],"created_at":"2024-07-31T14:00:45.925Z","updated_at":"2026-04-15T13:01:14.922Z","avatar_url":"https://github.com/spatie.png","language":"PHP","funding_links":["https://github.com/sponsors/spatie","https://spatie.be/open-source/support-us"],"categories":["All","PHP","Crawlers"],"sub_categories":[],"readme":"\u003cdiv align=\"left\"\u003e\n    \u003ca href=\"https://spatie.be/open-source?utm_source=github\u0026utm_medium=banner\u0026utm_campaign=crawler\"\u003e\n      \u003cpicture\u003e\n        \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"https://spatie.be/packages/header/crawler/html/dark.webp?\"\u003e\n        \u003cimg alt=\"Logo for crawler\" src=\"https://spatie.be/packages/header/crawler/html/light.webp\"\u003e\n      \u003c/picture\u003e\n    \u003c/a\u003e\n\n\u003ch1\u003eCrawl the web using PHP\u003c/h1\u003e\n\n[![Latest Version on Packagist](https://img.shields.io/packagist/v/spatie/crawler.svg?style=flat-square)](https://packagist.org/packages/spatie/crawler)\n[![MIT Licensed](https://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat-square)](LICENSE.md)\n![Tests](https://github.com/spatie/crawler/workflows/Tests/badge.svg)\n[![Total Downloads](https://img.shields.io/packagist/dt/spatie/crawler.svg?style=flat-square)](https://packagist.org/packages/spatie/crawler)\n\n\u003c/div\u003e\n\nThis package provides a powerful, easy to use class to crawl links on a website. Under the hood, Guzzle promises are used to [crawl multiple URLs concurrently](http://docs.guzzlephp.org/en/latest/quickstart.html?highlight=pool#concurrent-requests).\n\nBecause the crawler can execute JavaScript, it can crawl JavaScript rendered sites. Under the hood, [Chrome and Puppeteer](https://github.com/spatie/browsershot) are used to power this feature.\n\nHere's a quick example:\n\n```php\nuse Spatie\\Crawler\\Crawler;\nuse Spatie\\Crawler\\CrawlResponse;\n\nCrawler::create('https://example.com')\n    -\u003eonCrawled(function (string $url, CrawlResponse $response) {\n        echo \"{$url}: {$response-\u003estatus()}\\n\";\n    })\n    -\u003estart();\n```\n\nOr collect all URLs on a site:\n\n```php\n$urls = Crawler::create('https://example.com')\n    -\u003einternalOnly()\n    -\u003edepth(3)\n    -\u003efoundUrls();\n```\n\nYou can also test your crawl logic without making real HTTP requests:\n\n```php\nCrawler::create('https://example.com')\n    -\u003efake([\n        'https://example.com' =\u003e '\u003chtml\u003e\u003ca href=\"/about\"\u003eAbout\u003c/a\u003e\u003c/html\u003e',\n        'https://example.com/about' =\u003e '\u003chtml\u003eAbout page\u003c/html\u003e',\n    ])\n    -\u003efoundUrls();\n```\n\nIf you need to stop a crawl based on external state, you can register a callback that receives the current crawler instance and is checked before scheduling each next request:\n\n```php\nuse Spatie\\Crawler\\Crawler;\n\n$shouldStop = false;\n\nCrawler::create('https://example.com')\n    -\u003eshouldStopCallback(function (Crawler $crawler) use (\u0026$shouldStop) {\n        return $shouldStop;\n    })\n    -\u003eonCrawled(function (string $url) use (\u0026$shouldStop) {\n        $shouldStop = true;\n    })\n    -\u003estart();\n```\n\n## Support us\n\n[\u003cimg src=\"https://github-ads.s3.eu-central-1.amazonaws.com/crawler.jpg?t=1\" width=\"419px\" /\u003e](https://spatie.be/github-ad-click/crawler)\n\nWe invest a lot of resources into creating [best in class open source packages](https://spatie.be/open-source). You can support us by [buying one of our paid products](https://spatie.be/open-source/support-us).\n\nWe highly appreciate you sending us a postcard from your hometown, mentioning which of our package(s) you are using. You'll find our address on [our contact page](https://spatie.be/about-us). We publish all received postcards on [our virtual postcard wall](https://spatie.be/open-source/postcards).\n\n## Documentation\n\nAll documentation is available [on our documentation site](https://spatie.be/docs/crawler).\n\n## Testing\n\n```bash\ncomposer test\n```\n\n## Changelog\n\nPlease see [CHANGELOG](CHANGELOG.md) for more information on what has changed recently.\n\n## Contributing\n\nPlease see [CONTRIBUTING](https://github.com/spatie/.github/blob/main/CONTRIBUTING.md) for details.\n\n## Security Vulnerabilities\n\nPlease review [our security policy](../../security/policy) on how to report security vulnerabilities.\n\n## Credits\n\n- [Freek Van der Herten](https://github.com/freekmurze)\n- [All Contributors](../../contributors)\n\n## License\n\nThe MIT License (MIT). Please see [License File](LICENSE.md) for more information.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspatie%2Fcrawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspatie%2Fcrawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspatie%2Fcrawler/lists"}