- Host: GitHub
- URL: https://github.com/spatie/crawler
- Owner: spatie
- License: mit
- Created: 2015-11-02T16:22:09.000Z (over 10 years ago)
- Default Branch: main
- Last Pushed: 2026-03-20T08:54:37.000Z (17 days ago)
- Last Synced: 2026-03-20T12:33:15.816Z (17 days ago)
- Topics: concurrency, crawler, guzzle, php
- Language: PHP
- Homepage: https://freek.dev/308-building-a-crawler-in-php
- Size: 693 KB
- Stars: 2,786
- Watchers: 65
- Forks: 368
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Funding: .github/FUNDING.yml
- License: LICENSE.md
- Support: docs/support-us.md
Awesome Lists containing this project
- awesome-crawler - spatie/crawler - An easy-to-use, powerful crawler implemented in PHP. Can execute JavaScript. (PHP)
README
# Crawl the web using PHP
This package provides a powerful, easy-to-use class to crawl links on a website. Under the hood, Guzzle promises are used to [crawl multiple URLs concurrently](http://docs.guzzlephp.org/en/latest/quickstart.html?highlight=pool#concurrent-requests).
Because the crawler can execute JavaScript, it can crawl JavaScript-rendered sites. Under the hood, [Chrome and Puppeteer](https://github.com/spatie/browsershot) are used to power this feature.
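Both of these behaviors are configurable. As a rough sketch, and an assumption rather than a guarantee: the `setConcurrency()` and `executeJavaScript()` methods below come from previously documented releases of this package and may differ in your version, so check the documentation site before relying on them:

```php
use Spatie\Crawler\Crawler;

// Sketch (assumed API, see lead-in): limit the number of concurrent
// Guzzle requests and render pages with Chrome/Puppeteer before
// links are extracted.
Crawler::create('https://example.com')
    ->setConcurrency(5)      // at most 5 requests in flight at once
    ->executeJavaScript()    // requires spatie/browsershot and Chrome
    ->start();
```

Lowering concurrency is a simple way to be polite to small servers; raising it speeds up crawls of sites that can handle the load.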
Here's a quick example:
```php
use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlResponse;

Crawler::create('https://example.com')
    ->onCrawled(function (string $url, CrawlResponse $response) {
        echo "{$url}: {$response->status()}\n";
    })
    ->start();
```
Or collect all URLs on a site:
```php
$urls = Crawler::create('https://example.com')
    ->internalOnly()
    ->depth(3)
    ->foundUrls();
```
You can also test your crawl logic without making real HTTP requests:
```php
Crawler::create('https://example.com')
    ->fake([
        'https://example.com' => 'About',
        'https://example.com/about' => 'About page',
    ])
    ->foundUrls();
```
## Support us
We invest a lot of resources into creating [best in class open source packages](https://spatie.be/open-source). You can support us by [buying one of our paid products](https://spatie.be/open-source/support-us).
We highly appreciate you sending us a postcard from your hometown, mentioning which of our package(s) you are using. You'll find our address on [our contact page](https://spatie.be/about-us). We publish all received postcards on [our virtual postcard wall](https://spatie.be/open-source/postcards).
## Documentation
All documentation is available [on our documentation site](https://spatie.be/docs/crawler).
## Testing
```bash
composer test
```
## Changelog
Please see [CHANGELOG](CHANGELOG.md) for more information on what has changed recently.
## Contributing
Please see [CONTRIBUTING](https://github.com/spatie/.github/blob/main/CONTRIBUTING.md) for details.
## Security Vulnerabilities
Please review [our security policy](../../security/policy) on how to report security vulnerabilities.
## Credits
- [Freek Van der Herten](https://github.com/freekmurze)
- [All Contributors](../../contributors)
## License
The MIT License (MIT). Please see [License File](LICENSE.md) for more information.