https://github.com/spatie/crawler

Host: GitHub
URL: https://github.com/spatie/crawler
Owner: spatie
License: mit
Created: 2015-11-02T16:22:09.000Z (over 10 years ago)
Default Branch: main
Last Pushed: 2026-03-20T08:54:37.000Z (4 months ago)
Last Synced: 2026-04-02T02:44:05.996Z (3 months ago)
Topics: concurrency, crawler, guzzle, php
Language: PHP
Homepage: https://freek.dev/308-building-a-crawler-in-php
Size: 693 KB
Stars: 2,802
Watchers: 65
Forks: 368
Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Funding: .github/FUNDING.yml
- License: LICENSE.md
- Support: docs/support-us.md

Awesome Lists containing this project

awesome-crawler - spatie/crawler - An easy to use, powerful crawler implemented in PHP. Can execute Javascript. (PHP)

README

# Crawl the web using PHP


[![Latest Version on Packagist](https://img.shields.io/packagist/v/spatie/crawler.svg?style=flat-square)](https://packagist.org/packages/spatie/crawler)

[![MIT Licensed](https://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat-square)](LICENSE.md)

![Tests](https://github.com/spatie/crawler/workflows/Tests/badge.svg)

[![Total Downloads](https://img.shields.io/packagist/dt/spatie/crawler.svg?style=flat-square)](https://packagist.org/packages/spatie/crawler)



This package provides a powerful, easy-to-use class to crawl links on a website. Under the hood, Guzzle promises are used to [crawl multiple URLs concurrently](http://docs.guzzlephp.org/en/latest/quickstart.html?highlight=pool#concurrent-requests).
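
To make the concurrency idea concrete, here is a minimal sketch of Guzzle's own request pool, which keeps a bounded number of requests in flight at once. It is not this package's implementation. The URLs are hypothetical, and a `MockHandler` with canned responses stands in for the network so the sketch can run offline. It assumes `guzzlehttp/guzzle` is installed through Composer.

```php
<?php

// Assumes Composer autoloading with guzzlehttp/guzzle installed.
require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;

// Hypothetical URLs, used only for illustration.
$urls = [
    'https://example.com',
    'https://example.com/missing',
    'https://example.com/about',
];

// Canned responses so the sketch runs without network access.
$mock = new MockHandler([
    new Response(200),
    new Response(404),
    new Response(200),
]);

$client = new Client([
    'handler' => HandlerStack::create($mock),
    'http_errors' => false, // report 4xx/5xx as responses, not exceptions
]);

// Lazily yield one request per URL.
$requests = function (array $urls) {
    foreach ($urls as $url) {
        yield new Request('GET', $url);
    }
};

$statuses = [];

$pool = new Pool($client, $requests($urls), [
    'concurrency' => 2, // at most two requests in flight at a time
    'fulfilled' => function (Response $response, $index) use (&$statuses, $urls) {
        $statuses[$urls[$index]] = $response->getStatusCode();
    },
    'rejected' => function ($reason, $index) use (&$statuses, $urls) {
        $statuses[$urls[$index]] = null; // transport-level failure
    },
]);

// Start the transfers and block until every request has settled.
$pool->promise()->wait();

foreach ($statuses as $url => $status) {
    echo "{$url}: {$status}\n";
}
```

Dropping the `handler` option makes the same pool issue real HTTP requests. The `concurrency` value caps how many run in parallel.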

Because the crawler can execute JavaScript, it can crawl JavaScript-rendered sites. Under the hood, [Chrome and Puppeteer](https://github.com/spatie/browsershot) are used to power this feature.

Here's a quick example:

```php
use Spatie\Crawler\Crawler;
use Spatie\Crawler\CrawlResponse;

Crawler::create('https://example.com')
    ->onCrawled(function (string $url, CrawlResponse $response) {
        echo "{$url}: {$response->status()}\n";
    })
    ->start();
```

Or collect all URLs on a site:

```php
$urls = Crawler::create('https://example.com')
    ->internalOnly()
    ->depth(3)
    ->foundUrls();
```

You can also test your crawl logic without making real HTTP requests:

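```php
Crawler::create('https://example.com')
    ->fake([
        // The faked body needs a link for the crawler to discover the second URL.
        'https://example.com' => '<html><a href="/about">About</a></html>',
        'https://example.com/about' => '<html>About page</html>',
    ])
    ->foundUrls();
```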

If you need to stop a crawl based on external state, you can register a callback that receives the current crawler instance and is checked before scheduling each next request:

```php
use Spatie\Crawler\Crawler;

$shouldStop = false;

Crawler::create('https://example.com')
    ->shouldStopCallback(function (Crawler $crawler) use (&$shouldStop) {
        return $shouldStop;
    })
    ->onCrawled(function (string $url) use (&$shouldStop) {
        $shouldStop = true;
    })
    ->start();
```

## Support us

We invest a lot of resources into creating [best in class open source packages](https://spatie.be/open-source). You can support us by [buying one of our paid products](https://spatie.be/open-source/support-us).

We highly appreciate you sending us a postcard from your hometown, mentioning which of our package(s) you are using. You'll find our address on [our contact page](https://spatie.be/about-us). We publish all received postcards on [our virtual postcard wall](https://spatie.be/open-source/postcards).

## Documentation

All documentation is available [on our documentation site](https://spatie.be/docs/crawler).

## Testing

```bash
composer test
```

## Changelog

Please see [CHANGELOG](CHANGELOG.md) for more information on what has changed recently.

## Contributing

Please see [CONTRIBUTING](https://github.com/spatie/.github/blob/main/CONTRIBUTING.md) for details.

## Security Vulnerabilities

Please review [our security policy](../../security/policy) on how to report security vulnerabilities.

## Credits

- [Freek Van der Herten](https://github.com/freekmurze)

- [All Contributors](../../contributors)

## License

The MIT License (MIT). Please see [License File](LICENSE.md) for more information.
