https://github.com/spekulatius/spatie-crawler-toolkit-for-laravel
A toolkit for Spatie's Crawler and Laravel.
- Host: GitHub
- URL: https://github.com/spekulatius/spatie-crawler-toolkit-for-laravel
- Owner: spekulatius
- License: mit
- Created: 2020-11-17T10:45:34.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2023-05-28T10:43:35.000Z (over 1 year ago)
- Last Synced: 2024-09-10T11:51:25.007Z (4 months ago)
- Topics: crawler, laravel, laravel-crawler, php-crawler, php-scraper, spatie-crawler
- Language: PHP
- Homepage: https://releasecandidate.dev
- Size: 23.4 KB
- Stars: 18
- Watchers: 4
- Forks: 8
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE.md
# Spatie Crawler Toolkit for Laravel
## Laravel 9 should work, but is not extensively tested. Please report any issues you might find!
[![Software License](https://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat-square)](LICENSE.md)
[![Total Downloads](https://img.shields.io/packagist/dt/spekulatius/spatie-crawler-toolkit-for-laravel.svg?style=flat-square)](https://packagist.org/packages/spekulatius/spatie-crawler-toolkit-for-laravel) [![Awesome PHP crawler](https://awesome.re/badge.svg)](https://github.com/spekulatius/awesome-php-scrapers-and-crawlers)

A set of classes to use [Spatie's crawler](https://github.com/spatie/crawler) with Laravel. The aim is to simplify building crawler applications or adding a crawler to an existing Laravel project. It can, for example, be integrated into [PHP Scraper](https://github.com/spekulatius/phpscraper). At the moment, the following helper classes are implemented:
## Cache Crawl Queue
The [CacheCrawlQueue](https://github.com/spekulatius/spatie-crawler-toolkit-for-laravel/blob/master/src/Queues/CacheCrawlQueue.php) lets you use the cache pre-configured in Laravel to store the crawl queue. Every action performed on the queue is persisted immediately, so you never need to store the queue manually. You can add it directly to your crawler:
```php
use Spatie\Crawler\Crawler;

Crawler::create()
    ->setCrawlQueue(new \Spekulatius\SpatieCrawlerToolkit\Queues\CacheCrawlQueue($url))
    ->startCrawling($url);
```

With this, you can stop the crawl and resume it at any time. This requires a cache driver to be configured in your `.env` file.
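
To illustrate the idea of persisting every queue action as it happens, here is a minimal, self-contained sketch in plain PHP. It is not the toolkit's implementation: `ArrayStore`, `WriteThroughQueue`, and the cache key are hypothetical names standing in for Laravel's cache and the `CacheCrawlQueue`.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for Laravel's configured cache store.
final class ArrayStore
{
    /** @var array<string, string> */
    private array $items = [];

    public function put(string $key, string $value): void
    {
        $this->items[$key] = $value;
    }

    public function get(string $key): ?string
    {
        return $this->items[$key] ?? null;
    }
}

// Hypothetical queue that writes its state to the store after every change.
final class WriteThroughQueue
{
    private ArrayStore $store;
    private string $key;

    /** @var array<string, bool> URL => whether it has been processed */
    private array $urls;

    public function __construct(ArrayStore $store, string $key)
    {
        $this->store = $store;
        $this->key = $key;

        // Restore any previously saved state so a crawl can be resumed.
        $saved = $store->get($key);
        $this->urls = $saved === null ? [] : json_decode($saved, true);
    }

    public function add(string $url): void
    {
        if (!array_key_exists($url, $this->urls)) {
            $this->urls[$url] = false;
            $this->persist();
        }
    }

    public function markAsProcessed(string $url): void
    {
        $this->urls[$url] = true;
        $this->persist();
    }

    /** @return string[] URLs that still need to be crawled */
    public function pending(): array
    {
        return array_keys(array_filter($this->urls, fn (bool $done): bool => !$done));
    }

    private function persist(): void
    {
        $this->store->put($this->key, json_encode($this->urls));
    }
}

$store = new ArrayStore();

$queue = new WriteThroughQueue($store, 'crawl-queue:example');
$queue->add('https://example.com/');
$queue->add('https://example.com/about');
$queue->markAsProcessed('https://example.com/');

// A fresh instance with the same key simulates restarting the crawl.
$resumed = new WriteThroughQueue($store, 'crawl-queue:example');
echo implode(PHP_EOL, $resumed->pending()), PHP_EOL;
```

Because state is written on every change rather than at the end of the crawl, an interrupted process loses nothing: the resumed queue only reports the URL that was never marked as processed.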
## Crawl Logger
The [Crawl Logger](https://github.com/spekulatius/spatie-crawler-toolkit-for-laravel/blob/master/src/Observers/CrawlLogger.php) is an observer you can add to your crawler to enable logging of crawl events:
```php
use Spatie\Crawler\Crawler;

Crawler::create()
    ->setCrawlObserver(new \Spekulatius\SpatieCrawlerToolkit\Observers\CrawlLogger)
    ->startCrawling($url);
```

You can publish the configuration (see below) to tweak which events are logged.
## Crawl Events
The toolkit contains an observer that dispatches Laravel events, allowing you to react to what happens during a crawl. The following events are covered:
- [WillCrawl](https://github.com/spekulatius/spatie-crawler-toolkit-for-laravel/blob/master/src/Events/WillCrawl.php)
- [Crawled](https://github.com/spekulatius/spatie-crawler-toolkit-for-laravel/blob/master/src/Events/Crawled.php)
- [CrawlFailed](https://github.com/spekulatius/spatie-crawler-toolkit-for-laravel/blob/master/src/Events/CrawlFailed.php)
- [FinishedCrawling](https://github.com/spekulatius/spatie-crawler-toolkit-for-laravel/blob/master/src/Events/FinishedCrawling.php)

By default, no events are emitted. To enable events, you will need to add the event observer to your crawler:
```php
$eventObserver = new \Spekulatius\SpatieCrawlerToolkit\Observers\CrawlEvents;

Crawler::create()
    ->setCrawlObserver($eventObserver)
    ->startCrawling($url);
```

An optional identifier can be passed to the crawl events to distinguish between different crawls:
```php
$eventObserver = new \Spekulatius\SpatieCrawlerToolkit\Observers\CrawlEvents('my-crawl');
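
// Sketch of reacting to the dispatched events (an assumption, not part of the
// toolkit). The event payloads are not documented here, so this listener only
// relies on the event's class name. $seen and $recordEvent are hypothetical.
$seen = [];
$recordEvent = function (object $event) use (&$seen): void {
    $seen[] = get_class($event);
};

// In a Laravel app, the listener could be registered in a service provider's
// boot() method. The event namespace below is inferred from the file path:
// \Illuminate\Support\Facades\Event::listen(
//     \Spekulatius\SpatieCrawlerToolkit\Events\Crawled::class,
//     $recordEvent
// );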
```

## Planned functionality
- Batched crawling using Laravel Queues.
For any suggestions on how to enhance this, please raise an issue.
## Requirements & Install
### Requirements
- Laravel 6, 7, 8, 9. Laravel 9 is still in testing. Please report any issues.
- Cache and Log configured in Laravel.

### Installation
```bash
composer require spekulatius/spatie-crawler-toolkit-for-laravel
```

Optionally, you can publish the configuration file:
```bash
php artisan vendor:publish --tag=crawler-toolkit-config
```

## Contributing
Please raise a PR or issue.
## License
Released under the MIT license. Please see [License File](LICENSE.md) for more information.