https://github.com/68publishers/crawler-client-php
:spider_web: PHP Client for https://github.com/68publishers/crawler
- Host: GitHub
- URL: https://github.com/68publishers/crawler-client-php
- Owner: 68publishers
- License: mit
- Created: 2023-06-07T04:38:34.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-10-14T04:49:55.000Z (over 1 year ago)
- Last Synced: 2025-08-27T17:10:55.292Z (10 months ago)
- Topics: crawler, crawling, php, scraper, scraping
- Language: PHP
- Homepage:
- Size: 88.9 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.md
## Installation
```sh
$ composer require 68publishers/crawler-client-php
```
## Client initialization
Create a client instance by calling the static method `create()`.
```php
use SixtyEightPublishers\CrawlerClient\CrawlerClient;
$client = CrawlerClient::create('');
```
The [Guzzle](https://github.com/guzzle/guzzle) library is used to communicate with the Crawler API.
To pass custom options to the Guzzle configuration, use the optional second parameter.
```php
use SixtyEightPublishers\CrawlerClient\CrawlerClient;
$client = CrawlerClient::create('', [
    'timeout' => 0,
]);
```
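As a rough illustration, the options array could combine several standard Guzzle request options. The values are arbitrary, and the assumption that every option is forwarded to Guzzle unchanged is not confirmed by this README.

```php
// Standard Guzzle request options; the values are illustrative only.
$guzzleOptions = [
    'timeout' => 10.0,        // total request timeout in seconds (0 waits indefinitely)
    'connect_timeout' => 3.0, // seconds to wait while connecting to the server
];

// The array would then be passed as the second argument:
// $client = CrawlerClient::create('', $guzzleOptions);
```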
Requests to the Crawler API must always be authenticated, so we must provide credentials.
```php
use SixtyEightPublishers\CrawlerClient\CrawlerClient;
use SixtyEightPublishers\CrawlerClient\Authentication\Credentials;
$client = CrawlerClient::create('');
$client = $client->withAuthentication(new Credentials('', ''));
```
Note that the client is immutable: calling a `with*` method always returns a new instance and leaves the original unchanged.
This is all that is needed for the client to work properly. You can read about other options on the [Advanced options](docs/advanced-options.md) page.
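The practical consequence of immutability is that the return value of a `with*` call must be kept. The following sketch uses a hypothetical stand-in class (`ImmutableSettings` is not part of this package) only to show the pattern:

```php
// Hypothetical stand-in class; it only mimics the immutable `with*` pattern
// and is not part of the crawler client package.
final class ImmutableSettings
{
    public function __construct(
        private ?string $username = null,
    ) {}

    public function withUsername(string $username): self
    {
        // Return a modified copy instead of changing $this.
        return new self($username);
    }

    public function getUsername(): ?string
    {
        return $this->username;
    }
}

$original = new ImmutableSettings();
$original->withUsername('alice');               // return value discarded: no effect
$configured = $original->withUsername('alice'); // keep the returned instance

var_dump($original->getUsername());   // NULL
var_dump($configured->getUsername()); // string(5) "alice"
```

The same applies to `withAuthentication()` above: the result has to be assigned back to a variable, as the example does with `$client`.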
## Nette Framework integration
For integration with the Nette Framework please follow [this link](docs/integration-with-nette.md).
## Working with scenarios
Scenarios are handled by `ScenariosController`.
```php
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\ScenariosController;
$controller = $client->getController(ScenariosController::class);
```
### List scenarios
```php
/**
* @param int $page
* @param int $limit
* @param array $filter
*
* @return \SixtyEightPublishers\CrawlerClient\Controller\Scenario\ScenarioListingResponse
*
* @throws \SixtyEightPublishers\CrawlerClient\Exception\BadRequestException
*/
```
```php
$response = $controller->listScenarios(1, 10);

$filteredResponse = $controller->listScenarios(1, 10, [
    'name' => 'Test',
    'status' => 'failed',
]);
```
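Because listing is paginated by `$page` and `$limit`, collecting everything means requesting pages in a loop. The sketch below shows one way to do that with a generator. `paginate()` and the callback are hypothetical helpers, and the stand-in data source exists only so the example runs on its own.

```php
/**
 * Yields items page by page until a page comes back shorter than $limit.
 * paginate() is a hypothetical helper, not part of the package.
 *
 * @param callable(int, int): array $fetchPage returns the items of one page
 */
function paginate(callable $fetchPage, int $limit = 10): \Generator
{
    if ($limit < 1) {
        throw new \InvalidArgumentException('Limit must be at least 1.');
    }

    $page = 1;

    do {
        $items = $fetchPage($page, $limit);

        foreach ($items as $item) {
            yield $item;
        }

        $page++;
    } while (count($items) === $limit); // a short page means it was the last one
}

// Stand-in data source so the sketch runs without the Crawler API.
$all = range(1, 25);
$fetch = static fn (int $page, int $limit): array => array_slice($all, ($page - 1) * $limit, $limit);

$collected = iterator_to_array(paginate($fetch, 10), false);
echo count($collected); // 25
```

With the real client, the callback would call `$controller->listScenarios($page, $limit)` and extract the items from the `ScenarioListingResponse`. The accessor for that is not shown in this README, so it is left out here.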
### Get scenario
```php
/**
* @param string $scenarioId
*
* @return \SixtyEightPublishers\CrawlerClient\Controller\Scenario\ScenarioResponse
*
* @throws \SixtyEightPublishers\CrawlerClient\Exception\BadRequestException
* @throws \SixtyEightPublishers\CrawlerClient\Exception\NotFoundException
*/
```
```php
$response = $controller->getScenario('');
```
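The documented exceptions can be handled with an ordinary `try`/`catch`. The following is a minimal sketch, assuming a missing scenario should be treated as `null`. `findScenarioOrNull()` is a hypothetical helper, and the controller is typed as `object` only to keep the snippet independent of the package.

```php
/**
 * Returns the scenario response, or null when the scenario does not exist.
 * findScenarioOrNull() is a hypothetical helper, not part of the package.
 *
 * @param object $controller a ScenariosController instance
 */
function findScenarioOrNull(object $controller, string $scenarioId): ?object
{
    try {
        return $controller->getScenario($scenarioId);
    } catch (\SixtyEightPublishers\CrawlerClient\Exception\NotFoundException $e) {
        // Documented for getScenario(): no scenario matches the given ID.
        return null;
    }
}
```

A `BadRequestException` is deliberately not caught here, so it propagates to the caller.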
### Run scenario
```php
/**
* @param \SixtyEightPublishers\CrawlerClient\Controller\Scenario\RequestBody\ScenarioRequestBody $requestBody
*
* @return \SixtyEightPublishers\CrawlerClient\Controller\Scenario\ScenarioResponse
*
* @throws \SixtyEightPublishers\CrawlerClient\Exception\BadRequestException
*/
```
As a scenario config we can pass a normal array or use prepared value objects. Both options are valid.
```php
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\RequestBody\ScenarioRequestBody;

$requestBody = new ScenarioRequestBody(
    name: 'My scenario',
    flags: ['my_flag' => 'my_flag_value'],
    config: [
        'scenes' => [ /* ... */ ],
        'options' => [ /* ... */ ],
        'entrypoint' => [ /* ... */ ],
    ],
);

$response = $controller->runScenario($requestBody);
```
```php
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\RequestBody\ScenarioRequestBody;
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\ValueObject\ScenarioConfig;
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\ValueObject\Entrypoint;
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\ValueObject\Action;

$requestBody = new ScenarioRequestBody(
    name: 'My scenario',
    flags: ['my_flag' => 'my_flag_value'],
    config: (new ScenarioConfig(new Entrypoint('', 'default')))
        ->withOptions(/* ... */)
        ->withScene('default', [
            new Action('...', [ /* ... */ ]),
            new Action('...', [ /* ... */ ]),
        ]),
);

$response = $controller->runScenario($requestBody);
```
### Validate scenario
```php
/**
* @param \SixtyEightPublishers\CrawlerClient\Controller\Scenario\RequestBody\ScenarioRequestBody $requestBody
*
* @return \SixtyEightPublishers\CrawlerClient\Controller\Scenario\ValidateScenarioResponse
*/
```
As a scenario config we can pass a normal array or use prepared value objects. Both options are valid.
```php
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\RequestBody\ScenarioRequestBody;

$requestBody = new ScenarioRequestBody(
    name: 'My scenario',
    flags: ['my_flag' => 'my_flag_value'],
    config: [
        'scenes' => [ /* ... */ ],
        'options' => [ /* ... */ ],
        'entrypoint' => [ /* ... */ ],
    ],
);

$response = $controller->validateScenario($requestBody);
```
```php
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\RequestBody\ScenarioRequestBody;
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\ValueObject\ScenarioConfig;
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\ValueObject\Entrypoint;
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\ValueObject\Action;

$requestBody = new ScenarioRequestBody(
    name: 'My scenario',
    flags: ['my_flag' => 'my_flag_value'],
    config: (new ScenarioConfig(new Entrypoint('', 'default')))
        ->withOptions(/* ... */)
        ->withScene('default', [
            new Action('...', [ /* ... */ ]),
            new Action('...', [ /* ... */ ]),
        ]),
);

$response = $controller->validateScenario($requestBody);
```
### Abort scenario
```php
/**
* @param string $scenarioId
*
* @return \SixtyEightPublishers\CrawlerClient\Controller\Common\NoContentResponse
*
* @throws \SixtyEightPublishers\CrawlerClient\Exception\BadRequestException
* @throws \SixtyEightPublishers\CrawlerClient\Exception\NotFoundException
*/
```
```php
$response = $controller->abortScenario('');
```
## Working with scenario schedulers
Scenario schedulers are handled by `ScenarioSchedulersController`.
```php
use SixtyEightPublishers\CrawlerClient\Controller\ScenarioScheduler\ScenarioSchedulersController;
$controller = $client->getController(ScenarioSchedulersController::class);
```
### List scenario schedulers
```php
/**
* @param int $page
* @param int $limit
* @param array $filter
*
* @return \SixtyEightPublishers\CrawlerClient\Controller\ScenarioScheduler\ScenarioSchedulerListingResponse
*
* @throws \SixtyEightPublishers\CrawlerClient\Exception\BadRequestException
*/
```
```php
$response = $controller->listScenarioSchedulers(1, 10);

$filteredResponse = $controller->listScenarioSchedulers(1, 10, [
    'name' => 'Test',
    'userId' => '',
]);
```
### Get scenario scheduler
```php
/**
* @param string $scenarioSchedulerId
*
* @return \SixtyEightPublishers\CrawlerClient\Controller\ScenarioScheduler\ScenarioSchedulerResponse
*
* @throws \SixtyEightPublishers\CrawlerClient\Exception\BadRequestException
* @throws \SixtyEightPublishers\CrawlerClient\Exception\NotFoundException
*/
```
```php
$response = $controller->getScenarioScheduler('');
$etag = $response->getEtag(); # the ETag is required for an update
```
### Create scenario scheduler
```php
/**
* @param \SixtyEightPublishers\CrawlerClient\Controller\ScenarioScheduler\RequestBody\ScenarioSchedulerRequestBody $requestBody
*
* @return \SixtyEightPublishers\CrawlerClient\Controller\ScenarioScheduler\ScenarioSchedulerResponse
*
* @throws \SixtyEightPublishers\CrawlerClient\Exception\BadRequestException
*/
```
As a scenario config we can pass a normal array or use prepared value objects. Both options are valid.
```php
use SixtyEightPublishers\CrawlerClient\Controller\ScenarioScheduler\RequestBody\ScenarioSchedulerRequestBody;

$requestBody = new ScenarioSchedulerRequestBody(
    name: 'My scenario',
    flags: ['my_flag' => 'my_flag_value'],
    active: true,
    expression: '0 2 * * *',
    config: [
        'scenes' => [ /* ... */ ],
        'options' => [ /* ... */ ],
        'entrypoint' => [ /* ... */ ],
    ],
);

$response = $controller->createScenarioScheduler($requestBody);
$etag = $response->getEtag(); # the ETag is required for an update
```
```php
use SixtyEightPublishers\CrawlerClient\Controller\ScenarioScheduler\RequestBody\ScenarioSchedulerRequestBody;
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\ValueObject\ScenarioConfig;
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\ValueObject\Entrypoint;
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\ValueObject\Action;

$requestBody = new ScenarioSchedulerRequestBody(
    name: 'My scenario',
    flags: ['my_flag' => 'my_flag_value'],
    active: true,
    expression: '0 2 * * *',
    config: (new ScenarioConfig(new Entrypoint('', 'default')))
        ->withOptions(/* ... */)
        ->withScene('default', [
            new Action('...', [ /* ... */ ]),
            new Action('...', [ /* ... */ ]),
        ]),
);

$response = $controller->createScenarioScheduler($requestBody);
$etag = $response->getEtag(); # the ETag is required for an update
```
### Update scenario scheduler
```php
/**
* @param string $scenarioSchedulerId
* @param string $etag
* @param \SixtyEightPublishers\CrawlerClient\Controller\ScenarioScheduler\RequestBody\ScenarioSchedulerRequestBody $requestBody
*
* @return \SixtyEightPublishers\CrawlerClient\Controller\ScenarioScheduler\ScenarioSchedulerResponse
*
* @throws \SixtyEightPublishers\CrawlerClient\Exception\BadRequestException
* @throws \SixtyEightPublishers\CrawlerClient\Exception\PreconditionFailedException
*/
```
As a scenario config we can pass a normal array or use prepared value objects. Both options are valid.
```php
use SixtyEightPublishers\CrawlerClient\Controller\ScenarioScheduler\RequestBody\ScenarioSchedulerRequestBody;

$requestBody = new ScenarioSchedulerRequestBody(
    name: 'My scenario',
    flags: ['my_flag' => 'my_flag_value'],
    active: true,
    expression: '0 2 * * *',
    config: [
        'scenes' => [ /* ... */ ],
        'options' => [ /* ... */ ],
        'entrypoint' => [ /* ... */ ],
    ],
);

$response = $controller->updateScenarioScheduler('', '', $requestBody);
$etag = $response->getEtag(); # the ETag is required for the next update
```
```php
use SixtyEightPublishers\CrawlerClient\Controller\ScenarioScheduler\RequestBody\ScenarioSchedulerRequestBody;
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\ValueObject\ScenarioConfig;
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\ValueObject\Entrypoint;
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\ValueObject\Action;

$requestBody = new ScenarioSchedulerRequestBody(
    name: 'My scenario',
    flags: ['my_flag' => 'my_flag_value'],
    active: true,
    expression: '0 2 * * *',
    config: (new ScenarioConfig(new Entrypoint('', 'default')))
        ->withOptions(/* ... */)
        ->withScene('default', [
            new Action('...', [ /* ... */ ]),
            new Action('...', [ /* ... */ ]),
        ]),
);

$response = $controller->updateScenarioScheduler('', '', $requestBody);
$etag = $response->getEtag(); # the ETag is required for the next update
```
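The ETag parameter and the documented `PreconditionFailedException` suggest that updates are guarded against concurrent changes, although this README does not say so explicitly. Under that assumption, a fetch-then-update flow with a bounded retry could look like the sketch below. `updateWithFreshEtag()` is a hypothetical helper, and the arguments are typed as `object` only to keep the snippet independent of the package.

```php
/**
 * Updates a scheduler using a freshly fetched ETag and retries when the
 * ETag is rejected. Hypothetical helper, not part of the package.
 *
 * @param object $controller  a ScenarioSchedulersController instance
 * @param object $requestBody a ScenarioSchedulerRequestBody instance
 */
function updateWithFreshEtag(
    object $controller,
    string $schedulerId,
    object $requestBody,
    int $maxAttempts = 3,
): object {
    for ($attempt = 1; ; $attempt++) {
        // Fetch the current ETag right before updating.
        $etag = $controller->getScenarioScheduler($schedulerId)->getEtag();

        try {
            return $controller->updateScenarioScheduler($schedulerId, $etag, $requestBody);
        } catch (\SixtyEightPublishers\CrawlerClient\Exception\PreconditionFailedException $e) {
            // Assumed meaning: the scheduler changed between the fetch and the update.
            if ($attempt >= $maxAttempts) {
                throw $e;
            }
        }
    }
}
```

Retrying resends the same request body and therefore overwrites whatever changed in the meantime. If that is not acceptable, reload the scheduler and merge the changes before updating again.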
### Validate scenario scheduler
```php
/**
* @param \SixtyEightPublishers\CrawlerClient\Controller\ScenarioScheduler\RequestBody\ScenarioSchedulerRequestBody $requestBody
*
* @return \SixtyEightPublishers\CrawlerClient\Controller\ScenarioScheduler\ValidateScenarioSchedulerResponse
*/
```
As a scenario config we can pass a normal array or use prepared value objects. Both options are valid.
```php
use SixtyEightPublishers\CrawlerClient\Controller\ScenarioScheduler\RequestBody\ScenarioSchedulerRequestBody;

$requestBody = new ScenarioSchedulerRequestBody(
    name: 'My scenario',
    flags: ['my_flag' => 'my_flag_value'],
    active: true,
    expression: '0 2 * * *',
    config: [
        'scenes' => [ /* ... */ ],
        'options' => [ /* ... */ ],
        'entrypoint' => [ /* ... */ ],
    ],
);

$response = $controller->validateScenarioScheduler($requestBody);
```
```php
use SixtyEightPublishers\CrawlerClient\Controller\ScenarioScheduler\RequestBody\ScenarioSchedulerRequestBody;
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\ValueObject\ScenarioConfig;
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\ValueObject\Entrypoint;
use SixtyEightPublishers\CrawlerClient\Controller\Scenario\ValueObject\Action;

$requestBody = new ScenarioSchedulerRequestBody(
    name: 'My scenario',
    flags: ['my_flag' => 'my_flag_value'],
    active: true,
    expression: '0 2 * * *',
    config: (new ScenarioConfig(new Entrypoint('', 'default')))
        ->withOptions(/* ... */)
        ->withScene('default', [
            new Action('...', [ /* ... */ ]),
            new Action('...', [ /* ... */ ]),
        ]),
);

$response = $controller->validateScenarioScheduler($requestBody);
```
### Activate/deactivate scenario scheduler
```php
/**
* @param string $scenarioSchedulerId
*
* @return \SixtyEightPublishers\CrawlerClient\Controller\ScenarioScheduler\ScenarioSchedulerResponse
*
* @throws \SixtyEightPublishers\CrawlerClient\Exception\BadRequestException
* @throws \SixtyEightPublishers\CrawlerClient\Exception\NotFoundException
*/
```
```php
# to activate the scenario scheduler:
$response = $controller->activateScenarioScheduler('');
# to deactivate the scenario scheduler:
$response = $controller->deactivateScenarioScheduler('');
```
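When the desired state comes from a boolean, for example a toggle in an admin UI, the two calls can be wrapped in a small helper. `setSchedulerActive()` is a hypothetical function, not part of the package. It only dispatches to the two methods shown above.

```php
/**
 * Picks the documented activate/deactivate call from a boolean flag.
 * Hypothetical helper, not part of the package.
 *
 * @param object $controller a ScenarioSchedulersController instance
 */
function setSchedulerActive(object $controller, string $schedulerId, bool $active): object
{
    return $active
        ? $controller->activateScenarioScheduler($schedulerId)
        : $controller->deactivateScenarioScheduler($schedulerId);
}
```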
### Delete scenario scheduler
```php
/**
* @param string $scenarioSchedulerId
*
* @return \SixtyEightPublishers\CrawlerClient\Controller\Common\NoContentResponse
*
* @throws \SixtyEightPublishers\CrawlerClient\Exception\BadRequestException
* @throws \SixtyEightPublishers\CrawlerClient\Exception\NotFoundException
*/
```
```php
$response = $controller->deleteScenarioScheduler('');
```
## License
The package is distributed under the MIT License. See [LICENSE](LICENSE.md) for more information.