https://github.com/exfriend/cloudfactory
Cloud Factory Revisited
https://github.com/exfriend/cloudfactory
Last synced: 8 months ago
JSON representation
Cloud Factory Revisited
- Host: GitHub
- URL: https://github.com/exfriend/cloudfactory
- Owner: exfriend
- Created: 2014-10-29T06:23:48.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2016-09-07T13:56:45.000Z (over 9 years ago)
- Last Synced: 2025-04-07T11:11:10.598Z (9 months ago)
- Language: PHP
- Size: 69.3 KB
- Stars: 1
- Watchers: 1
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README

CloudFactory is a nice and powerful library for building crawlers in PHP. It provides a simple, human readable API while keeping all
power of cURL under the hood. Key feature of CloudFactory is Request Validation. When using proxies in multi-threaded environment
you will often get lots of problems with timeouts, response spoofing, HTTP/proxy errors and invalid content.
The goal of CloudFactory is to minimize the effort needed for 100% valid content delivery, and you can achieve that by
using this library.
###Requirements
- PHP 7.0
- php-curl
- mbstring
- guzzle
### Installation
`composer require exfriend/cloudfactory`
## Basic Usage
```
run($request);
print_r($request->response);
```
## Features
### Setting curl options
```
$request = (new \Exfriend\CloudFactory\Request('http://httpbin.org/get'))->setOpt(CURLOPT_FOLLOWLOCATION, true);
```
### Option setters
You can set raw curl options as well as use some helpers available on `Request` object.
```
$request = (new \Exfriend\CloudFactory\Request('http://httpbin.org/post'))
->setOpt(CURLOPT_FOLLOWLOCATION, true)
->withProxy('127.0.0.1:8080', CURLPROXY_SOCKS5)
->sendHeaders(['Foo' => 'Bar'])
->post(['user' => 'admin', 'pass' => 'secret'])
->withTimeouts(10, 5)
->withSsl();
```
You can set almost all of the options directly on the `Engine` instance:
```
$engine = new Engine()->withUserAgent()->withSsl()->withCookies('cookie.txt');
```
### Concurrent requests
```
$engine = (new \Exfriend\CloudFactory\Engine())->setThreads(25);
for ($i = 0; $i < 100; $i++) {
$request = new \Exfriend\CloudFactory\Request('http://httpbin.org/get?i=' . $i);
$engine->addRequest($request);
}
$engine->run();
foreach ($engine->requests->processed() as $request) {
print_r($request->response);
}
```
### Request priority
You can set priority for each request by using `->withPriority(int)`.
This way when it comes to pulling the request from the queue, they will be sorted by priority.
The more priority you set, the higher your request will be in the queue.
### Request callbacks
```
$request = (new Request('http://httpbin.org/get'))
->maxTries(3)
->store('i', $i)// to pass through to the callback
->validateUsing(function (Request $request) { // using closure
// must return boolean
return strpos($request->response, 'origin') !== false;
})
->onSuccess('request_success')// using string
->onSuccess(function ($r) {
echo 'You can stack callbacks of one type' . PHP_EOL;
})
->onFail(function ($r) {
echo "Request failed {$r->tries_current} times of {$r->tries_max}: " . $r->url;
})
->onLastFail(function ($r) {
echo "Request failed last time: " . $r->url;
});
```
### Response validation
You can validate your responses using `validateWith( callable )`
when building a request. Your callable should accept the instance of `Request` as a parameter
and return boolean true or false as a validity indicator.
Depending on the validity, engine will call callbacks provided with `onSuccess` or `onFail` or `onLastFail`.
You can set `maxTries( int )` to inform the engine how many times the request
should be repeated before it fails the last time.
Note that on the last failed try both onFail and onLastFail callback groups will be called.
### Passing additional data
You can use `store(key,value)` method when creating the request to
pass any additional data on the storage bag of your request.
You can access it from callback like this:
```
...
->store( 'user', $user )
->onSuccess(function($request)){
$user = $request->storage->get('user');
})
...
```
### Dynamically add requests
You can add new requests while engine is still running e.g. from callback.
```
->onSuccess(function($r)use($engine){
$newRequest = new Request('http://site.com/simething_else')
->withSsl()
->onSuccess('some_other_callback');
$engine->addRequest($newRequest);
})
```
### Request collection
When you add a request with `$engine->addRequest()` it gets pushed to `$engine->requests` collection.
It extends `Illuminate\Support\Collection` which is the powerful
collection pattern implementation used in Laravel. This means you have all
the perks like `$htmls = $engine->requests()->processed()->pluck('response')` to get all responses as array.
Full documentation on collections can be found on the [Official Laravel Documentation](https://laravel.com/docs/5.2/collections) pages.
### Callback stacking
You can process the results using `$engine->requests` collection or/and using callbacks.
You can add multiple callbacks for each state.
```
...
->onSuccess('parse_content_and_save_it_to_db')
->onSuccess('notify_other_class')
->onFail('show_notification')
->onFail('change_proxy')
->onLastFail([$this,'send_some_email'])
...
```
This gives you a possibility to easily write for example proxy
rotating trait for your own descendant of `Request` class.
## Examples
For more examples check out the `./examples` directory.
###Contributing
This package is work-in-progress.