https://github.com/rakshazi/multigrabber

# Multigrabber

> A combination of [PicoFeed parser](https://github.com/fguillot/picoFeed) and [MCurl](https://github.com/KhristenkoYura/mcurl).
> These libraries let Multigrabber download content from multiple URLs in parallel requests and parse it with the PicoFeed parser (best HTML parser ever).

Test results (100 URLs, multiple sites): 64 sec and 0.36 MB RAM to download and parse all content.
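To illustrate what "parallel requests" means here, the following is a minimal sketch built on PHP's built-in `curl_multi_*` functions. It is not MCurl's API and not how Multigrabber is necessarily implemented; `fetchAll()` and its parameters are hypothetical names chosen for this example.

```php
<?php
// Illustrative sketch only: plain curl_multi_* calls, not MCurl's API.

/**
 * Fetch several URLs concurrently and return [url => response body].
 * fetchAll() is a hypothetical helper, not part of Multigrabber.
 * Failed transfers are not distinguished here; they simply yield an
 * empty or null body.
 */
function fetchAll(array $urls, int $timeoutSeconds = 10): array
{
    $multi = curl_multi_init();
    $handles = [];

    foreach ($urls as $url) {
        $handle = curl_init($url);
        curl_setopt($handle, CURLOPT_RETURNTRANSFER, true); // return body instead of printing it
        curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($handle, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $handle);
        $handles[$url] = $handle;
    }

    // Drive all transfers together until none are still running.
    $running = 0;
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // block until there is socket activity
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $url => $handle) {
        $bodies[$url] = curl_multi_getcontent($handle);
        curl_multi_remove_handle($multi, $handle);
    }

    return $bodies;
}
```

Because the transfers overlap, total time is governed by the slowest responses rather than by the sum of all response times.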

### Installation

```
composer require rakshazi/multigrabber
```

### Usage

```php
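<?php
require __DIR__ . '/vendor/autoload.php';

// Assumed setup: a PicoFeed Config object (the setters below are PicoFeed config options)
$config = new \PicoFeed\Config\Config();
$config->setGrabberRulesFolder(__DIR__ . '/rules'); // PicoFeed grabber rules, @link https://github.com/fguillot/picoFeed/blob/master/docs/feed-parsing.markdown#custom-regex-filters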
$config->setClientUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36');
$grabber = new \Rakshazi\Multigrabber($config);
$urls = ['http://example.site/1', 'https://example.site/post2', '...'];
$data = $grabber->run($urls);

var_dump($data);
```

Output:

```
array(2) {
  ["http://example.site/1"]=>
  string(978) "Parsed content from nat-geo.ru (text was removed in this example) Vert Dider.

"
  ["https://example.site/post2"]=>
  string(3675) "Parsed html"
}
```
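Since the result is an array keyed by URL, it can be post-processed with ordinary array code. The sketch below is one possible way to separate usable content from empty entries. `splitResults()` is a hypothetical helper, the sample array stands in for the return value of `$grabber->run($urls)`, and the assumption that a failed URL yields an empty value is not confirmed by the library.

```php
<?php
// Hypothetical helper, not part of Multigrabber: splits a result array of
// the shape shown above (URL => parsed content) into usable and empty entries.
function splitResults(array $data): array
{
    $parsed = [];
    $failed = [];
    foreach ($data as $url => $content) {
        if (is_string($content) && trim($content) !== '') {
            $parsed[$url] = trim($content); // drop trailing blank lines
        } else {
            $failed[] = $url; // assumption: empty value means nothing was parsed
        }
    }
    return [$parsed, $failed];
}

// Sample data standing in for the return value of $grabber->run($urls).
$data = [
    'http://example.site/1' => "Parsed content\n\n",
    'https://example.site/post2' => 'Parsed html',
    'https://example.site/missing' => '',
];

[$parsed, $failed] = splitResults($data);

foreach ($parsed as $url => $content) {
    printf("%s: %d bytes\n", $url, strlen($content));
}
foreach ($failed as $url) {
    printf("%s: no content\n", $url);
}
```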