https://github.com/rakshazi/multigrabber
Special combination of the PicoFeed parser and MCurl. These libraries allow Multigrabber to download content from multiple URLs in parallel requests and parse it with the PicoFeed parser (best HTML parser ever).
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/rakshazi/multigrabber
- Owner: rakshazi
- Created: 2016-09-11T21:28:39.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2016-09-11T21:29:53.000Z (almost 9 years ago)
- Last Synced: 2025-01-21T07:11:34.447Z (5 months ago)
- Language: PHP
- Homepage:
- Size: 1.95 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Multigrabber
> Special combination of [PicoFeed parser](https://github.com/fguillot/picoFeed) and [MCurl](https://github.com/KhristenkoYura/mcurl).
> These libraries allow Multigrabber to download content from multiple URLs in parallel requests and parse it with the PicoFeed parser (best HTML parser ever).

Test results (100 URLs, multiple sites): 64 sec and 0.36 MB RAM to download and parse all content.
### Installation
```
composer require rakshazi/multigrabber
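```

### Usage
```php
<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

// The start of this snippet was lost in extraction. The config class below is
// an assumption based on PicoFeed's documented PicoFeed\Config\Config.
$config = new \PicoFeed\Config\Config();
$config->setGrabberRulesFolder(__DIR__ . '/rules'); //PicoFeed grabber rules, @link https://github.com/fguillot/picoFeed/blob/master/docs/feed-parsing.markdown#custom-regex-filters
$config->setClientUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36');
$grabber = new \Rakshazi\Multigrabber($config);
$urls = ['http://example.site/1', 'https://example.site/post2', '...'];
$data = $grabber->run($urls); // array keyed by URL
var_dump($data);
```

Output: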
```
array(2) {
  ["http://example.site/1"]=>
  string(978) "Parsed content from nat-geo.ru (text was removed in this example) Vert Dider."
  ["https://example.site/post2"]=>
  string(3675) "Parsed html"
}
```
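
Judging by the output above, `run()` returns an array keyed by the source URL, with the parsed content as a string value. A minimal sketch of consuming such a result follows. The `$data` array is a stand-in so the snippet runs without network access, and `summarizeResults()` is a hypothetical helper, not part of Multigrabber.

```php
<?php
// Hypothetical helper (not part of Multigrabber): maps each URL to the
// length in bytes of its parsed content.
function summarizeResults(array $data): array
{
    $summary = [];
    foreach ($data as $url => $content) {
        $summary[$url] = strlen((string) $content);
    }
    return $summary;
}

// Stand-in for the array returned by $grabber->run($urls);
// the values here are placeholders, not real parsed pages.
$data = [
    'http://example.site/1' => 'Parsed content',
    'https://example.site/post2' => 'Parsed html',
];

foreach (summarizeResults($data) as $url => $length) {
    echo $url . ': ' . $length . " bytes\n";
}
```

Keying by URL makes it straightforward to match each parsed document back to the request that produced it.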