https://github.com/maxlen/webcrawler
Search engines crawlers
- Host: GitHub
- URL: https://github.com/maxlen/webcrawler
- Owner: maxlen
- Created: 2016-11-23T11:52:50.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2017-03-29T13:46:58.000Z (over 8 years ago)
- Last Synced: 2025-01-12T07:35:56.380Z (6 months ago)
- Language: PHP
- Size: 10.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# webcrawler
Search engines crawlers

### Google example:
```
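// Optional proxy; leave the array empty for a direct connection.
// Format: ['host' => '*.*.*.*', 'port' => '', 'login' => '', 'password' => '']
$proxy = [];
$page = 1; // results page to request (value assumed here; the original snippet left $page undefined)
$params = ['query' => 'test search', 'page' => $page, 'proxy' => $proxy];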
$crawler = new WebCrawler(['strategy' => new GoogleSearch()]);
print_r($crawler->crawl($params));
```

### Site-parse example:
```
$params = ['url' => 'http://your-site.com', 'proxy' => []];
$crawler = new WebCrawler(['strategy' => new SiteSearch()]);
print_r($crawler->crawl($params));
```
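
Both examples pass a strategy object to the crawler's constructor, which suggests the crawler delegates the actual work to whichever strategy it receives. The README does not show how that is implemented, so the following is only a minimal sketch of the pattern, not the library's code. `SearchStrategy`, `FakeSearch`, `SketchCrawler`, and the `search()` method are all hypothetical names invented for illustration, and the canned result shape is an assumption. It needs PHP 7.4 or later for the typed property.

```php
<?php
// Hypothetical sketch: none of these definitions come from the webcrawler source.

interface SearchStrategy
{
    /** Return results for the given parameters. */
    public function search(array $params): array;
}

// Stand-in strategy that returns canned data instead of making HTTP requests.
class FakeSearch implements SearchStrategy
{
    public function search(array $params): array
    {
        return [
            'query' => $params['query'] ?? '',
            'page'  => $params['page'] ?? 1,
            'links' => ['http://example.com/a', 'http://example.com/b'],
        ];
    }
}

// Minimal crawler that delegates to whichever strategy it was given.
class SketchCrawler
{
    private SearchStrategy $strategy;

    public function __construct(array $config)
    {
        $strategy = $config['strategy'] ?? null;
        if (!($strategy instanceof SearchStrategy)) {
            throw new InvalidArgumentException('A "strategy" implementing SearchStrategy is required.');
        }
        $this->strategy = $strategy;
    }

    public function crawl(array $params): array
    {
        // The crawler itself holds no search logic; the strategy does the work.
        return $this->strategy->search($params);
    }
}

$crawler = new SketchCrawler(['strategy' => new FakeSearch()]);
print_r($crawler->crawl(['query' => 'test search', 'page' => 1, 'proxy' => []]));
```

Swapping `FakeSearch` for another class that implements the same interface changes what `crawl()` returns without touching the crawler, which is the appeal of passing the strategy in through the constructor.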