https://github.com/mediamonks/crawler
Crawl your own website with various clients for SEO and indexing purposes.
- Host: GitHub
- URL: https://github.com/mediamonks/crawler
- Owner: mediamonks
- License: mit
- Created: 2016-11-24T07:38:03.000Z
- Default Branch: master
- Last Pushed: 2017-12-04T15:09:41.000Z
- Last Synced: 2024-08-09T17:54:32.395Z
- Topics: browserkit, crawler, crawling, php, prerender, prerenderio, seo, spider
- Language: PHP
- Size: 40 KB
- Stars: 19
- Watchers: 9
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# MediaMonks Crawler
This tool allows you to easily crawl a website and get a DOM object for every URL that it finds.
We use it, together with the Prerender.io client, to crawl our own site's pages regardless of whether their content is rendered server-side, client-side, or both.
The resulting data can be used for creating a full site search and/or improving SEO for single-page applications.
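To make the crawl-and-parse idea concrete, here is a minimal breadth-first sketch in plain PHP (7.0+ assumed). It is not the mediamonks/crawler API: the `crawl()` function and its `$fetch` callable are hypothetical, with `$fetch` standing in for whatever client (Goutte, Prerender, ...) actually retrieves the HTML. Each fetched page is parsed into a `DOMDocument`, and, for brevity, only absolute links on the start URL's host are followed; relative links are skipped.

```php
<?php
// Hypothetical sketch, not the mediamonks/crawler API: a breadth-first crawl
// that parses every fetched page into a DOMDocument. $fetch stands in for an
// HTTP client (e.g. Goutte or a Prerender client) and returns HTML or null.
function crawl(string $startUrl, callable $fetch, int $limit = 100): array
{
    $host = parse_url($startUrl, PHP_URL_HOST);
    $queue = [$startUrl];            // URLs waiting to be fetched (FIFO)
    $seen = [$startUrl => true];     // URLs already queued, to avoid repeats
    $documents = [];                 // URL => DOMDocument

    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);

    while ($queue !== [] && count($documents) < $limit) {
        $url = array_shift($queue);
        $html = $fetch($url);
        if ($html === null || $html === '') {
            continue; // Missing page or failed request: skip it.
        }

        $dom = new DOMDocument();
        $dom->loadHTML($html);
        $documents[$url] = $dom;

        // Queue absolute links that stay on the start URL's host.
        foreach ($dom->getElementsByTagName('a') as $anchor) {
            $href = $anchor->getAttribute('href');
            if (parse_url($href, PHP_URL_HOST) !== $host || isset($seen[$href])) {
                continue;
            }
            $seen[$href] = true;
            $queue[] = $href;
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $documents;
}
```

For example, passing a closure that looks pages up in an in-memory array of two pages linking to each other returns two `DOMDocument` objects keyed by URL; in real use, an HTTP or Prerender client would take the closure's place.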
## Highlights
- Ships with Prerender and Prerender.io clients; uses Goutte by default
- Supports any Symfony BrowserKit client
- Supports both whitelisting and blacklisting of URLs
- Supports URL normalization, which lets you prevent duplicates caused by minor URL differences
- Implements the [PSR-3 Logger Interface](http://www.php-fig.org/psr/psr-3/)
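URL normalization and whitelisting/blacklisting are easiest to picture with a small example. The plain-PHP sketch below (7.0+ assumed) is illustrative only: `normalizeUrl()` and `isAllowed()` are hypothetical helpers, not the library's API, and treating patterns as PCRE regexes and `/about` and `/about/` as the same page are assumptions made for this sketch; see the [/doc](/doc) folder for how the library itself is configured.

```php
<?php
// Hypothetical helpers, not the mediamonks/crawler API. They only illustrate
// what URL normalization and whitelisting/blacklisting mean for a crawler.

// Map trivially different spellings of a URL to one canonical string:
// lowercase scheme and host, drop the #fragment and any trailing slash,
// and sort the query string's key=value pairs.
function normalizeUrl(string $url): string
{
    $parts = parse_url($url);
    $scheme = strtolower($parts['scheme'] ?? 'http');
    $host = strtolower($parts['host'] ?? '');
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';

    $path = rtrim($parts['path'] ?? '', '/');
    if ($path === '') {
        $path = '/';
    }

    $query = '';
    if (isset($parts['query']) && $parts['query'] !== '') {
        $pairs = explode('&', $parts['query']);
        sort($pairs, SORT_STRING);
        $query = '?' . implode('&', $pairs);
    }

    // The fragment is never appended: it does not change what the server returns.
    return $scheme . '://' . $host . $port . $path . $query;
}

// A URL passes if it matches at least one whitelist pattern (when a whitelist
// is given) and no blacklist pattern. Patterns are PCRE regexes here.
function isAllowed(string $url, array $whitelist, array $blacklist): bool
{
    if ($whitelist !== []) {
        $matched = false;
        foreach ($whitelist as $pattern) {
            if (preg_match($pattern, $url) === 1) {
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            return false;
        }
    }

    foreach ($blacklist as $pattern) {
        if (preg_match($pattern, $url) === 1) {
            return false;
        }
    }

    return true;
}
```

In a crawl loop, you would call `isAllowed()` before queueing a link and store `normalizeUrl($href)` in the "already seen" set, so that `/about`, `/about/` and `/about#team` are fetched only once.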
## Documentation
Documentation and examples can be found in the [/doc](/doc) folder.
## System Requirements
To use the library, you need:
- **PHP >= 5.5.0**
## Install
Install this package by using Composer.
```
$ composer require mediamonks/crawler
```
## Security
If you discover any security-related issues, please email devmonk@mediamonks.com instead of using the issue tracker.
## License
The MIT License (MIT). Please see [License File](LICENSE) for more information.