https://github.com/mediamonks/crawler
Crawl your own website with various clients for SEO and indexing purposes.
- Host: GitHub
- URL: https://github.com/mediamonks/crawler
- Owner: mediamonks
- License: mit
- Created: 2016-11-24T07:38:03.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2017-12-04T15:09:41.000Z (about 7 years ago)
- Last Synced: 2024-08-09T17:54:32.395Z (5 months ago)
- Topics: browserkit, crawler, crawling, php, prerender, prerenderio, seo, spider
- Language: PHP
- Size: 40 KB
- Stars: 19
- Watchers: 9
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
[![Build Status](https://travis-ci.org/mediamonks/crawler.svg?branch=master)](https://travis-ci.org/mediamonks/crawler)
[![Scrutinizer Code Quality](https://scrutinizer-ci.com/g/mediamonks/crawler/badges/quality-score.png?b=master)](https://scrutinizer-ci.com/g/mediamonks/crawler/?branch=master)
[![Code Coverage](https://scrutinizer-ci.com/g/mediamonks/crawler/badges/coverage.png?b=master)](https://scrutinizer-ci.com/g/mediamonks/crawler/?branch=master)
[![Total Downloads](https://poser.pugx.org/mediamonks/crawler/downloads)](https://packagist.org/packages/mediamonks/crawler)
[![Latest Stable Version](https://poser.pugx.org/mediamonks/crawler/v/stable)](https://packagist.org/packages/mediamonks/crawler)
[![Latest Unstable Version](https://poser.pugx.org/mediamonks/crawler/v/unstable)](https://packagist.org/packages/mediamonks/crawler)
[![SensioLabs Insight](https://img.shields.io/sensiolabs/i/2fd407ee-3228-46c1-9ebb-40745787d454.svg)](https://insight.sensiolabs.com/projects/2fd407ee-3228-46c1-9ebb-40745787d454)
[![License](https://poser.pugx.org/mediamonks/crawler/license)](https://packagist.org/packages/mediamonks/crawler)

# MediaMonks Crawler
This tool lets you crawl a website and get a DOM object for every URL it finds.
Using the Prerender.io client, we crawl the pages of our own sites regardless of whether their content is generated on the server, on the client, or both.
The resulting data can be used to build a full-site search and/or to improve SEO for single-page applications.

## Highlights
- Ships with Prerender and Prerender.io clients; uses Goutte by default
- Supports any Symfony BrowserKit client
- Supports both whitelisting and blacklisting of URLs
- Supports URL normalization, which lets you avoid duplicates caused by minor URL differences
- Implements the [PSR-3 Logger Interface](http://www.php-fig.org/psr/psr-3/)

## Documentation
Documentation and examples can be found in the [/doc](/doc) folder.
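The following minimal, standalone PHP sketch illustrates the general technique behind the crawl process and the URL filtering and normalization highlights. It does not use the mediamonks/crawler API (see the [/doc](/doc) folder for that). `normalizeUrl()`, `isAllowed()`, `crawl()` and the `$fetchLinks` callable are hypothetical names. The specific normalization rules and the "blacklist wins" precedence are assumptions, not the library's documented behavior. Page fetching is replaced by an in-memory map of links; in the library itself, that work is done by a BrowserKit client such as Goutte or one of the Prerender clients.

```php
<?php
// Standalone sketch, NOT the mediamonks/crawler API: all function names and
// the $fetchLinks callable are hypothetical, chosen for illustration.

// Normalize an absolute URL so trivially different spellings share one key.
// Keeps only scheme, host, port, path and query: lowercases scheme and host,
// drops the fragment and default ports, trims trailing slashes from the path
// and sorts the query pairs. Returns null for URLs without a host.
function normalizeUrl($url)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null;
    }
    // Assumption: scheme-less URLs such as //example.com default to http.
    $scheme = isset($parts['scheme']) ? strtolower($parts['scheme']) : 'http';
    $host = strtolower($parts['host']);
    $port = '';
    if (isset($parts['port'])
        && !($scheme === 'http' && $parts['port'] === 80)
        && !($scheme === 'https' && $parts['port'] === 443)) {
        $port = ':' . $parts['port'];
    }
    $path = isset($parts['path']) ? rtrim($parts['path'], '/') : '';
    if ($path === '') {
        $path = '/';
    }
    $query = '';
    if (isset($parts['query']) && $parts['query'] !== '') {
        $pairs = explode('&', $parts['query']);
        sort($pairs, SORT_STRING);
        $query = '?' . implode('&', $pairs);
    }
    return $scheme . '://' . $host . $port . $path . $query;
}

// Assumption: blacklist patterns always win; an empty whitelist allows everything else.
function isAllowed($url, array $whitelist, array $blacklist)
{
    foreach ($blacklist as $pattern) {
        if (preg_match($pattern, $url)) {
            return false;
        }
    }
    if (count($whitelist) === 0) {
        return true;
    }
    foreach ($whitelist as $pattern) {
        if (preg_match($pattern, $url)) {
            return true;
        }
    }
    return false;
}

// Breadth-first crawl from $startUrl. $fetchLinks($url) must return the absolute
// URLs linked from that page; a real crawler would fetch and parse the page here.
function crawl($startUrl, callable $fetchLinks, array $whitelist = array(), array $blacklist = array())
{
    $start = normalizeUrl($startUrl);
    if ($start === null) {
        return array();
    }
    $queue = array($start);
    $seen = array($start => true); // normalized URLs already queued
    $visited = array();
    while (count($queue) > 0) {
        $url = array_shift($queue);
        $visited[] = $url;
        foreach ($fetchLinks($url) as $link) {
            $next = normalizeUrl($link);
            if ($next === null || isset($seen[$next]) || !isAllowed($next, $whitelist, $blacklist)) {
                continue;
            }
            $seen[$next] = true;
            $queue[] = $next;
        }
    }
    return $visited;
}

// Usage with an in-memory "site" (keys are normalized URLs) instead of HTTP.
$site = array(
    'https://example.com/' => array('https://example.com/about/', 'https://EXAMPLE.com/blog?b=2&a=1', 'https://example.com/admin'),
    'https://example.com/about' => array('https://example.com/#top', 'https://example.com/blog?a=1&b=2'),
);
$fetchLinks = function ($url) use ($site) {
    return isset($site[$url]) ? $site[$url] : array();
};
foreach (crawl('https://example.com', $fetchLinks, array(), array('#/admin#')) as $page) {
    echo $page, PHP_EOL;
}
// Prints https://example.com/, https://example.com/about and
// https://example.com/blog?a=1&b=2, one per line.
```

Because `$seen` is keyed on the normalized URL, `https://EXAMPLE.com/blog?b=2&a=1` and `https://example.com/blog?a=1&b=2` are crawled only once. `https://example.com/#top` is recognized as the already-visited home page, and the blacklisted `/admin` link is never queued. Using `array_shift()` as a FIFO queue keeps the sketch short; for large sites, an `SplQueue` avoids re-indexing the array on every dequeue.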
## System Requirements
To use the library, you need:

- **PHP >= 5.5.0**
## Install
Install this package by using Composer.
```
$ composer require mediamonks/crawler
```

## Security
If you discover any security-related issues, please email [email protected] instead of using the issue tracker.
## License
The MIT License (MIT). Please see [License File](LICENSE) for more information.