Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/danielemoraschi/sitemap-common
Simple PHP Sitemap generator and crawler library.
- Host: GitHub
- URL: https://github.com/danielemoraschi/sitemap-common
- Owner: danielemoraschi
- License: mit
- Created: 2016-08-18T11:53:46.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2016-08-21T23:09:31.000Z (over 8 years ago)
- Last Synced: 2024-12-08T10:45:58.619Z (about 1 month ago)
- Topics: crawler, php, php-library, php-sitemap-generator, sitemap
- Language: PHP
- Homepage:
- Size: 54.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# A PHP sitemap generator and crawler
[![Build Status](https://travis-ci.org/danielemoraschi/sitemap-common.png?branch=master)](https://travis-ci.org/danielemoraschi/sitemap-common)
[![Scrutinizer Quality Score](https://scrutinizer-ci.com/g/danielemoraschi/sitemap-common/badges/quality-score.png?b=master)](https://scrutinizer-ci.com/g/danielemoraschi/sitemap-common/)

This package provides all of the components to crawl a website and build and write sitemap files.
Example of console application using the library: [dmoraschi/sitemap-app](https://github.com/danielemoraschi/sitemap-app)
## Installation
Run the following command and provide the latest stable version (e.g. v1.0.0):
```bash
composer require dmoraschi/sitemap-common
```

or add the following to your `composer.json` file:
```json
"dmoraschi/sitemap-common": "1.0.*"
```

`SiteMapGenerator`
-----
**Basic usage**

``` php
$generator = new SiteMapGenerator(
    new FileWriter($outputFileName),
    new XmlTemplate()
);
```

Add a URL:
``` php
$generator->addUrl($url, $frequency, $priority);
```

Add a single `SiteMapUrl` object or an array:
``` php
$siteMapUrl = new SiteMapUrl(
    new Url($url), $frequency, $priority
);

$generator->addSiteMapUrl($siteMapUrl);
$generator->addSiteMapUrls([
    $siteMapUrl, $siteMapUrl2
]);
```

Set the URLs of the sitemap via a `SiteMapUrlCollection`:
``` php
$siteMapUrl = new SiteMapUrl(
    new Url($url), $frequency, $priority
);

$collection = new SiteMapUrlCollection([
    $siteMapUrl, $siteMapUrl2
]);

$generator->setCollection($collection);
```

Generate the sitemap:
``` php
$generator->execute();
```
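
Putting these pieces together, here is a minimal end-to-end sketch (it reuses the class names from the snippets above; the output file name and the frequency/priority values are placeholder assumptions, not documented defaults):

```php
// Write the sitemap to sitemap.xml using the XML template.
$generator = new SiteMapGenerator(
    new FileWriter('sitemap.xml'),
    new XmlTemplate()
);

// Register a few URLs with (assumed) change frequency and priority values.
$generator->addUrl('http://example.com/', 'daily', 1.0);
$generator->addUrl('http://example.com/about', 'monthly', 0.5);

// Render and write the sitemap file.
$generator->execute();
```
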
`Crawler`
-----
**Basic usage**

``` php
$crawler = new Crawler(
    new Url($baseUrl),
    new RegexBasedLinkParser(),
    new HttpClient()
);
```

You can tell the `Crawler` **not to visit** certain URLs by adding policies. Below are the default policies provided by the library:
```php
$crawler->setPolicies([
    'host' => new SameHostPolicy($baseUrl),
    'url' => new UniqueUrlPolicy(),
    'ext' => new ValidExtensionPolicy(),
]);
// or
$crawler->setPolicy('host', new SameHostPolicy($baseUrl));
```
`SameHostPolicy`, `UniqueUrlPolicy`, and `ValidExtensionPolicy` are provided with the library; you can define your own policies by implementing the `Policy` interface, as sketched below.
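
For example, a hypothetical policy that skips URLs under a given path could look like the following sketch. The `ExcludePathPolicy` class, the `isValid()` method name, and the string cast on `Url` are assumptions for illustration, not part of the library's documented API:

```php
// Hypothetical custom policy: reject any URL whose string form
// contains a given path fragment. The isValid() signature is an
// assumption about the Policy interface.
class ExcludePathPolicy implements Policy
{
    private $fragment;

    public function __construct($fragment)
    {
        $this->fragment = $fragment;
    }

    public function isValid(Url $url)
    {
        return strpos((string) $url, $this->fragment) === false;
    }
}

$crawler->setPolicy('exclude', new ExcludePathPolicy('/admin'));
```
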
Calling `crawl` starts from the base URL given in the constructor and crawls web pages up to the depth passed as an argument. The function returns an array of all unique visited `Url`s:
```php
$urls = $crawler->crawl($deep);
```
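
Since `crawl()` returns `Url` objects and the generator accepts plain URLs, the two components can be chained. A rough sketch, assuming the `Url` objects can be cast to strings and using placeholder frequency/priority values:

```php
// Crawl the site, then feed every discovered URL into the generator.
$urls = $crawler->crawl($deep);

foreach ($urls as $url) {
    $generator->addUrl((string) $url, 'weekly', 0.5);
}

$generator->execute();
```
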
You can also instruct the `Crawler` to collect custom data while visiting the web pages by adding `Collector`s to the main object:

```php
$crawler->setCollectors([
    'images' => new ImageCollector()
]);
// or
$crawler->setCollector('images', new ImageCollector());
```
And then retrieve the collected data:
```php
$crawler->crawl($deep);

$imageCollector = $crawler->getCollector('images');
$data = $imageCollector->getCollectedData();
```
`ImageCollector` is provided by the library; you can define your own collectors by implementing the `Collector` interface, as sketched below.
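
As with policies, a custom collector is a class implementing `Collector`. Here is a hypothetical sketch that records the `<title>` of every visited page; `getCollectedData()` matches the usage shown above, but the `collect()` method name and signature are assumptions about the interface:

```php
// Hypothetical collector that stores the <title> of every visited page.
class TitleCollector implements Collector
{
    private $titles = [];

    // Assumed hook called by the crawler for each visited page.
    public function collect(Url $url, $html)
    {
        if (preg_match('#<title>(.*?)</title>#is', $html, $matches)) {
            $this->titles[(string) $url] = trim($matches[1]);
        }
    }

    public function getCollectedData()
    {
        return $this->titles;
    }
}

$crawler->setCollector('titles', new TitleCollector());
```
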