{"id":22243009,"url":"https://github.com/mediamonks/crawler","last_synced_at":"2025-07-28T01:32:17.066Z","repository":{"id":62526470,"uuid":"74649243","full_name":"mediamonks/crawler","owner":"mediamonks","description":"Crawl your own website with various clients for SEO and indexing purposes.","archived":false,"fork":false,"pushed_at":"2017-12-04T15:09:41.000Z","size":41,"stargazers_count":19,"open_issues_count":0,"forks_count":4,"subscribers_count":9,"default_branch":"master","last_synced_at":"2024-08-09T17:54:32.395Z","etag":null,"topics":["browserkit","crawler","crawling","php","prerender","prerenderio","seo","spider"],"latest_commit_sha":null,"homepage":null,"language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mediamonks.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-11-24T07:38:03.000Z","updated_at":"2022-03-02T14:08:42.000Z","dependencies_parsed_at":"2022-11-02T14:16:11.720Z","dependency_job_id":null,"html_url":"https://github.com/mediamonks/crawler","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mediamonks%2Fcrawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mediamonks%2Fcrawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mediamonks%2Fcrawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mediamonks%2Fcrawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mediamonks","download_url":"https://codeload.github.com/mediamonks/crawler/
tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":227850890,"owners_count":17829246,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["browserkit","crawler","crawling","php","prerender","prerenderio","seo","spider"],"created_at":"2024-12-03T04:19:30.839Z","updated_at":"2024-12-03T04:19:31.542Z","avatar_url":"https://github.com/mediamonks.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://travis-ci.org/mediamonks/crawler.svg?branch=master)](https://travis-ci.org/mediamonks/crawler)\n[![Scrutinizer Code Quality](https://scrutinizer-ci.com/g/mediamonks/crawler/badges/quality-score.png?b=master)](https://scrutinizer-ci.com/g/mediamonks/crawler/?branch=master)\n[![Code Coverage](https://scrutinizer-ci.com/g/mediamonks/crawler/badges/coverage.png?b=master)](https://scrutinizer-ci.com/g/mediamonks/crawler/?branch=master)\n[![Total Downloads](https://poser.pugx.org/mediamonks/crawler/downloads)](https://packagist.org/packages/mediamonks/crawler)\n[![Latest Stable Version](https://poser.pugx.org/mediamonks/crawler/v/stable)](https://packagist.org/packages/mediamonks/crawler)\n[![Latest Unstable Version](https://poser.pugx.org/mediamonks/crawler/v/unstable)](https://packagist.org/packages/mediamonks/crawler)\n[![SensioLabs 
Insight](https://img.shields.io/sensiolabs/i/2fd407ee-3228-46c1-9ebb-40745787d454.svg)](https://insight.sensiolabs.com/projects/2fd407ee-3228-46c1-9ebb-40745787d454)\n[![License](https://poser.pugx.org/mediamonks/crawler/license)](https://packagist.org/packages/mediamonks/crawler)\n\n# MediaMonks Crawler\n\nThis tool allows you to easily crawl a website and get a DOM object for every URL it finds.\nWe use it with the Prerender.io client to crawl our own sites, regardless of whether their pages are rendered server-side, client-side, or both.\nThe resulting data can be used for creating a full site search and/or improving SEO for single-page applications.\n\n## Highlights\n\n- Ships with Prerender \u0026 Prerender.io clients, uses Goutte by default\n- Supports any Symfony BrowserKit client\n- Supports both whitelisting and blacklisting of URLs\n- Supports URL normalization, which lets you prevent duplicates caused by minor URL differences\n- Implements the [PSR-3 Logger Interface](http://www.php-fig.org/psr/psr-3/)\n\n## Documentation\n\nDocumentation and examples can be found in the [/doc](/doc) folder.\n\n## System Requirements\n\nTo use the library, you need:\n\n- **PHP \u003e= 5.5.0**\n\n## Install\n\nInstall this package by using Composer.\n\n```\n$ composer require mediamonks/crawler\n```\n\n## Security\n\nIf you discover any security-related issues, please email devmonk@mediamonks.com instead of using the issue tracker.\n\n## License\n\nThe MIT License (MIT). Please see [License File](LICENSE) for more information.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmediamonks%2Fcrawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmediamonks%2Fcrawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmediamonks%2Fcrawler/lists"}