{"id":15043968,"url":"https://github.com/piopi/behatcrawler","last_synced_at":"2026-02-09T11:04:25.501Z","repository":{"id":57041781,"uuid":"299711805","full_name":"piopi/BehatCrawler","owner":"piopi","description":"A Behat extension that crawls links on a website and executes user-defined function on each one of them.","archived":false,"fork":false,"pushed_at":"2020-10-01T18:24:08.000Z","size":1372,"stargazers_count":1,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-06-09T14:50:19.586Z","etag":null,"topics":["behat","behat-extension","crawler","php","selenium-webdriver"],"latest_commit_sha":null,"homepage":"","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/piopi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-09-29T19:01:22.000Z","updated_at":"2020-10-01T18:24:10.000Z","dependencies_parsed_at":"2022-08-23T23:40:14.534Z","dependency_job_id":null,"html_url":"https://github.com/piopi/BehatCrawler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/piopi/BehatCrawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piopi%2FBehatCrawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piopi%2FBehatCrawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piopi%2FBehatCrawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piopi%2FBehatCrawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/piopi","download_url":"https://codeload.gith
ub.com/piopi/BehatCrawler/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piopi%2FBehatCrawler/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268310797,"owners_count":24230185,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-01T02:00:08.611Z","response_time":67,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["behat","behat-extension","crawler","php","selenium-webdriver"],"created_at":"2024-09-24T20:49:53.614Z","updated_at":"2026-02-09T11:04:25.413Z","avatar_url":"https://github.com/piopi.png","language":"PHP","readme":"# BehatCrawler\n\n![PHP Composer](https://github.com/piopi/BehatCrawler/workflows/PHP%20Composer/badge.svg)\n\nThe BehatCrawler is a [Behat](https://github.com/Behat/Behat), [MinkExtension](https://github.com/Behat/MinkExtension) and [Selenium2Driver](https://github.com/minkphp/MinkSelenium2Driver) extension that crawls a given URL and executes user-defined functions on each crawled page.
\n\nMultiple options for crawling are available; see [available options](#available).\n\n## Installation\n\n```shell\ncomposer require piopi/behatcrawler\n```\n\n## Usage\n\nStart by importing the extension into your Feature Context (or any of your Contexts):\n\n```php\nuse Behat\\Crawler\\Crawler;\n```\n\nCreate your Crawler object with the default configuration:\n\n**At this time, the crawler is only compatible with [Selenium2Driver](https://github.com/minkphp/MinkSelenium2Driver).**\n\n```php\n//$crawler = new Crawler($behatSession);\n$crawler = new Crawler($this-\u003egetSession());\n```\n\nFor custom settings (passed as an array), see the following table for all the [available options](#available).\n\n```php\n$crawler = new Crawler($this-\u003egetSession(), [\"internalLinksOnly\"=\u003etrue, \"HTMLOnly\"=\u003etrue, \"MaxCrawl\"=\u003e20]);\n```\n\n#### Available options: (More functionalities coming soon) \u003ca name=\"available\"\u003e\u003c/a\u003e\n\n| Option            | Description                                                  | Default Value |\n| ----------------- | ------------------------------------------------------------ | ------------- |\n| Depth             | Maximum depth that can be crawled from the URL               | 0 (unlimited) |\n| MaxCrawl          | Maximum number of crawls                                     | 0 (unlimited) |\n| HTMLOnly          | Will only crawl HTML/XHTML pages                             | true          |\n| internalLinksOnly | Will crawl internal links only (links with the same domain name as the initial URL) | true          |\n| waitForCrawl      | Will wait for the crawler to finish crawling before throwing any exception originating from the user-defined functions. (Compiles a list of all exceptions found with their respective locations) | false         |\n\n**Options can be set either in the constructor or with the appropriate getters/setters:**\n\n```php\n$crawler = new Crawler($this-\u003egetSession(), [\"MaxCrawl\"=\u003e10]);\n//or\n$crawler-\u003esetMaximumCrawl(10);\n```\n\n#### Start Crawling\n\nAfter creating and setting up the crawler, start crawling by passing your function as an argument.\n\nPlease refer to the PHP [Callables documentation](https://www.php.net/manual/en/language.types.callable.php) for more details.\n\n**Examples**:\n\n\u003e Closure::fromCallable is used to pass a private function as a parameter.\n\n```php\n//function1 is a private function\n$crawler-\u003estartCrawling(Closure::fromCallable([$this, 'function1']));\n//function2 is a public class function\n$crawler-\u003estartCrawling([$this, 'function2']);\n```\n\nFunctions with one or more arguments can be passed as follows:\n\n```php\n$crawler-\u003estartCrawling(Closure::fromCallable([$this, 'function3']), [$arg1]);\n$crawler-\u003estartCrawling(Closure::fromCallable([$this, 'function4']), [$arg1, $arg2]);\n```\n\n### Usage Example\n\n```php\nuse Behat\\Crawler\\Crawler;\n//Crawler with different settings\n$crawler = new Crawler($this-\u003egetSession(), [\"internalLinksOnly\"=\u003etrue, \"HTMLOnly\"=\u003etrue, \"MaxCrawl\"=\u003e20, \"waitForCrawl\"=\u003etrue]);\n//Function without arguments\n$crawler-\u003estartCrawling(Closure::fromCallable([$this, 'function1'])); //Will start crawling\n//Function with one or more arguments\n$crawler-\u003estartCrawling(Closure::fromCallable([$this, 'function2']), [$arg1, $arg2]);\n```\n\n**In a Behat step function:**\n\n```php\n    /**\n     * @Given /^I crawl the website with a maximum of (\\d+) level$/\n     */\n    public function iCrawlTheWebsiteWithAMaximumOfLevel($arg1)\n    {\n        $crawler = new Crawler($this-\u003egetSession(), [\"Depth\"=\u003e$arg1]);\n        
$crawler-\u003estartCrawling([$this, 'test']);\n    }\n```\n\n### Copyright\n\nCopyright (c) 2020 Mostapha El Sabah \u003celsabah.mostapha@gmail.com\u003e\n\n## Maintainers\n\nMostapha El Sabah [Piopi](https://github.com/piopi)\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpiopi%2Fbehatcrawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpiopi%2Fbehatcrawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpiopi%2Fbehatcrawler/lists"}