{"id":20709982,"url":"https://github.com/oxylabs/web-scraping-php","last_synced_at":"2025-06-15T14:06:13.432Z","repository":{"id":40586183,"uuid":"464486297","full_name":"oxylabs/web-scraping-php","owner":"oxylabs","description":"A tutorial and code samples of web scraping with PHP","archived":false,"fork":false,"pushed_at":"2025-02-11T12:47:11.000Z","size":27,"stargazers_count":9,"open_issues_count":1,"forks_count":3,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-03-29T22:12:04.109Z","etag":null,"topics":["email-scraper","email-scraper-with-proxy","php","screen-scraping","url-scraper","web-scraping","website-crawler","wikipedia-scraper"],"latest_commit_sha":null,"homepage":"","language":"PHP","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oxylabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-02-28T13:07:58.000Z","updated_at":"2025-02-11T12:47:16.000Z","dependencies_parsed_at":"2024-04-19T12:29:44.894Z","dependency_job_id":"4b1d21cc-1b59-4d46-907b-f1d63c2cc286","html_url":"https://github.com/oxylabs/web-scraping-php","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fweb-scraping-php","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fweb-scraping-php/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fweb-scraping-php/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fweb-scraping-php/manifests","
owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oxylabs","download_url":"https://codeload.github.com/oxylabs/web-scraping-php/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250372939,"owners_count":21419722,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["email-scraper","email-scraper-with-proxy","php","screen-scraping","url-scraper","web-scraping","website-crawler","wikipedia-scraper"],"created_at":"2024-11-17T02:09:27.354Z","updated_at":"2025-04-23T04:48:00.089Z","avatar_url":"https://github.com/oxylabs.png","language":"PHP","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Web Scraping With PHP \n\n[![Oxylabs promo code](https://raw.githubusercontent.com/oxylabs/product-integrations/refs/heads/master/Affiliate-Universal-1090x275.png)](https://oxylabs.go2cloud.org/aff_c?offer_id=7\u0026aff_id=877\u0026url_id=112)\n\n\n[\u003cimg src=\"https://img.shields.io/static/v1?label=\u0026message=PHP\u0026color=brightgreen\" /\u003e](https://github.com/topics/php) [\u003cimg src=\"https://img.shields.io/static/v1?label=\u0026message=Web%20Scraping\u0026color=important\" /\u003e](https://github.com/topics/web-scraping) \n\n[![](https://dcbadge.vercel.app/api/server/eWsVUJrnG5)](https://discord.gg/GbxmdGhZjq)\n\n- [Installing Prerequisites](#installing-prerequisites)\n- [Making an HTTP GET request](#making-an-http-get-request)\n- [Web scraping in PHP with Goutte](#web-scraping-in-php-with-goutte)\n- [Web scraping with Symfony Panther](#web-scraping-with-symfony-panther)\n\nPHP is a 
general-purpose scripting language and one of the most popular options for web development. For example, WordPress, the most common content management system for creating websites, is built using PHP.\n\nPHP offers various building blocks required to build a web scraper, although it can quickly become an increasingly complicated task. Conveniently, there are many open-source libraries that can make web scraping with PHP more accessible.\n\nThis article will guide you through the step-by-step process of writing various PHP web scraping routines that can extract public data from static and dynamic web pages.\n\nFor a detailed explanation, see our [blog post](https://oxy.yt/Jr3d).\n\n## Installing Prerequisites\n\n```sh\n# Windows\nchoco install php\nchoco install composer\n```\n\nor \n\n```sh\n# macOS\nbrew install php\nbrew install composer\n```\n\n## Making an HTTP GET request\n\n```php\n\u003c?php\n$html = file_get_contents('https://books.toscrape.com/');\necho $html;\n```\n\n## Web scraping in PHP with Goutte\n\n```sh\ncomposer init --no-interaction --require=\"php \u003e=7.1\"\ncomposer require fabpot/goutte\ncomposer update\n```\n\n```php\n\u003c?php\nrequire 'vendor/autoload.php';\nuse Goutte\\Client;\n$client = new Client();\n$crawler = $client-\u003erequest('GET', 'https://books.toscrape.com');\necho $crawler-\u003ehtml();\n```\n\n### Locating HTML elements via CSS Selectors\n\n```php\necho $crawler-\u003efilter('title')-\u003etext(); //CSS\necho $crawler-\u003efilterXPath('//title')-\u003etext(); //XPath\n```\n\n### Extracting the elements\n\n```php\nfunction scrapePage($url, $client){\n    $crawler = $client-\u003erequest('GET', $url);\n    $crawler-\u003efilter('.product_pod')-\u003eeach(function ($node) {\n            $title = $node-\u003efilter('.image_container img')-\u003eattr('alt');\n            $price = $node-\u003efilter('.price_color')-\u003etext();\n            echo $title . \"-\" . $price . 
PHP_EOL;\n        });\n    }\n```\n\n\n\n### Handling pagination\n\n```php\nfunction scrapePage($url, $client, $file)\n{\n   //...\n  // Handling Pagination\n    try {\n        $next_page = $crawler-\u003efilter('.next \u003e a')-\u003eattr('href');\n    } catch (InvalidArgumentException) { //Next page not found\n        return null;\n    }\n    return \"https://books.toscrape.com/catalogue/\" . $next_page;\n}\n```\n\n### Writing Data to CSV\n\n```php\nfunction scrapePage($url, $client, $file)\n{\n    $crawler = $client-\u003erequest('GET', $url);\n    $crawler-\u003efilter('.product_pod')-\u003eeach(function ($node) use ($file) {\n        $title = $node-\u003efilter('.image_container img')-\u003eattr('alt');\n        $price = $node-\u003efilter('.price_color')-\u003etext();\n        fputcsv($file, [$title, $price]);\n    });\n    try {\n        $next_page = $crawler-\u003efilter('.next \u003e a')-\u003eattr('href');\n    } catch (InvalidArgumentException) { //Next page not found\n        return null;\n    }\n    return \"https://books.toscrape.com/catalogue/\" . $next_page;\n}\n$client = new Client();\n$file = fopen(\"books.csv\", \"a\");\n$nextUrl = \"https://books.toscrape.com/catalogue/page-1.html\";\nwhile ($nextUrl) {\n    echo \"\u003ch2\u003e\" . $nextUrl . \"\u003c/h2\u003e\" . 
PHP_EOL;\n    $nextUrl = scrapePage($nextUrl, $client, $file);\n}\nfclose($file);\n```\n\n\n\n## Web scraping with Symfony Panther\n\n```sh\ncomposer init --no-interaction --require=\"php \u003e=7.1\" \ncomposer require symfony/panther\ncomposer update\nbrew install chromedriver\n```\n\n### Sending HTTP requests with Panther\n\n```php\n\u003c?php\nrequire 'vendor/autoload.php';\nuse \\Symfony\\Component\\Panther\\Client;\n$client = Client::createChromeClient();\n$client-\u003eget('https://quotes.toscrape.com/js/');\n```\n\n### Locating HTML elements via CSS Selectors\n\n```php\n    $crawler = $client-\u003ewaitFor('.quote');\n    $crawler-\u003efilter('.quote')-\u003eeach(function ($node) {\n        $author = $node-\u003efilter('.author')-\u003etext();\n        $quote = $node-\u003efilter('.text')-\u003etext();\n        echo $author . \" - \" . $quote;\n    });\n```\n\n### Handling pagination\n\n```php\nwhile (true) {\n    $crawler = $client-\u003ewaitFor('.quote');\n…\n    try {\n        $client-\u003eclickLink('Next');\n    } catch (Exception) {\n        break;\n    }\n}\n```\n\n### Writing data to a CSV file\n\n```php\n$file = fopen(\"quotes.csv\", \"a\");\nwhile (true) {\n    $crawler = $client-\u003ewaitFor('.quote');\n    $crawler-\u003efilter('.quote')-\u003eeach(function ($node) use ($file) {\n        $author = $node-\u003efilter('.author')-\u003etext();\n        $quote = $node-\u003efilter('.text')-\u003etext();\n        fputcsv($file, [$author, $quote]);\n    });\n    try {\n        $client-\u003eclickLink('Next');\n    } catch (Exception) {\n        break;\n    }\n}\nfclose($file);\n```\n\n\n\nIf you wish to find out more about web scraping with PHP, see our [blog 
post](https://oxy.yt/Jr3d).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foxylabs%2Fweb-scraping-php","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foxylabs%2Fweb-scraping-php","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foxylabs%2Fweb-scraping-php/lists"}