https://github.com/kkamara/php-scraper
:office: (Live Link) (2022) Use PHP technologies to crawl websites and click buttons on them through a GUI browser. Linux (including virtual machines) or macOS is highly recommended. Laravel 11.
- Host: GitHub
- URL: https://github.com/kkamara/php-scraper
- Owner: kkamara
- Created: 2022-10-24T18:19:05.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2025-02-24T20:05:16.000Z (over 1 year ago)
- Last Synced: 2025-02-24T20:36:00.816Z (over 1 year ago)
- Topics: bot, crawler, laravel, scraper, spider
- Language: PHP
- Homepage: https://github.com/kkamara/php-scraper/actions
- Size: 16.8 MB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md

# PHP Scraper [Build status](https://github.com/kkamara/php-scraper/actions/workflows/build.yml)
(2022) Use PHP technologies to crawl websites and click buttons on them through a GUI browser. Linux (including virtual machines) or macOS is highly recommended. Laravel 13.
* [Important note:](#note)
* [Using Postman?](#postman)
* [Requirements](#requirements)
* [Installation](#installation)
* [Usage](#usage)
* [Adding a new command](#adding-commands)
* [Browser Testing](#testing)
* [Misc](#misc)
* [Contributing](#contributing)
* [License](#license)
## Important note
Before you scrape any website, read its robots.txt file, which you can access at `domainname/robots.txt`. It lists the paths that crawlers are allowed and disallowed to fetch. Do not violate the terms of service of any website you scrape.
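The robots.txt check described above can be sketched as a small shell helper. This is a simplified illustration, not part of the project: `disallowed_paths` is a hypothetical function name, `example.com` is a placeholder domain, and the parser only handles `Field: value` lines with a space after the colon. It prints the `Disallow` rules that apply to all crawlers (`User-agent: *`).

```bash
# Print the Disallow rules from the "User-agent: *" groups of a robots.txt
# document read from standard input. Simplified: it ignores Allow rules,
# wildcards and crawl delays.
disallowed_paths() {
  awk '
    { sub(/\r$/, ""); sub(/#.*/, "") }   # drop CRLF endings and comments
    tolower($1) == "user-agent:" {
      if (!prev_ua) in_all = 0           # a new group of User-agent lines starts
      if ($2 == "*") in_all = 1
      prev_ua = 1
      next
    }
    NF { prev_ua = 0 }                   # any other directive ends the User-agent run
    in_all && tolower($1) == "disallow:" && $2 != "" { print $2 }
  '
}

# Example usage (placeholder domain):
#   curl -fsSL https://example.com/robots.txt | disallowed_paths
```

An empty list does not by itself mean scraping is permitted; the site's terms of service still apply.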
## Using Postman?
[Postman client](https://www.postman.com/).
[Published Postman API Collection](https://documenter.getpostman.com/view/17125932/TzzAKvVe).
## Requirements
* [Laravel](https://laravel.com/docs)
* [Java](https://www.java.com/en/)
## Installation
```bash
cp .env.example .env
# Don't worry if the following step reports errors about the chromedriver binary; we install it right after.
composer install
```
#### Add chromedriver to Path
Make sure ChromeDriver is installed and available on your `PATH`.
```bash
# install chromedriver for Panther client.
vendor/bin/bdi detect drivers
sudo mv drivers/chromedriver /usr/local/bin/chromedriver
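# Or download a driver manually. Builds are also published as
# chromedriver_mac64 and chromedriver_win32; see
# https://chromedriver.storage.googleapis.com for the drivers list.
# Version 2.37 below is only an example: pick the release that matches your installed Chrome.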
wget https://chromedriver.storage.googleapis.com/2.37/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
sudo mv chromedriver /usr/local/bin/chromedriver
chromedriver --version
```
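Recent ChromeDriver releases are numbered to match the major version of Chrome they support, so a quick comparison after installing can catch a mismatch early. The following is a minimal sketch, not part of the project: `major_version` is a hypothetical helper, and `google-chrome` is an assumed binary name that differs between systems (for example `chromium` on some Linux distributions).

```bash
# Extract the major version from a string such as
# "ChromeDriver 114.0.5735.90 (...)" or "Google Chrome 114.0.5735.198".
major_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1 | cut -d. -f1
}

# Compare the two binaries only if both are on PATH.
# "google-chrome" is an assumption; adjust it to your browser binary.
if command -v chromedriver >/dev/null 2>&1 && command -v google-chrome >/dev/null 2>&1; then
  driver_major="$(major_version "$(chromedriver --version)")"
  chrome_major="$(major_version "$(google-chrome --version)")"
  if [ "$driver_major" = "$chrome_major" ]; then
    echo "OK: ChromeDriver and Chrome are both major version $driver_major"
  else
    echo "Mismatch: ChromeDriver $driver_major vs Chrome $chrome_major" >&2
  fi
else
  echo "chromedriver or google-chrome not found on PATH; skipping version check" >&2
fi
```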
#### Continue installation
```bash
composer install
php artisan key:generate
# Before running the next command:
# Update your database details in .env
php artisan migrate --seed
yarn install
yarn build
```
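Before running the migration, it can help to confirm that `.env` actually contains the database settings. The sketch below is an illustration under assumptions: `missing_db_keys` is a hypothetical helper, and the key names are Laravel's standard ones for a MySQL or PostgreSQL connection; this project's `.env.example` may use a different set (for example, SQLite needs fewer keys).

```bash
# Print each standard Laravel database key that has no "KEY=" line in the
# given env file (defaults to ./.env). An empty value still counts as present.
missing_db_keys() {
  env_file="${1:-.env}"
  for key in DB_CONNECTION DB_HOST DB_PORT DB_DATABASE DB_USERNAME DB_PASSWORD; do
    grep -q "^${key}=" "$env_file" || echo "$key"
  done
}

# Example usage:
#   missing_db_keys .env
```

No output means every key in the list is present; it does not verify that the values are correct.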
#### Download Selenium Server jar file
[Download Selenium Server jar file](https://www.selenium.dev/documentation/grid/getting_started/).
Run the following in a new terminal.
```bash
java -jar selenium-server-4.29.0.jar standalone --override-max-sessions true --max-sessions 10
```
[CLI options in the Selenium Grid](https://www.selenium.dev/documentation/grid/configuration/cli_options/).
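Once the server is running, you may want to confirm it is accepting sessions before starting a scrape. Selenium Grid 4 exposes a `/status` endpoint whose JSON includes a `ready` flag. The sketch below is a rough check under assumptions: `grid_is_ready` and `wait_for_grid` are hypothetical helper names, the URL assumes the default port 4444 on localhost, and the JSON is matched with `grep` rather than parsed properly.

```bash
# Succeed when a Selenium /status JSON document read from standard input
# contains "ready": true. This is a crude text match, not a JSON parser.
grid_is_ready() {
  tr -d '[:space:]' | grep -q '"ready":true'
}

# Poll the status endpoint once per second until the server is ready.
# $1: status URL (assumed default below), $2: maximum number of attempts.
wait_for_grid() {
  url="${1:-http://localhost:4444/status}"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS "$url" 2>/dev/null | grid_is_ready; then
      echo "Selenium server is ready"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "Selenium server did not become ready after $attempts attempts" >&2
  return 1
}

# Example usage:
#   wait_for_grid && php artisan browser:scrape
```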
## Usage
Edit the command at [./app/Console/Commands/BrowserScrape.php](https://raw.githubusercontent.com/kkamara/php-scraper/develop/app/Console/Commands/BrowserScrape.php), then run it:
```bash
php artisan browser:scrape
```
[BrowserInvoker.php](https://raw.githubusercontent.com/kkamara/php-scraper/develop/app/Console/Commands/BrowserInvoker.php)
#### Panther Environment Variables
[Panther Environment Variables](https://github.com/symfony/panther?tab=readme-ov-file#environment-variables).
#### Capabilities
[Capabilities](https://www.browserstack.com/docs/automate/capabilities).
[Using Desired Capabilities](https://chromedriver.chromium.org/capabilities#h.p_ID_52).
## Adding a new command
```bash
php artisan make:crawler TestCrawler
```
## Misc
[See Python Selenium web scraper.](https://github.com/kkamara/python-selenium)
[See MRVL Desktop.](https://github.com/kkamara/mrvl-desktop)
[See PHP ReactJS Boilerplate.](https://github.com/kkamara/php-reactjs-boilerplate)
[See PHP Docker Skeleton.](https://github.com/kkamara/php-docker-skeleton)
[See Python Docker Skeleton.](https://github.com/kkamara/python-docker-skeleton)
## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
## License
[BSD](https://opensource.org/licenses/BSD-3-Clause)