# PHP Scraper [![API](https://github.com/kkamara/php-scraper/actions/workflows/build.yml/badge.svg)](https://github.com/kkamara/php-scraper/actions/workflows/build.yml)

(2022) Use PHP technologies to crawl websites and click buttons on them in a browser with a GUI. I highly recommend working on Linux (including virtual machines) or macOS. Laravel 13.

* [Important note](#important-note)
* [Using Postman?](#using-postman)
* [Requirements](#requirements)
* [Installation](#installation)
* [Usage](#usage)
* [Adding a new command](#adding-a-new-command)
* [Browser Testing](#testing)
* [Misc](#misc)
* [Contributing](#contributing)
* [License](#license)

## Important note:

Before you scrape any website, read its `robots.txt` file, which you can find at `<domain>/robots.txt`. It lists the paths that crawlers are allowed and disallowed to fetch. Do not violate the terms of service of any website you scrape.
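As a rough illustration of that check, the sketch below tests a path against the `Disallow` rules in a downloaded `robots.txt`. The function name `robots_disallows` is hypothetical and not part of this project. The matching is deliberately simplified: it only reads rules that follow a `User-agent: *` line, uses plain prefix matching, and ignores `Allow` rules and wildcards.

```bash
#!/usr/bin/env bash
# Sketch: report whether a URL path matches a Disallow rule for "User-agent: *".
# robots_disallows is a hypothetical helper name, not something this project ships.
# Simplified on purpose: prefix matching only, no Allow rules, no wildcards.

# robots_disallows FILE PATH -> returns 0 if PATH is disallowed, 1 otherwise.
robots_disallows() {
  local robots_file="$1" url_path="$2"
  awk -v path="$url_path" '
    { sub(/\r$/, "") }                        # tolerate CRLF line endings
    tolower($1) == "user-agent:" { in_star = ($2 == "*"); next }
    in_star && tolower($1) == "disallow:" && $2 != "" {
      if (index(path, $2) == 1) found = 1     # the rule is a prefix of the path
    }
    END { exit (found ? 0 : 1) }
  ' "$robots_file"
}
```

For example, after `curl -fsS https://example.com/robots.txt -o robots.txt`, running `robots_disallows robots.txt /admin/users && echo "disallowed"` prints `disallowed` only when a matching rule exists. Treat it as a quick first check, not a replacement for reading the file and the site's terms of service.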


## Using Postman?

[Postman client](https://www.postman.com/).

[Published Postman API Collection](https://documenter.getpostman.com/view/17125932/TzzAKvVe).

## Requirements

* [Laravel](https://laravel.com/docs)
* [Java](https://www.java.com/en/)

## Installation

```bash
cp .env.example .env
# Don't worry if the next step fails with errors about the chromedriver binary; we install it right after.
composer install
```

#### Add chromedriver to PATH

Make sure ChromeDriver is installed and available on your `PATH`.

```bash
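# Install chromedriver for the Panther client.
vendor/bin/bdi detect drivers
sudo mv drivers/chromedriver /usr/local/bin/chromedriver
# Or download it manually. Pick the archive for your platform
# (chromedriver_linux64, chromedriver_mac64 or chromedriver_win32) and a
# version that matches your installed Chrome; 2.37 below is only an example.
# See https://chromedriver.storage.googleapis.com for the drivers list.
wget https://chromedriver.storage.googleapis.com/2.37/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
sudo mv chromedriver /usr/local/bin/chromedriver
# Confirm the binary is reachable through PATH.
chromedriver --version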
```
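Because the later steps depend on external binaries, a small preflight check can catch a missing one early. The helper below is a hypothetical sketch (the name `require_on_path` is not part of this project) that uses the shell's `command -v` to report whether a tool is reachable through `PATH`.

```bash
#!/usr/bin/env bash
# Sketch: report whether a tool can be found through PATH.
# require_on_path is a hypothetical helper name, not something this project ships.

# require_on_path TOOL -> prints the resolved location and returns 0,
# or prints a message to stderr and returns 1 when TOOL is missing.
require_on_path() {
  local tool="$1" location
  if location="$(command -v "$tool" 2>/dev/null)"; then
    echo "$tool found at $location"
  else
    echo "$tool is not on PATH" >&2
    return 1
  fi
}
```

Running `require_on_path chromedriver && require_on_path java` covers the two binaries this README relies on before you continue.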

#### Continue installation

```bash
composer install
php artisan key:generate
# Before running the next command:
# Update your database details in .env
php artisan migrate --seed
yarn install
yarn build
```

#### Download Selenium Server jar file

[Download the Selenium Server jar file](https://www.selenium.dev/documentation/grid/getting_started/).

Then start the server in a new terminal, adjusting the file name to match the version you downloaded.

```bash
java -jar selenium-server-4.29.0.jar standalone --override-max-sessions true --max-sessions 10
```

[CLI options in the Selenium Grid](https://www.selenium.dev/documentation/grid/configuration/cli_options/).
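After starting the server, it can help to wait until it reports ready before running a scrape. The sketch below polls the Grid status endpoint. It assumes the default standalone address `http://localhost:4444/status` and a JSON response containing `"ready": true`; check both against your Selenium version and any port options you pass. The function names `status_is_ready` and `wait_for_selenium` are hypothetical helpers, not part of this project.

```bash
#!/usr/bin/env bash
# Sketch: wait until a Selenium Grid reports that it is ready.
# Both function names are hypothetical helpers, not part of this project.

# status_is_ready: reads a /status JSON document on stdin and
# returns 0 if it contains "ready": true.
status_is_ready() {
  grep -Eq '"ready"[[:space:]]*:[[:space:]]*true'
}

# wait_for_selenium [URL] [ATTEMPTS]: polls URL about once per second.
# Assumes the default standalone address; adjust it if you changed the port.
wait_for_selenium() {
  local url="${1:-http://localhost:4444/status}"
  local attempts="${2:-30}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS "$url" 2>/dev/null | status_is_ready; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "Selenium did not report ready at $url" >&2
  return 1
}
```

A typical use would be `wait_for_selenium && php artisan browser:scrape`, so the scrape only starts once the server answers.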

## Usage

Edit the command in [./app/Console/Commands/BrowserScrape.php](https://raw.githubusercontent.com/kkamara/php-scraper/develop/app/Console/Commands/BrowserScrape.php), then run it:

```bash
php artisan browser:scrape
```

[BrowserInvoker.php](https://raw.githubusercontent.com/kkamara/php-scraper/develop/app/Console/Commands/BrowserInvoker.php)

#### Panther Environment Variables

[Panther Environment Variables](https://github.com/symfony/panther?tab=readme-ov-file#environment-variables).

#### Capabilities

[Capabilities](https://www.browserstack.com/docs/automate/capabilities).

[Using Desired Capabilities](https://chromedriver.chromium.org/capabilities#h.p_ID_52).

## Adding a new command

```bash
php artisan make:crawler TestCrawler
```

## Misc

[See Python Selenium web scraper.](https://github.com/kkamara/python-selenium)

[See MRVL Desktop.](https://github.com/kkamara/mrvl-desktop)

[See PHP ReactJS Boilerplate.](https://github.com/kkamara/php-reactjs-boilerplate)

[See PHP Docker Skeleton.](https://github.com/kkamara/php-docker-skeleton)

[See Python Docker Skeleton.](https://github.com/kkamara/python-docker-skeleton)

## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

## License
[BSD](https://opensource.org/licenses/BSD-3-Clause)