https://github.com/arefshojaei/spider

PHP web spider

bot crawler crawling php php-library php-tools php8 scraper scrapping spider web web-bot


```php
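<?php

# Assumes $spider is an already-created Spider instance (hypothetical variable name; setup not shown)
$page = $spider->loadHTML("http://google.com");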

echo $page->find("title")->text() . PHP_EOL;

$page->findAll("a")->each(function($key, $link) {
    echo "[LINK] " . $link->attr("href") . PHP_EOL;
});
```
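For comparison, the same task can be sketched with PHP's bundled DOM extension (`DOMDocument`) instead of Spider. This is not Spider's implementation. It assumes PHP 8 with ext-dom enabled, and it parses an inline HTML sample (the `$html` string is made up for the example) so that it runs without a network request.

```php
<?php

// Sketch using PHP's bundled DOM extension, not Spider's API.
// The HTML is an inline sample, so no network request is made.
$html = '<html><head><title>Example</title></head><body>'
    . '<a href="/one">One</a> <a href="/two">Two</a>'
    . '</body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);

// Roughly find("title")->text(): text of the first <title>, or "" if absent
$title = $doc->getElementsByTagName('title')->item(0)?->textContent ?? '';
echo $title . PHP_EOL;

// Roughly findAll("a")->each(...): visit every <a> element in document order
$links = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $links[] = $link->getAttribute('href');
    echo "[LINK] " . $link->getAttribute('href') . PHP_EOL;
}
```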

## **Installation**

#### Using Composer
```bash
composer create-project arefshojaei/spider
```

#### Using Git
```bash
git clone https://github.com/ArefShojaei/Spider
```

> Find element
* find()
* findAll()

```php
$page->find("a");

$page->findAll(".product");
```
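Spider's selectors look like CSS. The classic `DOMDocument` class has no CSS selector method, so a rough native equivalent is to rewrite simple selectors as XPath. The sketch below assumes PHP 8 with ext-dom and handles only a tag name and a single class; `classXPath()` is a hypothetical helper, not part of Spider.

```php
<?php

// Sketch using PHP's DOM extension: simple CSS selectors rewritten as XPath.
// classXPath() is a hypothetical helper, not part of Spider.
function classXPath(string $class): string
{
    // Match elements whose space-separated class list contains $class exactly.
    // Assumes $class contains no quote characters.
    return "//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<div class="product">A</div><div class="product sale">B</div>'
    . '<div class="productive">C</div><a href="#">Link</a>');
$xpath = new DOMXPath($doc);

// Like find("a"): first match only (null if there is none)
$firstLink = $xpath->query('//a')->item(0);

// Like findAll(".product"): every match; "productive" is not matched
$products = $xpath->query(classXPath('product'));

echo $firstLink?->textContent . PHP_EOL;
foreach ($products as $product) {
    echo $product->textContent . PHP_EOL; // prints A, then B
}
```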

> Iterate over each element
* each()
* map()
* filter()

```php
$page->findAll("a")->each(function($key, $anchor) {
    echo "[LINK] " . $anchor->attr("href") . PHP_EOL;
    echo "[TITLE] " . $anchor->text() . PHP_EOL;
    echo "[HTML] " . $anchor->html() . PHP_EOL;
});

# ----------------------------------------
$anchors = $page->findAll("a")->map(function($key, $anchor) {
    $anchor->attr("data-id", rand());

    return $anchor;
});

var_dump($anchors);

# ----------------------------------------
$filteredAnchors = $page->findAll("a")->filter(fn($key, $anchor) => $anchor->attr("data-id"));

var_dump($filteredAnchors);
```
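The same each/map/filter patterns can be sketched with plain PHP array functions over a `DOMNodeList` (PHP 8 with ext-dom assumed; this is not Spider's API). One detail worth noting: `array_filter()` with `ARRAY_FILTER_USE_BOTH` passes `($value, $key)`, the reverse of the `($key, $element)` order that the Spider callbacks in the examples above receive.

```php
<?php

// Sketch using PHP's DOM extension and array functions, not Spider's API.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<a href="/a">A</a><a href="/b">B</a><a href="/c">C</a>');

// Turn the live DOMNodeList into a plain array of DOMElement objects (keys 0, 1, 2)
$anchors = iterator_to_array($doc->getElementsByTagName('a'));

// each(): run a side effect for every element
foreach ($anchors as $key => $anchor) {
    echo "[LINK] " . $anchor->getAttribute('href') . PHP_EOL;
}

// map(): transform every element; here, tag links at even positions with data-id
$mapped = array_map(function (int $key, DOMElement $anchor): DOMElement {
    if ($key % 2 === 0) {
        $anchor->setAttribute('data-id', (string) $key);
    }
    return $anchor;
}, array_keys($anchors), $anchors);

// filter(): keep elements that carry a data-id attribute (original keys are preserved).
// ARRAY_FILTER_USE_BOTH passes ($value, $key), in that order.
$filtered = array_filter(
    $mapped,
    fn(DOMElement $anchor, int $key): bool => $anchor->hasAttribute('data-id'),
    ARRAY_FILTER_USE_BOTH
);
```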

> Element traversal and insertion
* parent()
* after()
* before()
* append()
* prepend()

```php
$parentNode = $page->find(".product")->parent();

# Insert a sibling element after / before the matched element
$page->find(".product")->after("<p>After Element</p>");
$page->find(".product")->before("<p>Before Element</p>");

# Insert a child element at the end / start of the matched element
$page->find(".product")->append("<p>Append Element</p>");
$page->find(".product")->prepend("<p>Prepend Element</p>");
```
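PHP 8.0 added `after()`, `before()`, `append()` and `prepend()` to `DOMElement` (through the `DOMChildNode` and `DOMParentNode` interfaces), and the parent is available as `parentNode`. A minimal sketch of the same operations with those native methods, assuming PHP 8 with ext-dom; `$p` is a hypothetical helper that builds a `<p>` element, and none of this is Spider's code.

```php
<?php

// Sketch using PHP 8's native DOM methods, not Spider's API.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<div id="list"><div class="product">Product</div></div>');

$product = (new DOMXPath($doc))->query('//*[@class="product"]')->item(0);

// parent(): the enclosing element
$parent = $product->parentNode;

// Hypothetical helper: build a <p> element holding plain text
$p = fn(string $text): DOMElement => $doc->createElement('p', $text);

// after()/before(): insert siblings next to the element (PHP 8.0+)
$product->after($p('After Element'));
$product->before($p('Before Element'));

// append()/prepend(): insert children inside the element (PHP 8.0+)
$product->append($p('Append Element'));
$product->prepend($p('Prepend Element'));

echo $doc->saveHTML($parent) . PHP_EOL;
```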

> Element cleanup
* empty()
* remove()

```php
# Empty the element (remove its content, keep the element)
$page->find("p")->empty();

# Remove element from the DOM
$page->find("p")->remove();
```
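A rough native-DOM equivalent of emptying versus removing an element, assuming PHP 8 with ext-dom (not Spider's implementation). Emptying deletes the children and keeps the element; `DOMChildNode::remove()` (PHP 8.0+) detaches the element itself.

```php
<?php

// Sketch using PHP's bundled DOM extension, not Spider's API.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<p id="a">Hello <b>world</b></p><p id="b">Bye</p>');
$xpath = new DOMXPath($doc);

$first = $xpath->query('//p[@id="a"]')->item(0);
$second = $xpath->query('//p[@id="b"]')->item(0);

// Roughly empty(): drop every child node but keep the element itself
while ($first->firstChild !== null) {
    $first->removeChild($first->firstChild);
}

// Roughly remove(): detach the element from the document (PHP 8.0+)
$second->remove();

echo $doc->saveHTML() . PHP_EOL;
```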

> Element content
* text()
* html()

```php
# Getter
$text = $page->find("p")->text();
$html = $page->find("p")->html();

# Setter
$newText = $page->find("p")->text("New text content");
$newHtml = $page->find("p")->html("<b>New html content</b>");
```
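With the DOM extension, text is read and written through `textContent`, while inner HTML has to be assembled from the children, since the classic DOM classes have no inner-HTML property. A sketch assuming PHP 8 with ext-dom; `innerHtml()` is a hypothetical helper, and the HTML setter uses `appendXML()`, which only accepts well-formed XML snippets. This is not how Spider implements `text()` or `html()`.

```php
<?php

// Sketch using PHP's DOM extension, not Spider's API.
// innerHtml() is a hypothetical helper written for this example.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child); // serialize each child in turn
    }
    return $html;
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<p>Old <b>text</b></p>');
$p = $doc->getElementsByTagName('p')->item(0);

// Getters
$text = $p->textContent;  // text only, tags stripped
$html = innerHtml($p);    // markup kept

// Text setter: replaces all children with a single text node (markup is escaped)
$p->textContent = 'New text content';

// HTML setter: parse a snippet and swap it in for the current children
$fragment = $doc->createDocumentFragment();
$fragment->appendXML('<b>New html content</b>');
while ($p->firstChild !== null) {
    $p->removeChild($p->firstChild);
}
$p->appendChild($fragment);
```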

> Element attributes
* attr()
* addClass()
* removeClass()
* hasClass()
* addId()
* removeId()
* hasId()

```php
# Getter
$attributes = $page->find("a")->attr();

$link = $page->find("a")->attr("href");

# Setter
$page->find("a")->attr("data-id", rand());

# Class
$page->find("p")->addClass("spider");
$page->find("p")->removeClass("spider");
$page->find("p")->hasClass("spider");

# ID
$page->find("p")->addId("spider");
$page->find("p")->removeId("spider");
$page->find("p")->hasId("spider");
```
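Attribute access maps directly onto `getAttribute()`, `setAttribute()` and `removeAttribute()`. Classes need a little list handling because `class` is a single space-separated attribute, whereas an ID is one value. A sketch assuming PHP 8 with ext-dom; `classList()`, `hasClass()`, `addClass()` and `removeClass()` are hypothetical helpers named after Spider's methods, not Spider's code.

```php
<?php

// Sketch using PHP's DOM extension, not Spider's API.
// classList/hasClass/addClass/removeClass are hypothetical helpers for this example.
function classList(DOMElement $el): array
{
    // Split the class attribute on any whitespace, dropping empty entries
    return preg_split('/\s+/', $el->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
}

function hasClass(DOMElement $el, string $class): bool
{
    return in_array($class, classList($el), true);
}

function addClass(DOMElement $el, string $class): void
{
    if (!hasClass($el, $class)) {
        $el->setAttribute('class', implode(' ', [...classList($el), $class]));
    }
}

function removeClass(DOMElement $el, string $class): void
{
    $el->setAttribute('class', implode(' ', array_diff(classList($el), [$class])));
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<a href="/home" class="nav">Home</a>');
$a = $doc->getElementsByTagName('a')->item(0);

// attr(): all attributes as name => value, or a single one by name
$attributes = [];
foreach ($a->attributes as $attr) {
    $attributes[$attr->name] = $attr->value;
}
$link = $a->getAttribute('href');

// attr(name, value): set an attribute
$a->setAttribute('data-id', (string) random_int(1, 1000));

addClass($a, 'spider');
$hasSpider = hasClass($a, 'spider');
removeClass($a, 'spider');

// IDs are a single attribute value, so no list handling is needed
$a->setAttribute('id', 'spider');
$hasId = $a->getAttribute('id') === 'spider';
$a->removeAttribute('id');
```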

> Export the current page content to a file
```php
$filename = "app";

# Forward slashes work as a path separator on both Windows and Unix-like systems
$path = __DIR__ . "/html/" . $filename . rand() . ".html";

$page->export($path);
```
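`DOMDocument::saveHTMLFile()` writes a document to disk and returns the number of bytes written, or `false` on failure. A sketch assuming PHP 8 with ext-dom; it writes into the system temp directory and creates the `html` folder first, because whether Spider's `export()` creates missing folders is not stated in this README.

```php
<?php

// Sketch using PHP's DOM extension, not Spider's API.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p>Saved page</p></body></html>');

// Build a portable path; forward slashes also work on Windows
$dir = sys_get_temp_dir() . '/html';
if (!is_dir($dir)) {
    mkdir($dir, 0777, true); // create the folder (and any parents) if missing
}
$path = $dir . '/app' . random_int(1, PHP_INT_MAX) . '.html';

// saveHTMLFile() returns the number of bytes written, or false on failure
$bytes = $doc->saveHTMLFile($path);
if ($bytes === false) {
    throw new RuntimeException("Could not write $path");
}
echo "Wrote $bytes bytes to $path" . PHP_EOL;
```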