https://github.com/arefshojaei/spider
PHP web spider
- Host: GitHub
- URL: https://github.com/arefshojaei/spider
- Owner: ArefShojaei
- Created: 2025-02-13T17:40:00.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-03-26T20:36:05.000Z (11 months ago)
- Last Synced: 2025-03-26T21:33:35.948Z (11 months ago)
- Topics: bot, crawler, crawling, php, php-library, php-tools, php8, scraper, scrapping, spider, web, web-bot
- Language: PHP
- Homepage:
- Size: 117 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
PHP web spider
```php
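# The start of this statement is missing from the source. $spider is a placeholder
# for whatever object the library exposes loadHTML() on.
$page = $spider->loadHTML("http://google.com");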
echo $page->find("title")->text() . PHP_EOL;
$page->findAll("a")->each(function($key, $link) {
echo "[LINK] " . $link->attr("href") . PHP_EOL;
});
```
## **Installation**
#### Using Composer
```bash
composer create-project arefshojaei/spider
```
#### Using Git
```bash
git clone https://github.com/ArefShojaei/Spider
```
> Find element
* find()
* findAll()
```php
$page->find("a");
$page->findAll(".product");
```
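Spider's internals are not shown in this README, so the following is only a sketch of what a tag selector and a class selector amount to. It is written against PHP's bundled DOM extension (`DOMDocument`, `DOMXPath`) rather than the library, and the sample markup and variable names are invented for illustration.

```php
<?php
# Sketch only: PHP's DOM extension, not Spider's implementation.
$html = '<html><body><a href="/a">A</a>'
      . '<div class="product">P1</div><div class="product sale">P2</div>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

# Comparable to find("a"): take the first matching element.
$firstAnchor = $xpath->query('//a')->item(0);

# Comparable to findAll(".product"): every element whose class list contains "product".
$products = $xpath->query(
    '//*[contains(concat(" ", normalize-space(@class), " "), " product ")]'
);

echo $firstAnchor->getAttribute('href') . PHP_EOL; # /a
echo $products->length . PHP_EOL;                  # 2
```

The `concat`/`normalize-space` expression matches whole class names, so `product` does not accidentally match a class such as `products`.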
> Iterate over each element
* each()
* map()
* filter()
```php
$page->findAll("a")->each(function($key, $anchor) {
echo "[LINK] " . $anchor->attr("href") . PHP_EOL;
echo "[TITLE] " . $anchor->text() . PHP_EOL;
echo "[HTML] " . $anchor->html() . PHP_EOL;
});
# ----------------------------------------
$anchors = $page->findAll("a")->map(function($key, $anchor) {
$anchor->attr("data-id", rand());
return $anchor;
});
var_dump($anchors);
# ----------------------------------------
$filteredAnchors = $page->findAll("a")->filter(fn($key, $anchor) => $anchor->attr("data-id"));
var_dump($filteredAnchors);
```
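To make the difference between the three iterators concrete, here is a rough equivalent using plain PHP array functions over a `DOMNodeList`. This assumes nothing about how Spider implements them, and the markup is made up for the example.

```php
<?php
# Illustration with PHP's DOM extension, not Spider's collection class.
$doc = new DOMDocument();
$doc->loadHTML(
    '<html><body><a href="/one" data-id="1">One</a><a href="/two">Two</a></body></html>'
);

# DOMNodeList is traversable, so it can be copied into a plain array.
$anchors = iterator_to_array($doc->getElementsByTagName('a'));

# each(): run a callback per element for its side effects only.
foreach ($anchors as $key => $anchor) {
    echo "[LINK] " . $anchor->getAttribute('href') . PHP_EOL;
}

# map(): build a new list from the callback's return values.
$hrefs = array_map(fn(DOMElement $a) => $a->getAttribute('href'), $anchors);

# filter(): keep only the elements for which the callback returns true.
$withId = array_filter($anchors, fn(DOMElement $a) => $a->hasAttribute('data-id'));

var_dump($hrefs);
echo count($withId) . PHP_EOL; # 1
```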
> Element traversing
* parent()
* after()
* before()
* append()
* prepend()
```php
$parentNode = $page->find(".product")->parent();
# Add an element outside the target, at the parent level
$page->find(".product")->after("After Element");
$page->find(".product")->before("Before Element");
# Add a child element inside the target
$page->find(".product")->append("Append Element");
$page->find(".product")->prepend("Prepend Element");
```
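PHP 8.0 and later ship methods with the same names on `DOMElement`, which gives a convenient way to see where each insertion lands. The sketch below uses those built-in methods, not Spider, and the sample markup is hypothetical.

```php
<?php
# Requires PHP 8.0+ for before(), after(), prepend() and append() on DOMElement.
$doc = new DOMDocument();
$doc->loadHTML(
    '<html><body><div id="list"><p class="product">Item</p></div></body></html>'
);
$product = $doc->getElementsByTagName('p')->item(0);

# before()/after() insert siblings, so the new nodes live in the parent.
$product->before($doc->createElement('span', 'Before'));
$product->after($doc->createElement('span', 'After'));

# prepend()/append() insert children; plain strings become text nodes.
$product->prepend('Start ');
$product->append(' End');

echo $product->parentNode->nodeName . PHP_EOL; # div
echo $product->textContent . PHP_EOL;          # Start Item End
echo $doc->saveHTML($product->parentNode) . PHP_EOL;
```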
> Element cleaner
* empty()
* remove()
```php
# Clean element content
$page->find("p")->empty();
# Remove element from the DOM
$page->find("p")->remove();
```
> Element content
* text()
* html()
```php
# Getter
$text = $page->find("p")->text();
$html = $page->find("p")->html();
# Setter
$newText = $page->find("p")->text("New text content");
$newHtml = $page->find("p")->html("New html content");
```
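The distinction between text content and HTML content can be shown with the DOM extension alone. In this sketch `innerHtml()` is a hypothetical helper written for the example; it is not part of Spider or of PHP.

```php
<?php
# Hypothetical helper: serialize only the children of an element.
function innerHtml(DOMElement $el): string
{
    $html = '';
    foreach ($el->childNodes as $child) {
        $html .= $el->ownerDocument->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><p>Hello <b>world</b></p></body></html>');
$p = $doc->getElementsByTagName('p')->item(0);

# Getter: text drops the markup, HTML keeps it.
echo $p->textContent . PHP_EOL; # Hello world
echo innerHtml($p) . PHP_EOL;   # Hello <b>world</b>

# Setter: assigning textContent replaces all children with one text node.
$p->textContent = 'New text content';
echo innerHtml($p) . PHP_EOL;   # New text content
```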
> Element attribute
* attr()
* addClass()
* removeClass()
* hasClass()
* addId()
* removeId()
* hasId()
```php
# Getter
$attributes = $page->find("a")->attr();
$link = $page->find("a")->attr("href");
# Setter
$page->find("a")->attr("data-id", rand());
# Class
$page->find("p")->addClass("spider");
$page->find("p")->removeClass("spider");
$page->find("p")->hasClass("spider");
# ID
$page->find("p")->addId("spider");
$page->find("p")->removeId("spider");
$page->find("p")->hasId("spider");
```
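Class helpers like these usually come down to treating the `class` attribute as a space-separated word list. The functions below are a minimal sketch of that idea using `DOMElement::getAttribute()` and `setAttribute()`. They are hypothetical stand-ins written for this example and may differ from how Spider does it.

```php
<?php
# Hypothetical helpers, not Spider's code.
function classList(DOMElement $el): array
{
    return preg_split('/\s+/', $el->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
}

function hasClass(DOMElement $el, string $name): bool
{
    return in_array($name, classList($el), true);
}

function addClass(DOMElement $el, string $name): void
{
    if (!hasClass($el, $name)) {
        $el->setAttribute('class', implode(' ', [...classList($el), $name]));
    }
}

function removeClass(DOMElement $el, string $name): void
{
    $rest = array_diff(classList($el), [$name]);
    $el->setAttribute('class', implode(' ', $rest));
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><p class="intro">Text</p></body></html>');
$p = $doc->getElementsByTagName('p')->item(0);

addClass($p, 'spider');
echo $p->getAttribute('class') . PHP_EOL; # intro spider
removeClass($p, 'spider');
echo $p->getAttribute('class') . PHP_EOL; # intro
```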
> Export current page content
```php
$filename = "app";
$path = __DIR__ . DIRECTORY_SEPARATOR . "html" . DIRECTORY_SEPARATOR . $filename . rand() . ".html";
$page->export($path);
```
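Whether `export()` creates a missing target directory is not stated here, so it may be safer to make sure the directory exists first. The sketch below shows that pattern with `DOMDocument::saveHTML()` and `file_put_contents()` instead of Spider. The temp-directory location is an assumption chosen so the example can run anywhere.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<html><body><p>Saved page</p></body></html>');

# Assumed output location: an "html" folder under the system temp directory.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'html';
if (!is_dir($dir)) {
    mkdir($dir, 0777, true);
}

# Same naming scheme as above: fixed prefix plus a random number.
$path = $dir . DIRECTORY_SEPARATOR . 'app' . rand() . '.html';
$bytes = file_put_contents($path, $doc->saveHTML());

echo ($bytes !== false ? "Wrote $bytes bytes to $path" : "Write failed") . PHP_EOL;
```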