https://github.com/jerodev/diglett
Simple and extendable web scraper using css selectors
- Host: GitHub
- URL: https://github.com/jerodev/diglett
- Owner: jerodev
- License: mit
- Created: 2018-10-02T18:45:16.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2020-04-27T14:02:00.000Z (almost 6 years ago)
- Last Synced: 2025-06-03T18:29:50.532Z (10 months ago)
- Topics: php, webscraper
- Language: PHP
- Homepage:
- Size: 76.2 KB
- Stars: 3
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Diglett Web Scraper
[Build status (Travis CI)](https://travis-ci.org/jerodev/diglett) [Code quality (Scrutinizer)](https://scrutinizer-ci.com/g/jerodev/diglett/?branch=master) [Code style (StyleCI)](https://github.styleci.io/repos/151305583)
Diglett is a web scraper that extends the [Symfony DomCrawler Component](https://symfony.com/doc/current/components/dom_crawler.html). It lets you use extended and custom CSS selectors to extract data from a web page easily.
## Requirements
- PHP 7.1.18 or higher
## How to use
Diglett includes a web client that fetches a page and returns a `Diglett` instance; alternatively, you can inject your own Symfony `Crawler` object into the `Diglett` class. From your `Diglett` object, you can then call its methods, which accept the specialized CSS selector functions listed below.
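```php
<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

// Fetch the page and wrap it in a Diglett instance
$diglett = \Jerodev\Diglett\WebClient::get('https://www.tabletopfinder.eu/');
$firstParagraph = $diglett->getText('p:first()'); // text of the first <p> on the page
```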
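The example above needs network access and the Diglett package. To see which element `p:first()` picks, here is a minimal sketch that uses only PHP's built-in `DOMDocument` and `DOMXPath` classes on a made-up inline HTML string. It is not how Diglett works internally. It also shows a pitfall when translating the selector by hand: "the first element in the collection" corresponds to the XPath `(//p)[1]`, whereas `//p[1]` matches every `<p>` that is the first `<p>` child of its parent.

```php
<?php
// Made-up page content; in the README example it comes from the web.
$html = <<<HTML
<html><body>
  <div><p>First paragraph</p><p>Second paragraph</p></div>
  <div><p>Third paragraph</p></div>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// (//p)[1]: the first <p> of the whole collection, like p:first()
$firstParagraph = $xpath->query('(//p)[1]')->item(0)->textContent;

// //p[1]: every <p> that is the first <p> child of its parent (two matches here)
$firstPerParent = [];
foreach ($xpath->query('//p[1]') as $node) {
    $firstPerParent[] = $node->textContent;
}

echo $firstParagraph, PHP_EOL;                 // First paragraph
echo implode(', ', $firstPerParent), PHP_EOL;  // First paragraph, Third paragraph
```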
## Built-in selector functions
| Function | Description | Example |
| --------- | ----------- | ------- |
| **:containsregex(str)** | Get the elements whose text content matches a regular expression | `div p:containsregex([Hh]el+o)` |
| **:containstext(str)** | Get the elements whose text content contains this substring | `div p:containstext(Hello World)` |
| **:first()** | Get the first element in a collection | `ul li:first()` |
| **:last()** | Get the last element in a collection | `ul li:last()` |
| **:next()** | Get the next sibling of the current element, if available | `ul.test:next() li` |
| **:nth(x)** | Get the nth element in a collection (starting at 1) | `ul li:nth(3)` |
| **:prev()** | Get the previous sibling of the current element, if available | `ul li:last():prev()` |
| **:text(str)** | Get the elements whose inner text exactly matches this string | `ul li:text(Hello World)` |
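The table describes each filter in one sentence. To pin down what they select, here is a sketch that mimics them in plain PHP over a list of elements, using only the built-in DOM extension. The `sel_*` helper names and the inline HTML are hypothetical, and two details are assumptions rather than documented Diglett behaviour: `:text(str)` is compared against the trimmed text, and the `:containsregex` pattern is wrapped in `/` delimiters. The sketch does not show how Diglett parses or chains selectors; it only mirrors the descriptions above.

```php
<?php
// Hypothetical helpers (sel_*) mimicking the table's filters on a plain list
// of DOMElement objects. This is NOT Diglett's implementation or API.

function sel_first(array $els): array { return array_slice($els, 0, 1); }
function sel_last(array $els): array { return array_slice($els, -1, 1); }

// :nth(x) is 1-based, so sel_nth($els, 3) keeps the third element.
function sel_nth(array $els, int $x): array
{
    return $x < 1 ? [] : array_slice($els, $x - 1, 1);
}

// :containstext(str): text content contains the substring.
function sel_containstext(array $els, string $needle): array
{
    return array_values(array_filter($els, function (DOMElement $e) use ($needle) {
        return strpos($e->textContent, $needle) !== false;
    }));
}

// :containsregex(str): text content matches the pattern.
// Assumption: the pattern has no delimiters and contains no '/'.
function sel_containsregex(array $els, string $pattern): array
{
    return array_values(array_filter($els, function (DOMElement $e) use ($pattern) {
        return preg_match('/' . $pattern . '/', $e->textContent) === 1;
    }));
}

// :text(str): trimmed text content equals the string exactly (assumption).
function sel_text(array $els, string $str): array
{
    return array_values(array_filter($els, function (DOMElement $e) use ($str) {
        return trim($e->textContent) === $str;
    }));
}

// :next() / :prev(): nearest sibling that is an element, skipping text nodes.
function sel_sibling(array $els, string $direction): array
{
    $out = [];
    foreach ($els as $e) {
        $n = $e->$direction;
        while ($n !== null && !($n instanceof DOMElement)) {
            $n = $n->$direction;
        }
        if ($n !== null) {
            $out[] = $n;
        }
    }
    return $out;
}
function sel_next(array $els): array { return sel_sibling($els, 'nextSibling'); }
function sel_prev(array $els): array { return sel_sibling($els, 'previousSibling'); }

// Collect the text content of each element, for printing and comparison.
function texts(array $els): array
{
    return array_map(function (DOMElement $e) { return $e->textContent; }, $els);
}

$doc = new DOMDocument();
$doc->loadHTML('<ul><li>Hi</li><li>Hello</li><li>Hello World</li><li>Bye</li></ul>');
$items = iterator_to_array((new DOMXPath($doc))->query('//ul/li'));

print_r(texts(sel_prev(sel_last($items))));            // ul li:last():prev()
print_r(texts(sel_containsregex($items, '[Hh]el+o'))); // ul li:containsregex([Hh]el+o)
```

The last two lines mirror the table's `ul li:last():prev()` and `ul li:containsregex([Hh]el+o)` examples: the first keeps only `Hello World`, the second keeps `Hello` and `Hello World`.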