https://github.com/berliozframework/htmlselector
PHP library to do queries on HTML files (converted in SimpleXMLElement object) like jQuery on DOM.
- Host: GitHub
- URL: https://github.com/berliozframework/htmlselector
- Owner: BerliozFramework
- License: mit
- Created: 2017-11-06T23:58:36.000Z (over 8 years ago)
- Default Branch: 2.x
- Last Pushed: 2024-10-18T11:53:28.000Z (over 1 year ago)
- Last Synced: 2025-06-16T23:01:46.057Z (about 1 year ago)
- Topics: composer, css-pseudo, html, php, php-library
- Language: PHP
- Homepage:
- Size: 146 KB
- Stars: 7
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# Berlioz HTML Selector
**Berlioz HTML Selector** is a PHP library to do queries on HTML files with CSS selectors like *jQuery* on DOM.
## Installation
### Composer
The recommended way to install **Berlioz HTML Selector** is with [Composer](https://getcomposer.org/).
```bash
$ composer require berlioz/html-selector
```
### Dependencies
- **PHP** ^8.0
- PHP extensions:
  - **dom**
  - **libxml**
  - **mbstring**
  - **simplexml**
## Usage
### Load HTML
You can load an HTML string or file with the `HtmlSelector::query()` method. For a file, pass its path and set the second
parameter, `contentsIsFile`, to `true`. An existing `SimpleXMLElement` object is also accepted.
```php
$htmlSelector = new \Berlioz\HtmlSelector\HtmlSelector();
$query = $htmlSelector->query('...');
$query = $htmlSelector->query('path-of-my-file/file.html', contentsIsFile: true);
$query = $htmlSelector->query(new SimpleXMLElement(/*...*/));
```
### Load from `ResponseInterface`
`HtmlSelector::queryFromResponse()` loads the HTML from the body of a response.
```php
$htmlSelector = new \Berlioz\HtmlSelector\HtmlSelector();
/** @var \Psr\Http\Message\ResponseInterface $response */
$query = $htmlSelector->queryFromResponse($response);
```
### Do a query
Once the HTML is loaded, you can query it with selectors, much like with *jQuery*.
```php
/** @var \Berlioz\HtmlSelector\Query\Query $query */
$query = $query->find('body > .wrapper h2');
$query = $query->filter(':first');
```
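The two calls above can be put together into a small end-to-end sketch. The HTML is made up for illustration, and the `vendor/autoload.php` path assumes a standard Composer installation; only methods listed in this README are used.

```php
<?php
// Assumption: the library was installed with Composer and the autoloader is at this path.
require 'vendor/autoload.php';

use Berlioz\HtmlSelector\HtmlSelector;

// Made-up HTML, used only for this illustration.
$html = <<<'HTML'
<html>
<head><title>Demo</title></head>
<body>
  <div class="wrapper">
    <h2>First title</h2>
    <h2>Second title</h2>
  </div>
</body>
</html>
HTML;

$htmlSelector = new HtmlSelector();
$query = $htmlSelector->query($html);

// Every <h2> below an element with class "wrapper" that is a direct child of <body>.
$titles = $query->find('body > .wrapper h2');
echo $titles->count(), PHP_EOL;

// Narrow the previous result down to its first element and read its text.
$first = $titles->filter(':first');
echo trim($first->text()), PHP_EOL;
```

`find()` searches inside the current result, while `filter()` reduces the current result, so chaining them mirrors the jQuery idiom.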
## Selectors
### CSS Simple selectors
- **type**: selection of elements by their type.
- **#id**: selection of an element by its ID.
- **.class**: selection of elements by their class.
- Attribute selections:
  - **[attribute]**: with attribute 'attribute'.
  - **[attribute=foo]**: value of attribute equals 'foo'.
  - **[attribute^=foo]**: value of attribute starts with 'foo'.
  - **[attribute$=foo]**: value of attribute ends with 'foo'.
  - **[attribute*=foo]**: value of attribute contains 'foo'.
  - **[attribute!=foo]**: value of attribute is different from 'foo'.
  - **[attribute~=foo]**: value of attribute contains the word 'foo'.
  - **[attribute|=foo]**: value of attribute has the prefix 'foo'.
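As a hedged illustration of the attribute selectors above, the following sketch counts matches for several of them on a made-up document. The autoloader path and the HTML are assumptions; attribute values are left unquoted, as in the list above.

```php
<?php
// Assumption: the library was installed with Composer and the autoloader is at this path.
require 'vendor/autoload.php';

use Berlioz\HtmlSelector\HtmlSelector;

// Made-up HTML, used only for this illustration.
$html = <<<'HTML'
<html>
<body>
  <a href="https://example.com/docs" hreflang="en-GB">Docs</a>
  <a href="https://example.com/manual-pdf" rel="nofollow external">Manual</a>
  <a href="/local" title="Local page">Local</a>
</body>
</html>
HTML;

$query = (new HtmlSelector())->query($html);

$selectors = [
    'a[title]',          // has a "title" attribute
    'a[href^=https]',    // "href" starts with "https"
    'a[href$=pdf]',      // "href" ends with "pdf"
    'a[href*=example]',  // "href" contains "example"
    'a[rel~=external]',  // "rel" contains the word "external"
    'a[hreflang|=en]',   // "hreflang" has the prefix "en"
];

// Count the matches of each selector in the document.
$counts = [];
foreach ($selectors as $selector) {
    $counts[$selector] = $query->find($selector)->count();
    printf("%-18s %d\n", $selector, $counts[$selector]);
}
```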
### CSS Ascendants, descendants, multiples
- ***selector* *selector*** or ***selector* >> *selector***: descendant selector (all descendants).
- ***selector* > *selector***: direct descendant selector (only children).
- ***selector* ~ *selector***: siblings selector.
- ***selector*, *selector***: multiple selectors.
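The difference between these combinators is easiest to see on a small tree. This is a minimal sketch on made-up HTML, assuming a standard Composer autoloader path:

```php
<?php
// Assumption: the library was installed with Composer and the autoloader is at this path.
require 'vendor/autoload.php';

use Berlioz\HtmlSelector\HtmlSelector;

// Made-up HTML, used only for this illustration.
$html = <<<'HTML'
<html>
<body>
  <div id="main">
    <h1>Title</h1>
    <p>Intro</p>
    <section><p>Deep</p></section>
    <p>Outro</p>
  </div>
</body>
</html>
HTML;

$query = (new HtmlSelector())->query($html);

// Descendants: every <p> anywhere below #main, including the nested one.
$allParagraphs = $query->find('#main p')->count();

// Children: only the <p> elements directly below #main.
$directParagraphs = $query->find('#main > p')->count();

// Siblings: <p> elements that share a parent with the <h1>.
$siblingParagraphs = $query->find('h1 ~ p')->count();

// Multiple selectors: the <h1> and the <section> in one result.
$headingAndSection = $query->find('h1, section')->count();

printf("%d %d %d %d\n", $allParagraphs, $directParagraphs, $siblingParagraphs, $headingAndSection);
```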
### CSS Pseudo Classes
### Additional CSS Pseudo Classes (not in CSS specifications) from jQuery library
- **:button**: only elements of type `<button>` without attribute value `[type=submit]`, or `<input type="button">`.
- **:checkbox**: only elements with attribute `[type=checkbox]`.
- **:contains(x)**: only elements that contain the given text.
- **:eq(x)**: only the result with the given index (index starts at 0).
- **:even**: only even results in selection.
- **:file**: only elements with attribute `[type=file]`.
- **:gt(x)**: only results with an index greater than the given index (index starts at 0).
- **:gte(x)**: only results with an index greater than or equal to the given index (index starts at 0).
- **:header**: only heading elements, like `<h1>`, `<h2>`...
- **:image**: only elements with attribute `[type=image]`.
- **:input**: only elements of type `<input>`, `<select>`, `<textarea>` or `<button>`.
- **:last**: only the last result of the complete selection.
- **:lt(x)**: only results with an index less than the given index (index starts at 0).
- **:lte(x)**: only results with an index less than or equal to the given index (index starts at 0).
- **:odd**: only odd results in selection.
- **:parent**: only elements with one child or more.
- **:password**: only elements with attribute `[type=password]`.
- **:radio**: only elements with attribute `[type=radio]`.
- **:reset**: only elements with attribute `[type=reset]`.
- **:selected**: only elements of type `<option>` with attribute `[selected]`.
- **:submit**: only elements of type `<button>` or `<input>` with attribute `[type=submit]`.
- **:text**: only elements of type `<input>` with attribute `[type=text]` or without `[type]` attribute.
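A few of the jQuery-style pseudo-classes above can be tried on a small form and list. This is only a sketch: the HTML is made up, the autoloader path assumes a standard Composer installation, and boolean attributes are written in full (`checked="checked"`) to keep the markup unambiguous.

```php
<?php
// Assumption: the library was installed with Composer and the autoloader is at this path.
require 'vendor/autoload.php';

use Berlioz\HtmlSelector\HtmlSelector;

// Made-up HTML, used only for this illustration.
$html = <<<'HTML'
<html>
<body>
  <form>
    <input type="text" name="login" value="jdoe" />
    <input type="checkbox" name="remember" checked="checked" />
    <select name="lang">
      <option value="fr">French</option>
      <option value="en" selected="selected">English</option>
    </select>
  </form>
  <ul><li>A</li><li>B</li><li>C</li><li>D</li></ul>
</body>
</html>
HTML;

$query = (new HtmlSelector())->query($html);

// Form-related pseudo-classes.
$checkboxes = $query->find('input:checkbox')->count();
$selectedValue = $query->find('select > option:selected')->attr('value');

// Index-based pseudo-classes; indexes start at 0.
$second = trim($query->find('li:eq(1)')->text());
$evenCount = $query->find('li:even')->count();
$afterSecond = $query->find('li:gt(1)')->count();
$last = trim($query->find('li:last')->text());

echo implode(' ', [$checkboxes, $selectedValue, $second, $evenCount, $afterSecond, $last]), PHP_EOL;
```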
### Additional CSS Pseudo Classes (not in CSS specifications)
- **:count(x)**: only elements that are x children in the parent; used in the **:has(selector)** pseudo-class.
### Full example of selectors
```
select > option:selected
div#myId.class1.class2[name1=value1][name2=value2]:even:first
```
## Functions
### Default functions
Some default functions are available on the `Query` object to interact with results. These functions should return the
same results as their jQuery counterparts.
- **attr(name)**: get attribute value
- **attr(name, value)**: set attribute value
- **children()**: get children of elements in result.
- **count()**: count the number of elements in query result.
- **data(nameOfData)**: get data value (name is with camelCase syntax without the 'data-' prefix).
- **filter(selector)**: filter elements in result.
- **find(selector)**: find selector in elements in result.
- **get(i)**: get the DOM element at index i in the result.
- **hasClass(class_name)**: check whether at least one element in the result has the given classes.
- **html()**: get the HTML of the first element in the result.
- **index(selector)**: get the index of the given selector in the result elements.
- **is(selector)**: check whether at least one element in the result matches the selector.
- **isset(i)**: return a boolean indicating whether an element key exists in the result.
- **next(selector)**: get the next element after each element in the result.
- **nextAll(selector)**: get all next elements after each element in the result.
- **not(selector)**: exclude the elements matching the selector from the result.
- **parent()**: get the direct parent of each element in the current result.
- **parents(selector)**: get all parents of the elements in the current result.
- **prev(selector)**: get the previous element before each element in the result.
- **prevAll(selector)**: get all previous elements before each element in the result.
- **prop(name)**: get property boolean value of an attribute, used for example for `disabled` attribute.
- **prop(name, value)**: set property boolean value of an attribute, used for example for `disabled` attribute.
- **serialize()**: serialize input values of a form. Return a string.
- **serializeArray()**: serialize input values of a form. Return an array.
- **text()**: get text of each element concatenated.
- **val()**: get value of a form element.
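The sketch below exercises a handful of the functions listed above on a made-up document. The autoloader path is an assumption, and the return values are not spelled out here since they depend on the library version; the jQuery counterparts are the reference for expected behaviour.

```php
<?php
// Assumption: the library was installed with Composer and the autoloader is at this path.
require 'vendor/autoload.php';

use Berlioz\HtmlSelector\HtmlSelector;

// Made-up HTML, used only for this illustration.
$html = <<<'HTML'
<html>
<body>
  <article id="post" class="post featured" data-author-name="Hector">
    <h1>Title</h1>
    <a href="/read-more">Read more</a>
  </article>
</body>
</html>
HTML;

$query = (new HtmlSelector())->query($html);
$article = $query->find('#post');

// Class test and data attribute (camelCase name, without the "data-" prefix).
$isFeatured = $article->hasClass('featured');
$author = $article->data('authorName');

// Traversal: direct children of the article, then a nested search.
$childCount = $article->children()->count();
$link = $article->find('a');

// Attribute getter and setter.
$href = $link->attr('href');
$link->attr('rel', 'nofollow');
$rel = $link->attr('rel');

// Selector test and text extraction.
$hasHref = $link->is('[href]');
$title = trim($article->find('h1')->text());

var_dump($isFeatured, $author, $childCount, $href, $rel, $hasHref, $title);
```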