https://github.com/manychois/simdom
A simple-to-use PHP library for processing DOM documents.
dom html html5 parser php
- Host: GitHub
- URL: https://github.com/manychois/simdom
- Owner: manychois
- License: mit
- Created: 2022-12-31T12:34:08.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2025-02-24T18:17:05.000Z (over 1 year ago)
- Last Synced: 2025-04-23T19:15:02.390Z (about 1 year ago)
- Topics: dom, html, html5, parser, php
- Language: PHP
- Homepage:
- Size: 478 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# Simdom - A simplified and relaxed DOM structure library
Simdom is a lightweight PHP library designed to make parsing and manipulating DOM documents as straightforward as possible. It requires no external dependencies or extensions.
Because it does not rely on the built-in PHP DOM extension, Simdom can take its own opinionated approach to how HTML documents are parsed and manipulated. It lets you work with "non-compliant" HTML structures in a literal and intuitive way.
Before outputting the HTML string of the document, you can call the `$document->validate()` method to ensure that the document is valid according to the HTML5 specification.
## Key differences from the standard HTML5 DOM specification / PHP DOM extension
### Simplified node types
Simdom provides 6 node types that form the DOM tree:
- `Document` - The root document node
- `Doctype` - Document type declarations
- `Element` - HTML elements with attributes and child nodes
- `Text` - Text content within elements
- `Comment` - HTML comments
- `Fragment` - Document fragments for grouping nodes
Attributes are not a node type in Simdom; they are properties of `Element` nodes.
CDATA sections and processing instructions are not supported, as they are not valid in HTML5 documents.
### Simplified and relaxed DOM structure
- There is no concept of an owner document, meaning nodes can be freely moved between documents.
- There is no concept of namespace.
- `Document`, `Element` and `Fragment` nodes can have child nodes of any type except `Document` and `Fragment`, in any order. For example:
  - A `Document` can hold `Text` children.
  - A `Document` is not limited to one `Doctype` child, and a `Doctype` does not have to be placed before any `Element` child.
  - A `Document` is not limited to one `Element` child.
  - `Element` and `Fragment` nodes can hold `Doctype` children.
- There is no concept of valid element structure, meaning elements can be nested in any way, even if the result would not be valid HTML5.
- Misaligned end tags are fixed by finding the last matching start tag.
- `` elements are treated as a Rawtext type element like `` or `<style>`.
- Self-closing tag syntax is supported, for example `<div />` is parsed as `<div></div>`.
- All element names and attribute names are parsed into their ASCII-lowercase form.
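One way to picture the end-tag rule in the list above is as a search through the stack of currently open elements. The snippet below is a minimal sketch in plain PHP, not Simdom's actual parser code. `closeLastMatching()` is a hypothetical helper, and ignoring an end tag that has no matching start tag is an assumption, since that case is not described here.

```php
/**
 * Hypothetical helper, not part of Simdom.
 *
 * @param list<string> $openElements Names of open elements, innermost last.
 * @param string       $endTagName   Name found in the end tag.
 * @return list<string> The open elements left after the end tag is handled.
 */
function closeLastMatching(array $openElements, string $endTagName): array
{
    // Tag names are compared in lowercase form
    // (strtolower() converts ASCII letters only on PHP 8.2+).
    $name = strtolower($endTagName);

    // Walk from the innermost open element outwards.
    for ($i = count($openElements) - 1; $i >= 0; $i--) {
        if ($openElements[$i] === $name) {
            // Close the matching element and everything opened after it.
            return array_slice($openElements, 0, $i);
        }
    }

    // Assumption: an end tag with no matching start tag is ignored.
    return $openElements;
}

// With <div>, <p> and <span> open, an end tag for "P" closes <p> and the <span> inside it.
$remaining = closeLastMatching(['div', 'p', 'span'], 'P'); // ['div']
```

Searching from the innermost element means that an end tag closes the nearest open element of that name, together with anything still open inside it.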
## Restrictions
However, there are still some lines you cannot cross in Simdom:
- `Document` and `Fragment` have no parent node and cannot be a child of any other node. (Inserting a `Fragment` as a child of a parent node is fine though, as it means inserting the `Fragment`'s child nodes.)
- `Element` name and attribute names must conform to the HTML5 specification.
- `Doctype` name, public identifier and system identifier must conform to the HTML5 specification.
- `Doctype` name must be present if either public or system identifier is present.
- No control characters are allowed anywhere, e.g. you cannot inject a delete character (U+007F) into a `Text` node.
- `Comment` cannot contain the character sequence `-->`.
- If a `Text` node is under a Rawtext type (e.g. `<script>`) or Rcdata type (e.g. `<textarea>`) element, it cannot contain a character sequence that would terminate that element, such as the element's own end tag.
```php
$doc = …->parseDocument('…Hello, world!…');
// $doc is an instance of \Manychois\Simdom\Document
```
### Node Manipulation
```php
use Manychois\Simdom\Document;
use Manychois\Simdom\Element;
// Create documents
$doc = Document::create();
// Create elements
$div = Element::create('div');
$div->setAttr('class', 'container');
$div->id = 'main-content';
```
### Traversing and Manipulating the DOM Tree
```php
// Access document parts
$html = $doc->documentElement; // The <html> element
$head = $doc->head; // The <head> element
$body = $doc->body; // The <body> element
// Navigate the tree
$element = $body->firstElementChild;
$nextElement = $element->nextElementSibling;
$parent = $element->parent;
// Child node access
foreach ($body->childNodes as $node) {
    echo get_class($node) . "\n";
}
// Element-only access
foreach ($body->children as $element) {
    echo $element->name . "\n";
}
```
### Adding and Removing Nodes
```php
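use Manychois\Simdom\Comment;
use Manychois\Simdom\Text;
// Nodes used in the calls below
$text = Text::create('Some text');
$comment = Comment::create('A comment');
// Append nodes
$body->append($div, $text);
$body->appendChild($comment);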
// Prepend nodes
$body->prepend(Text::create('First text'));
// Insert before/after
$div->before(Comment::create('Before div'));
$div->after(Text::create('After div'));
// Replace nodes
$div->replaceWith(Element::create('section'));
// Remove nodes
$div->remove();
```
### Working with Attributes
```php
$element = Element::create('input');
// Set attributes
$element->setAttr('type', 'text');
// Get attributes
$type = $element->getAttr('type'); // 'text'
$missing = $element->getAttr('missing'); // null
// Check existence
$hasType = $element->hasAttr('type'); // true
// Remove attributes
$element->removeAttr('name');
// Get all attributes
$attrs = $element->attrs(); // ['type' => 'text']
```
### Searching and Traversal
```php
// Depth-first search
$found = $doc->dfs(fn($node) => $node instanceof Element && $node->id === 'target');
// Breadth-first search
$found = $doc->bfs(fn($node) => $node instanceof Element && $node->name === 'button');
// Find the first form
$form = $doc->querySelector('form');
// Iterate through all descendants
foreach ($doc->descendants() as $node) {
    if ($node instanceof Text) {
        echo $node->data . "\n";
    }
}
```
### HTML Serialization
```php
// Convert to string representation
$html = (string) $doc;
// or using the __toString() method
$html = $element->__toString();
```