https://github.com/lavoiesl/php-html5lib
A clone of https://code.google.com/p/html5lib/
- Host: GitHub
- URL: https://github.com/lavoiesl/php-html5lib
- Owner: lavoiesl
- Created: 2012-07-10T01:28:37.000Z (over 12 years ago)
- Default Branch: master
- Last Pushed: 2012-07-10T02:53:16.000Z (over 12 years ago)
- Last Synced: 2023-03-13T16:11:55.294Z (almost 2 years ago)
- Language: PHP
- Size: 172 KB
- Stars: 6
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
# HTML5Lib - PHP flavour
This is an implementation of the tokenization and tree-building parts
of the HTML5 specification in PHP. Potential uses of this library
can be found in web-scrapers and HTML filters.

Warning: This is a pre-alpha release, and as such, certain parts of
this code are not up-to-snuff (e.g. error reporting and performance).
However, the code is very close to spec and passes 100% of tests
not related to parse errors. Nevertheless, expect to have to update
your code on the next upgrade.

## Usage notes
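```php
<?php
// Load (or autoload) the library's Parser class before this point.

// Parse a complete document; the markup in these calls is illustrative.
$dom = Parser::parse('<html><body>...');

// Parse fragments, optionally in the context of a named element.
$nodelist = Parser::parseFragment('<b>Boo</b><br>');
$nodelist = Parser::parseFragment('<td>Bar</td>', 'table');
?>
```

## Documentation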
```
Parser::parse($text)
    $text : HTML to parse
    return : DOMDocument of parsed document

Parser::parseFragment($text, $context)
    $text : HTML to parse
    $context : String name of context element (optional)
    return : DOMNodeList of parsed fragment
```

## Developer notes
* To set up unit tests, you need to add a small stub file test-settings.php
  that contains $simpletest_location = 'path/to/simpletest/'; This needs to
  be version 1.1 (or, until that is released, SVN trunk) of SimpleTest.

* We don't want to ultimately use PHP's DOM because it is not tolerant
  of certain types of errors that HTML5 allows (for example, an element
  "foo@bar"). But the current implementation uses it, since it's easy.
  Eventually, this html5lib implementation will get a version of SimpleTree,
  and may possibly start using that by default.

* The original implementation of this performed line and column tracking
  in place. However, it was found that this approximately doubled the
  runtime of tokenization, so we decided to take a more optimistic approach:
  only calculate line/column numbers when explicitly asked to. This
  is slower if we attempt to calculate line/column numbers for everything
  in the document, but if there is a small enough number of errors it
  is a great improvement.
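
The note about PHP's DOM being intolerant of names such as "foo@bar" can be checked with a few lines of plain PHP. This is a minimal sketch that uses only the standard DOM extension. `isValidDomElementName()` is a hypothetical helper written for illustration and is not part of this library.

```php
<?php
/**
 * Report whether PHP's DOM extension accepts $name as an element name.
 * Hypothetical helper for illustration; not part of html5lib.
 */
function isValidDomElementName(string $name): bool
{
    $doc = new DOMDocument();
    try {
        // createElement() throws a DOMException when the name contains
        // a character that is not allowed in an XML name.
        $doc->createElement($name);
        return true;
    } catch (DOMException $e) {
        return false;
    }
}

var_dump(isValidDomElementName('div'));     // bool(true)
var_dump(isValidDomElementName('foo@bar')); // bool(false)
```

An HTML5 tokenizer can emit a tag named "foo@bar", so a tree builder backed by PHP's DOM has to work around names like this or fail on them.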
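
The on-demand line/column approach described in the last note can be sketched roughly as follows. This is not the library's actual code: `lineColumnAt()` is a hypothetical helper, and it assumes byte offsets and "\n" line endings.

```php
<?php
/**
 * Compute a 1-based [line, column] pair for a byte offset, on demand.
 * Hypothetical helper for illustration; assumes "\n" line endings.
 */
function lineColumnAt(string $text, int $offset): array
{
    // Only the text before the offset matters.
    $before = substr($text, 0, $offset);

    // Every newline before the offset starts a new line.
    $line = substr_count($before, "\n") + 1;

    // The column counts bytes since the last newline (or start of input).
    $lastNewline = strrpos($before, "\n");
    $column = $lastNewline === false ? $offset + 1 : $offset - $lastNewline;

    return [$line, $column];
}

$html = "<p>one</p>\n<p>two<p>";
[$line, $column] = lineColumnAt($html, 11);
echo "$line:$column\n"; // 2:1
```

Each call rescans the input up to the offset, so a handful of lookups is cheap, while asking for the position of every token in a large document costs far more than tracking positions during tokenization. That is the trade-off the note describes.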