# webfactory HTML5 TagRewriter library

A small library that uses a handler pattern to transform HTML documents. It builds on the HTML5 parser and DOM extension of PHP 8.4+.

It is useful for manipulations of HTML5 documents that are awkward to perform while the HTML output is being generated (e.g. in a template engine like Twig), but rather trivial once you look at the final DOM.

Examples:
- Add `target="_blank"` and `rel="noopener"` to all external links
- Find all `<img>` elements in a page that have a `data-credits` attribute, and collect their credits information in a section in the page footer
- Find all headings within the page's `<main>` section, generate a table of contents with anchor links and place it at the beginning of the page

For the Symfony integration, see https://github.com/webfactory/WebfactoryHtml5TagRewriterBundle.

## Usage

### Basic Usage

```php
use Webfactory\Html5TagRewriter\Implementation\Html5TagRewriter;

$rewriter = new Html5TagRewriter();

// Process a complete HTML5 document
$html = '<!DOCTYPE html><html><body><p>Hello</p></body></html>';
$result = $rewriter->process($html);

// Process an HTML fragment
$fragment = '<p>Hello <strong>World</strong></p>';
$result = $rewriter->processBodyFragment($fragment);
```

> [!NOTE]
> The `processBodyFragment()` method is currently limited: it can only process
> HTML strings that come from within the `<body>` section. This is because the
> HTML5 parsing rules define different [parsing states](https://html.spec.whatwg.org/multipage/parsing.html#parse-state),
> and the PHP DOM API for the HTML5 parser does not currently expose
> a (documented) way to create fragments and pass the required context information.
> For correct results, limit its use to fragments that are meant to be parsed
> starting in the "in body" insertion mode, with the [tokenizer](https://html.spec.whatwg.org/multipage/parsing.html#tokenization)
> in the data state.

### Creating a Handler

Implement the `RewriteHandler` interface or extend `BaseRewriteHandler` to create custom tag transformations.
`BaseRewriteHandler` provides empty default implementations, so you only need to override the methods you actually use:

```php
use Dom\Element;
use Webfactory\Html5TagRewriter\Handler\BaseRewriteHandler;

class ExternalLinkHandler extends BaseRewriteHandler
{
    public function appliesTo(): string
    {
        // XPath expression to match elements
        // Use the 'html:' prefix for HTML5 elements, 'svg:' for SVG and 'mathml:' for MathML
        return '//html:a[@href]';
    }

    public function match(Element $element): void
    {
        $href = $element->getAttribute('href');

        // Treat only absolute http(s) URLs as external; checking for a bare 'http'
        // prefix would also catch relative links such as 'http-status.html'
        if (str_starts_with($href, 'http://') || str_starts_with($href, 'https://')) {
            $element->setAttribute('target', '_blank');
            $element->setAttribute('rel', 'noopener');
        }
    }
}
```

### Registering Handlers

Register handler instances with the rewriter before processing:

```php
use Webfactory\Html5TagRewriter\Implementation\Html5TagRewriter;

$rewriter = new Html5TagRewriter();
$rewriter->register(new ExternalLinkHandler());
// AnotherHandler stands for any further RewriteHandler implementation
$rewriter->register(new AnotherHandler());

$result = $rewriter->process($html);
```

### XPath Namespaces

The following namespaces are pre-registered for XPath queries:

| Prefix   | Namespace URI                        |
|----------|--------------------------------------|
| `html`   | `http://www.w3.org/1999/xhtml`       |
| `svg`    | `http://www.w3.org/2000/svg`         |
| `mathml` | `http://www.w3.org/1998/Math/MathML` |

Because the HTML5 parser places HTML elements in the XHTML namespace, an unprefixed expression such as `//a` does not match them in XPath 1.0; use `//html:a` instead.

### ESI Tag Support

The library preserves Edge Side Includes (ESI) tags verbatim during HTML5 processing. ESI tags pose several challenges:

1. **Self-closing syntax**: HTML5 ignores the trailing slash on non-void elements, so a tag like `<esi:include src="..." />` would be parsed as a start tag that is never closed
2. **Arbitrary interleaving**: ESI tags can span across HTML element boundaries
3. **Attribute encoding**: Characters like `&` in attribute values must not be re-encoded as `&amp;` on output

The [ESI Language Specification 1.0](https://www.w3.org/TR/esi-lang/) describes ESI as "XML-based" (Section 1), but also states that documents containing ESI markup are not valid. From Section 1.1:

> the markup that is emitted by the origin server is not valid; it contains interposed elements from the ESI namespace

ESI elements can be arbitrarily interleaved with the underlying content, which does not even need to be HTML. The standard says nothing about whether HTML entity encoding applies. And since such documents cannot be parsed as XML, there is no basis for assuming XML encoding rules either.

This library wraps every ESI tag (opening, closing, or self-closing) in an HTML comment, using the ESI comment syntax defined in Section 3.7 of the ESI specification (`<!--esi ... -->`). This hides the tags from the HTML5 parser while preserving them verbatim.

> [!IMPORTANT]
> During processing, ESI tags appear as Comment nodes in the DOM. If `RewriteHandler`
> implementations move or delete these comment nodes, the final result may not
> match expectations; for example, an opening ESI tag may end up separated from its closing tag.

## Credits, Copyright and License

This library is based on internal work that we have been using at webfactory GmbH, Bonn, since at least 2012.
That older code, however, was written for the legacy PHP DOM extension, which led to several quirks in HTML
processing and required the use of [Polyglot HTML5](https://www.w3.org/TR/html-polyglot/), which can be processed as XML.

- <https://www.webfactory.de>

Copyright 2026 webfactory GmbH, Bonn. Code released under [the MIT license](LICENSE).
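## Appendix: XPath namespaces in the plain PHP 8.4 DOM API

To show why the `html:` prefix from the XPath Namespaces section is needed, here is a minimal sketch that uses only PHP 8.4's built-in `Dom\HTMLDocument` and `Dom\XPath` classes, without this library. It applies the same logic as the `ExternalLinkHandler` example. The function name `addTargetBlankToExternalLinks()` is hypothetical, and the sketch makes no claim about how `Html5TagRewriter` runs handlers internally.

```php
<?php

declare(strict_types=1);

// Hypothetical helper, not part of html5-tagrewriter: it applies the ExternalLinkHandler
// logic using only the DOM extension bundled with PHP 8.4+.
function addTargetBlankToExternalLinks(string $html): string
{
    // LIBXML_NOERROR suppresses warnings about HTML5 parse errors.
    $document = Dom\HTMLDocument::createFromString($html, LIBXML_NOERROR);

    $xpath = new Dom\XPath($document);
    // HTML elements are in the XHTML namespace, so a prefix for it must be registered
    // before a query like '//html:a' can select them.
    $xpath->registerNamespace('html', 'http://www.w3.org/1999/xhtml');

    foreach ($xpath->query('//html:a[@href]') as $link) {
        $href = $link->getAttribute('href');
        // Only absolute http(s) URLs count as external in this sketch.
        if (str_starts_with($href, 'http://') || str_starts_with($href, 'https://')) {
            $link->setAttribute('target', '_blank');
            $link->setAttribute('rel', 'noopener');
        }
    }

    return $document->saveHtml();
}

echo addTargetBlankToExternalLinks(
    '<!DOCTYPE html><p><a href="https://example.com/">external</a> <a href="/about">internal</a></p>'
), "\n";
```

Writing the expression as `//a[@href]` instead would select nothing, since XPath 1.0 name tests without a prefix only match elements in no namespace.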
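## Appendix: wrapping ESI tags in comments

The comment-wrapping technique from the ESI Tag Support section can be illustrated with a short, self-contained sketch. It is not the library's actual code: the names `wrapEsiTags()` and `unwrapEsiTags()` are hypothetical, the regular expression assumes that no attribute value contains a `>` character, and the input is assumed not to contain `<!--esi ... -->` blocks already. Restoring the original tags after processing (the unwrap step) is likewise an assumption made for illustration.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch, not the library's implementation.
// Matches opening, closing and self-closing ESI tags, e.g. <esi:include src="..." />,
// <esi:try> or </esi:try>. Assumption: no attribute value contains a ">" character.
const ESI_TAG_PATTERN = '#</?esi:[a-z]+\b[^>]*>#';

// Wraps each ESI tag in an ESI comment (<!--esi ... -->). An HTML5 parser then sees a
// comment node instead of an element, and the tag text is kept byte for byte.
// Assumption: the input has no existing <!--esi ... --> blocks (they would get nested).
function wrapEsiTags(string $html): string
{
    return preg_replace_callback(
        ESI_TAG_PATTERN,
        static fn (array $match): string => '<!--esi ' . $match[0] . ' -->',
        $html
    ) ?? throw new RuntimeException(preg_last_error_msg());
}

// Reverses wrapEsiTags() by stripping the comment delimiters around each wrapped tag.
function unwrapEsiTags(string $html): string
{
    return preg_replace('#<!--esi (</?esi:[a-z]+\b[^>]*>) -->#', '$1', $html)
        ?? throw new RuntimeException(preg_last_error_msg());
}

echo wrapEsiTags('<esi:include src="/nav?lang=de&v=2" />'), "\n";
// <!--esi <esi:include src="/nav?lang=de&v=2" /> -->
```

Between the two calls, the wrapped string would be parsed and modified through the DOM. The HTML5 serialization algorithm writes comment data without escaping, which is why the `&` in the `src` attribute survives instead of turning into `&amp;`.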