# clean_html: Safe HTML for WordPress

Introduces an `esc_*()`-like function for when you need to allow *some* HTML.

## Rationale
### Background

The best practice when working with any sort of data is to escape your output,
and to do it as late as possible. Values usually can't know where they're going
to be used, so you need to escape based on whatever context you're outputting
into. WordPress provides functions for this, such as `esc_attr()` for HTML
attribute values, `esc_html()` for text in HTML, and so on.

Even if values could know about their output context, users could still craft
malicious output if you're not escaping properly. For this reason, you need to
sanitize on input (to ensure your value is correct) as well as escape on output
(to ensure the value is output into the context correctly).

Right now, across every WordPress site, there's a glaring hole in escaping, and
hence in security.

When translating strings in WordPress, the most common functions to use are
`__()` (translate and return) or `_e()` (translate and output). Where possible,
these need to be escaped too, to ensure that translations don't accidentally
break your output. For this reason, `esc_html_e()`, `esc_attr_e()`, etc. are
offered as convenience functions.

However, this falls down when you need to have HTML in the translation.
Translation best practices say to give translators as much information as
possible about a string. This means including HTML tags in the string so
translators can understand how the sentence is formed.

It's possible to use "clever" hacks with placeholders to get around this, for
example:

```php
$text = sprintf(
	esc_html__( 'This is some text %1$swith a link%2$s' ),
	'<a href="http://example.com/">',
	'</a>'
);
```

This is much harder for translators to understand, though, since they can't
tell what's going on without checking the code. Even with translator comments,
it's still harder to follow. There's also no guarantee that this is secure: a
translation could swap the placeholders or leave pieces out. Best practice
states that we should instead have the following:

```php
$text = sprintf(
	__( 'This is some text <a href="%1$s">with a link</a>' ),
	'http://example.com/'
);
```

Right now, the policy is essentially to treat translated strings with HTML as
trusted. Not only does this shift the burden onto translation validators in
GlotPress, but it also means you're no longer in control of your output. This
is an attack vector waiting to be exploited.


### How do we solve this?

WordPress already contains functions designed to help with this problem. After
all, people can submit comments or posts with HTML in them, and WordPress
handles this safely. It does so through a library called kses, which sanitizes
HTML down to a small, safe subset. Posts are allowed more HTML tags than
comments, since they're usually written by semi-trusted users.

kses is great, but it's rarely used outside of large HTML blocks like post or
comment content. The usual reason given is performance: kses is widely
believed to be slow, since it essentially has to disassemble the HTML and then
reconstruct it with only the allowed tags.

However, Zack Tollman wrote a [fantastic post][tollmanz-kses] that calls this
accepted view of kses performance into question. Zack's findings show that
while kses performs worse on longer pieces of content (like post content), it's
close to on par with other escaping functions for short strings. The gap
narrows further when you reduce the allowed elements from the default set to
just the elements you need.

[tollmanz-kses]: https://www.tollmanz.com/wp-kses-performance/

### `clean_html`

This library provides a simple, performant way to sanitize translated strings.
Rather than requiring you to work with the internals of kses, it works much
like functions such as `esc_html()`.

Security is only useful if it's also usable. For the most part, `clean_html`
can be used in exactly the same way developers already use other escaping
functions.

A quick example to demonstrate how easy it is:

```html
<!-- Previously -->
<p><?php _e( 'This is a terrific use of <code>WP_Error</code>.' ) ?></p>

<!-- Secure version -->
<p><?php print_clean_html( __( 'This is a terrific use of <code>WP_Error</code>.' ), 'code' ) ?></p>
```

Even if a malicious translator changed this to include a link to a spam site (or
worse), the link would be caught and stripped by `clean_html`.

Taking our original example from above, we can modify it to allow only `a` tags:

```php
$text = clean_html(
	sprintf(
		__( 'This is some text <a href="%1$s">with a link</a>' ),
		'http://example.com/'
	),
	'a'
);
```

It's that easy. You can allow multiple elements as well, using either a
comma-separated string or an array of elements:

```php
$text = clean_html(
	sprintf(
		__( 'This is <code>some</code> text <a href="%1$s">with a link</a>' ),
		'http://example.com/'
	),
	'a, code' // or array( 'a', 'code' )
);
```

If you need custom attributes, you can use kses-style attribute specifiers.
These can be mixed with plain element names:

```php
$text = clean_html(
	sprintf(
		__( 'This is <span class="x">some</span> text <a href="%1$s">with a link</a>' ),
		'http://example.com/'
	),
	array(
		'a',
		'span' => array(
			'class' => true,
		),
	)
);
```


### Performance Test

In a quick test, the string
`'hello with a <a href="wak://example.com">malicious extra link!<///q><o>b'` was
run through both `clean_html` (allowing only `a`) and `esc_html` for 10,000
iterations. While the two functions don't perform the same task, they're both
escaping functions, so comparing them helps show whether this approach is fast
enough for production code.

In an unscientific trial, this gave figures of 0.96s for `clean_html` and
1.07s for `esc_html` over 10,000 iterations each. This suggests that
`clean_html` performs at least on the same order as other escaping functions.


## Using this Library

Using this library takes two steps:

1. Add this library as a git submodule.
2. Load `clean-html.php` before you need to use it. We recommend loading it
   from `mu-plugins`, but you can also load it via `wp-config.php` if you need
   it earlier.

Done. Start using the functions.
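Step 2 of "Using this Library" can be handled by a small must-use plugin. This is a minimal sketch. It assumes the submodule was checked out at `wp-content/mu-plugins/clean-html/`, which is a hypothetical path you should adjust to your setup. WordPress only auto-loads PHP files that sit directly in the `mu-plugins` directory, not files in its subdirectories, so the library file has to be required explicitly:

```php
<?php
/**
 * Plugin Name: Load clean_html
 * Description: Makes clean_html() and print_clean_html() available to themes and plugins.
 */

// Hypothetical location: assumes the git submodule lives at
// wp-content/mu-plugins/clean-html/. WordPress does not auto-load PHP files
// inside subdirectories of mu-plugins, so require the library file directly.
require_once __DIR__ . '/clean-html/clean-html.php';
```

Save this as any top-level `.php` file in `mu-plugins`. For example, `wp-content/mu-plugins/load-clean-html.php` would work, though that filename is also just an illustration. WordPress loads every top-level file there in alphabetical order, before regular plugins and the theme.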