{"id":13413982,"url":"https://github.com/microcosm-cc/bluemonday","last_synced_at":"2025-05-12T05:28:45.267Z","repository":{"id":11994023,"uuid":"14570749","full_name":"microcosm-cc/bluemonday","owner":"microcosm-cc","description":"bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS","archived":false,"fork":false,"pushed_at":"2025-04-04T09:55:48.000Z","size":643,"stargazers_count":3375,"open_issues_count":24,"forks_count":183,"subscribers_count":37,"default_branch":"main","last_synced_at":"2025-05-12T02:43:42.611Z","etag":null,"topics":["allowlist","go","golang","html","owasp","sanitization","security","xss"],"latest_commit_sha":null,"homepage":"https://github.com/microcosm-cc/bluemonday","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microcosm-cc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":null},"created_at":"2013-11-20T22:15:49.000Z","updated_at":"2025-05-11T19:26:53.000Z","dependencies_parsed_at":"2023-10-12T17:44:51.269Z","dependency_job_id":"51855eef-cf5a-436f-94fb-e5f4d90eaa4c","html_url":"https://github.com/microcosm-cc/bluemonday","commit_stats":{"total_commits":285,"total_committers":40,"mean_commits":7.125,"dds":0.7052631578947368,"last_synced_commit":"10b8ac69db438c65c6d5469bb3c345aaa81f18d9"},"previous_names":[],"tags_count":28,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microcosm-cc%2Fbluemonday","tags_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microcosm-cc%2Fbluemonday/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microcosm-cc%2Fbluemonday/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microcosm-cc%2Fbluemonday/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/microcosm-cc","download_url":"https://codeload.github.com/microcosm-cc/bluemonday/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253672707,"owners_count":21945481,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["allowlist","go","golang","html","owasp","sanitization","security","xss"],"created_at":"2024-07-30T20:01:54.359Z","updated_at":"2025-05-12T05:28:45.246Z","avatar_url":"https://github.com/microcosm-cc.png","language":"Go","funding_links":[],"categories":["Go","开源类库","Text Processing","\u003ca id=\"683b645c2162a1fce5f24ac2abfa1973\"\u003e\u003c/a\u003e漏洞\u0026\u0026漏洞管理\u0026\u0026漏洞发现/挖掘\u0026\u0026漏洞开发\u0026\u0026漏洞利用\u0026\u0026Fuzzing","文本处理","Open source library","文本處理","Repositories","文本处理`解析和操作文本的代码库`","Specific Formats","\u003cspan id=\"文字处理-text-processing\"\u003e文字处理 Text Processing\u003c/span\u003e","Bot Building","Template Engines","Security"],"sub_categories":["文本处理","Sanitation","\u003ca id=\"5d7191f01544a12bdaf1315c3e986dff\"\u003e\u003c/a\u003eXSS\u0026\u0026XXE","环境卫生","Word Processing","Advanced Console UIs","HTTP Clients","高級控制台界面","高级控制台界面","查询语","交流","Middlewares","\u003cspan 
id=\"高级控制台用户界面-advanced-console-uis\"\u003e高级控制台用户界面 Advanced Console UIs\u003c/span\u003e","Advanced and Specialized Tools"],"readme":"# bluemonday [![GoDoc](https://godoc.org/github.com/microcosm-cc/bluemonday?status.png)](https://godoc.org/github.com/microcosm-cc/bluemonday) [![Sourcegraph](https://sourcegraph.com/github.com/microcosm-cc/bluemonday/-/badge.svg)](https://sourcegraph.com/github.com/microcosm-cc/bluemonday?badge)\n\nbluemonday is a HTML sanitizer implemented in Go. It is fast and highly configurable.\n\nbluemonday takes untrusted user generated content as an input, and will return HTML that has been sanitised against an allowlist of approved HTML elements and attributes so that you can safely include the content in your web page.\n\nIf you accept user generated content, and your server uses Go, you **need** bluemonday.\n\nThe default policy for user generated content (`bluemonday.UGCPolicy().Sanitize()`) turns this:\n```html\nHello \u003cSTYLE\u003e.XSS{background-image:url(\"javascript:alert('XSS')\");}\u003c/STYLE\u003e\u003cA CLASS=XSS\u003e\u003c/A\u003eWorld\n```\n\nInto a harmless:\n```html\nHello World\n```\n\nAnd it turns this:\n```html\n\u003ca href=\"javascript:alert('XSS1')\" onmouseover=\"alert('XSS2')\"\u003eXSS\u003ca\u003e\n```\n\nInto this:\n```html\nXSS\n```\n\nWhilst still allowing this:\n```html\n\u003ca href=\"http://www.google.com/\"\u003e\n  \u003cimg src=\"https://ssl.gstatic.com/accounts/ui/logo_2x.png\"/\u003e\n\u003c/a\u003e\n```\n\nTo pass through mostly unaltered (it gained a rel=\"nofollow\" which is a good thing for user generated content):\n```html\n\u003ca href=\"http://www.google.com/\" rel=\"nofollow\"\u003e\n  \u003cimg src=\"https://ssl.gstatic.com/accounts/ui/logo_2x.png\"/\u003e\n\u003c/a\u003e\n```\n\nIt protects sites from [XSS](http://en.wikipedia.org/wiki/Cross-site_scripting) attacks. 
There are many [vectors for an XSS attack](https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet) and the best way to mitigate the risk is to sanitize user input against a known safe list of HTML elements and attributes.\n\nYou should **always** run bluemonday **after** any other processing.\n\nIf you use [blackfriday](https://github.com/russross/blackfriday) or [Pandoc](http://johnmacfarlane.net/pandoc/) then bluemonday should be run after these steps. This ensures that no insecure HTML is introduced later in your process.\n\nbluemonday is heavily inspired by both the [OWASP Java HTML Sanitizer](https://code.google.com/p/owasp-java-html-sanitizer/) and the [HTML Purifier](http://htmlpurifier.org/).\n\n## Technical Summary\n\nAllowlist based, you need to either build a policy describing the HTML elements and attributes to permit (and the `regexp` patterns of attributes), or use one of the supplied policies representing good defaults.\n\nThe policy containing the allowlist is applied using a fast non-validating, forward only, token-based parser implemented in the [Go net/html library](https://godoc.org/golang.org/x/net/html) by the core Go team.\n\nWe expect to be supplied with well-formatted HTML (closing elements for every applicable open element, nested correctly) and so we do not focus on repairing badly nested or incomplete HTML. We focus on simply ensuring that whatever elements do exist are described in the policy allowlist and that attributes and links are safe for use on your web page. [GIGO](http://en.wikipedia.org/wiki/Garbage_in,_garbage_out) does apply and if you feed it bad HTML bluemonday is not tasked with figuring out how to make it good again.\n\n## Is it production ready?\n\n*Yes*\n\nWe are using bluemonday in production having migrated from the widely used and heavily field tested OWASP Java HTML Sanitizer.\n\nWe are passing our extensive test suite (including AntiSamy tests as well as tests for any issues raised). 
Check for any [unresolved issues](https://github.com/microcosm-cc/bluemonday/issues?page=1\u0026state=open) to see whether anything may be a blocker for you.\n\nWe invite pull requests and issues to help us ensure we are offering comprehensive protection against various attacks via user generated content.\n\n## Usage\n\nInstall using `go get github.com/microcosm-cc/bluemonday`\n\nThen call it:\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\n\t\"github.com/microcosm-cc/bluemonday\"\n)\n\nfunc main() {\n\t// Do this once for each unique policy, and use the policy for the life of the program\n\t// Policy creation/editing is not safe to use in multiple goroutines\n\tp := bluemonday.UGCPolicy()\n\n\t// The policy can then be used to sanitize lots of input and it is safe to use the policy in multiple goroutines\n\thtml := p.Sanitize(\n\t\t`\u003ca onblur=\"alert(secret)\" href=\"http://www.google.com\"\u003eGoogle\u003c/a\u003e`,\n\t)\n\n\t// Output:\n\t// \u003ca href=\"http://www.google.com\" rel=\"nofollow\"\u003eGoogle\u003c/a\u003e\n\tfmt.Println(html)\n}\n```\n\nWe offer three ways to call Sanitize:\n```go\np.Sanitize(string) string\np.SanitizeBytes([]byte) []byte\np.SanitizeReader(io.Reader) bytes.Buffer\n```\n\nIf you are obsessed about performance, `p.SanitizeReader(r).Bytes()` will return a `[]byte` without performing any unnecessary casting of the inputs or outputs. 
Though the difference is so negligible you should never need to care.\n\nYou can build your own policies:\n```go\npackage main\n\nimport (\n\t\"fmt\"\n\n\t\"github.com/microcosm-cc/bluemonday\"\n)\n\nfunc main() {\n\tp := bluemonday.NewPolicy()\n\n\t// Require URLs to be parseable by net/url.Parse and either:\n\t//   mailto: http:// or https://\n\tp.AllowStandardURLs()\n\n\t// We only allow \u003cp\u003e and \u003ca href=\"\"\u003e\n\tp.AllowAttrs(\"href\").OnElements(\"a\")\n\tp.AllowElements(\"p\")\n\n\thtml := p.Sanitize(\n\t\t`\u003ca onblur=\"alert(secret)\" href=\"http://www.google.com\"\u003eGoogle\u003c/a\u003e`,\n\t)\n\n\t// Output:\n\t// \u003ca href=\"http://www.google.com\"\u003eGoogle\u003c/a\u003e\n\tfmt.Println(html)\n}\n```\n\nWe ship two default policies:\n\n1. `bluemonday.StrictPolicy()` which can be thought of as equivalent to stripping all HTML elements and their attributes as it has nothing on its allowlist. An example usage scenario would be blog post titles where HTML tags are not expected at all and if they are then the elements *and* the content of the elements should be stripped. This is a *very* strict policy.\n2. `bluemonday.UGCPolicy()` which allows a broad selection of HTML elements and attributes that are safe for user generated content. Note that this policy does *not* allow iframes, object, embed, styles, script, etc. An example usage scenario would be blog post bodies where a variety of formatting is expected along with the potential for TABLEs and IMGs.\n\n## Policy Building\n\nThe essence of building a policy is to determine which HTML elements and attributes are considered safe for your scenario. OWASP provide an [XSS prevention cheat sheet](https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet) to help explain the risks, but essentially:\n\n1. Avoid anything other than the standard HTML elements\n1. 
Avoid `script`, `style`, `iframe`, `object`, `embed`, `base` elements that allow code to be executed by the client or third party content to be included that can execute code\n1. Avoid anything other than plain HTML attributes with values matched to a regexp\n\nBasically, you should be able to describe what HTML is fine for your scenario. If you do not have confidence that you can describe your policy please consider using one of the shipped policies such as `bluemonday.UGCPolicy()`.\n\nTo create a new policy:\n```go\np := bluemonday.NewPolicy()\n```\n\nTo add elements to a policy either add just the elements:\n```go\np.AllowElements(\"b\", \"strong\")\n```\n\nOr using a regex:\n\n_Note: if an element is added by name as shown above, any matching regex will be ignored_\n\nIt is also recommended to ensure multiple patterns don't overlap as order of execution is not guaranteed and can result in some rules being missed.\n```go\np.AllowElementsMatching(regexp.MustCompile(`^my-element-`))\n```\n\nOr add elements by virtue of adding an attribute:\n```go\n// Not the recommended pattern, see the recommendation on using .Matching() below\np.AllowAttrs(\"nowrap\").OnElements(\"td\", \"th\")\n```\n\nAgain, this also supports a regex pattern match alternative:\n```go\np.AllowAttrs(\"nowrap\").OnElementsMatching(regexp.MustCompile(`^my-element-`))\n```\n\nAttributes can either be added to all elements:\n```go\np.AllowAttrs(\"dir\").Matching(regexp.MustCompile(\"(?i)rtl|ltr\")).Globally()\n```\n\nOr attributes can be added to specific elements:\n```go\n// Not the recommended pattern, see the recommendation on using .Matching() below\np.AllowAttrs(\"value\").OnElements(\"li\")\n```\n\nIt is **always** recommended that an attribute be made to match a pattern. 
XSS in HTML attributes is very easy otherwise:\n```go\n// \\p{L} matches unicode letters, \\p{N} matches unicode numbers\np.AllowAttrs(\"title\").Matching(regexp.MustCompile(`[\\p{L}\\p{N}\\s\\-_',:\\[\\]!\\./\\\\\\(\\)\u0026]*`)).Globally()\n```\n\nYou can stop at any time and call .Sanitize():\n```go\n// string htmlIn passed in from a HTTP POST\nhtmlOut := p.Sanitize(htmlIn)\n```\n\nAnd you can take any existing policy and extend it:\n```go\np := bluemonday.UGCPolicy()\np.AllowElements(\"fieldset\", \"select\", \"option\")\n```\n\n### Inline CSS\n\nAlthough it's possible to handle inline CSS using `AllowAttrs` with a `Matching` rule, writing a single monolithic regular expression to safely process all inline CSS which you wish to allow is not a trivial task.  Instead of attempting to do so, you can allow the `style` attribute on whichever element(s) you desire and use style policies to control and sanitize inline styles.\n\nIt is strongly recommended that you use `Matching` (with a suitable regular expression)\n`MatchingEnum`, or `MatchingHandler` to ensure each style matches your needs,\nbut default handlers are supplied for most widely used styles.\n\nSimilar to attributes, you can allow specific CSS properties to be set inline:\n```go\np.AllowAttrs(\"style\").OnElements(\"span\", \"p\")\n// Allow the 'color' property with valid RGB(A) hex values only (on any element allowed a 'style' attribute)\np.AllowStyles(\"color\").Matching(regexp.MustCompile(\"(?i)^#([0-9a-f]{3,4}|[0-9a-f]{6}|[0-9a-f]{8})$\")).Globally()\n```\n\nAdditionally, you can allow a CSS property to be set only to an allowed value:\n```go\np.AllowAttrs(\"style\").OnElements(\"span\", \"p\")\n// Allow the 'text-decoration' property to be set to 'underline', 'line-through' or 'none'\n// on 'span' elements only\np.AllowStyles(\"text-decoration\").MatchingEnum(\"underline\", \"line-through\", \"none\").OnElements(\"span\")\n```\n\nOr you can specify elements based on a regex pattern 
match:\n```go\np.AllowAttrs(\"style\").OnElementsMatching(regexp.MustCompile(`^my-element-`))\n// Allow the 'text-decoration' property to be set to 'underline', 'line-through' or 'none'\n// only on elements whose names match the pattern\np.AllowStyles(\"text-decoration\").MatchingEnum(\"underline\", \"line-through\", \"none\").OnElementsMatching(regexp.MustCompile(`^my-element-`))\n```\n\nIf you need more specific checking, you can create a handler that takes in a string and returns a bool to\nvalidate the values for a given property. The string parameter has been\nconverted to lowercase and unicode code points have been converted.\n```go\nmyHandler := func(value string) bool {\n\t// Validate your input here\n\treturn true\n}\np.AllowAttrs(\"style\").OnElements(\"span\", \"p\")\n// Allow the 'color' property with values validated by the handler (on any element allowed a 'style' attribute)\np.AllowStyles(\"color\").MatchingHandler(myHandler).Globally()\n```\n\n### Links\n\nLinks are difficult beasts to sanitise safely and also one of the biggest attack vectors for malicious content.\n\nIt is possible to do this:\n```go\np.AllowAttrs(\"href\").Matching(regexp.MustCompile(`(?i)mailto|https?`)).OnElements(\"a\")\n```\n\nBut that will not protect you, as the regular expression alone is insufficient to prevent a malformed value from doing something unexpected.\n\nWe provide some additional global options for safely working with links.\n\n`RequireParseableURLs` will ensure that URLs are parseable by Go's `net/url` package:\n```go\np.RequireParseableURLs(true)\n```\n\nIf you have enabled parseable URLs then the following option will `AllowRelativeURLs`. By default this is disabled (bluemonday is an allowlist tool... you need to explicitly tell us to permit things) and when disabled it will prevent all local and scheme relative URLs (i.e. 
`href=\"localpage.html\"`, `href=\"../home.html\"` and even `href=\"//www.google.com\"` are relative):\n```go\np.AllowRelativeURLs(true)\n```\n\nIf you have enabled parseable URLs then you can allow the schemes (commonly called protocol when thinking of `http` and `https`) that are permitted. Bear in mind that allowing relative URLs in the above option will allow for a blank scheme:\n```go\np.AllowURLSchemes(\"mailto\", \"http\", \"https\")\n```\n\nRegardless of whether you have enabled parseable URLs, you can force all URLs to have a rel=\"nofollow\" attribute. This will be added if it does not exist, but only when the `href` is valid:\n```go\n// This applies to \"a\" \"area\" \"link\" elements that have a \"href\" attribute\np.RequireNoFollowOnLinks(true)\n```\n\nSimilarly, you can force all URLs to have \"noreferrer\" in their rel attribute.\n```go\n// This applies to \"a\" \"area\" \"link\" elements that have a \"href\" attribute\np.RequireNoReferrerOnLinks(true)\n```\n\n\nWe provide a convenience method that applies all of the above, but you will still need to allow the linkable elements for the URL rules to be applied to:\n```go\np.AllowStandardURLs()\np.AllowAttrs(\"cite\").OnElements(\"blockquote\", \"q\")\np.AllowAttrs(\"href\").OnElements(\"a\", \"area\")\np.AllowAttrs(\"src\").OnElements(\"img\")\n```\n\nAn additional complexity regarding links is the data URI as defined in [RFC2397](http://tools.ietf.org/html/rfc2397). 
The data URI allows for images to be served inline using this format:\n\n```html\n\u003cimg src=\"data:image/webp;base64,UklGRh4AAABXRUJQVlA4TBEAAAAvAAAAAAfQ//73v/+BiOh/AAA=\"\u003e\n```\n\nWe have provided a helper to verify the mimetype followed by base64 content of data URIs links:\n\n```go\np.AllowDataURIImages()\n```\n\nThat helper will enable GIF, JPEG, PNG and WEBP images.\n\nIt should be noted that there is a potential [security](https://web.archive.org/web/20120427103111/http://palizine.plynt.com/issues/2010Oct/bypass-xss-filters/) [risk](https://capec.mitre.org/data/definitions/244.html) with the use of data URI links. You should only enable data URI links if you already trust the content.\n\nWe also have some features to help deal with user generated content:\n```go\np.AddTargetBlankToFullyQualifiedLinks(true)\n```\n\nThis will ensure that anchor `\u003ca href=\"\" /\u003e` links that are fully qualified (the href destination includes a host name) will get `target=\"_blank\"` added to them.\n\nAdditionally any link that has `target=\"_blank\"` after the policy has been applied will also have the `rel` attribute adjusted to add `noopener`. This means a link may start like `\u003ca href=\"//host/path\"/\u003e` and will end up as `\u003ca href=\"//host/path\" rel=\"noopener\" target=\"_blank\"\u003e`. It is important to note that the addition of `noopener` is a security feature and not an issue. There is an unfortunate feature to browsers that a browser window opened as a result of `target=\"_blank\"` can still control the opener (your web page) and this protects against that. 
The background to this can be found here: [https://dev.to/ben/the-targetblank-vulnerability-by-example](https://dev.to/ben/the-targetblank-vulnerability-by-example)\n\n### Policy Building Helpers\n\nWe also bundle some helpers to simplify policy building:\n```go\n\n// Permits the \"dir\", \"id\", \"lang\", \"title\" attributes globally\np.AllowStandardAttributes()\n\n// Permits the \"img\" element and its standard attributes\np.AllowImages()\n\n// Permits ordered and unordered lists, and also definition lists\np.AllowLists()\n\n// Permits HTML tables and all applicable elements and non-styling attributes\np.AllowTables()\n```\n\n### Invalid Instructions\n\nThe following are invalid:\n```go\n// This does not say where the attributes are allowed, you need to add\n// .Globally() or .OnElements(...)\n// This will be ignored without error.\np.AllowAttrs(\"value\")\n\n// This does not say where the attributes are allowed, you need to add\n// .Globally() or .OnElements(...)\n// This will be ignored without error.\np.AllowAttrs(\n\t\"type\",\n).Matching(\n\tregexp.MustCompile(\"(?i)^(circle|disc|square|a|A|i|I|1)$\"),\n)\n```\n\nBoth examples exhibit the same issue, they declare attributes but do not then specify whether they are allowed globally or only on specific elements (and which elements). Attributes belong to one or more elements, and the policy needs to declare this.\n\n## Limitations\n\nWe are not yet including any tools to help allow and sanitize CSS. Which means that unless you wish to do the heavy lifting in a single regular expression (inadvisable), **you should not allow the \"style\" attribute anywhere**.\n\nIn the same theme, both `\u003cscript\u003e` and `\u003cstyle\u003e` are considered harmful. These elements (and their content) will not be rendered by default, and require you to explicitly set `p.AllowUnsafe(true)`. 
You should be aware that allowing these elements defeats the purpose of using an HTML sanitizer, as you would be explicitly allowing JavaScript (and any plainly written XSS) or CSS (which can modify a DOM to insert JS). Additionally, limitations in this library mean it is not aware of whether HTML is validly structured, and that can allow these elements to bypass some of the safety mechanisms built into the [WhatWG HTML parser standard](https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inselect).\n\nIt is not the job of bluemonday to fix your bad HTML; it is merely the job of bluemonday to prevent malicious HTML getting through. If you have mismatched HTML elements, or non-conforming nesting of elements, those will remain. But if you have well-structured HTML bluemonday will not break it.\n\n## TODO\n\n* Investigate whether devs want to blacklist elements and attributes. This would allow devs to take an existing policy (such as `bluemonday.UGCPolicy()`) that encapsulates 90% of what they're looking for but does more than they need, and to remove the extra things they do not want to make it 100% what they want\n* Investigate whether devs want a validating HTML mode, in which the HTML elements are not just transformed into a balanced tree (every start tag has a closing tag at the correct depth) but also that elements and character data appear only in their allowed context (i.e. that a `table` element isn't a descendant of a `caption`, that `colgroup`, `thead`, `tbody`, `tfoot` and `tr` are permitted, and that character data is not permitted)\n\n## Long term goals\n\n1. Open the code to adversarial peer review similar to the [Attack Review Ground Rules](https://code.google.com/p/owasp-java-html-sanitizer/wiki/AttackReviewGroundRules)\n1. 
Raise funds and pay for an external security review\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrocosm-cc%2Fbluemonday","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmicrocosm-cc%2Fbluemonday","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrocosm-cc%2Fbluemonday/lists"}