{"id":17656638,"url":"https://github.com/straight-shoota/sanitize","last_synced_at":"2025-05-07T10:30:44.189Z","repository":{"id":39617237,"uuid":"265381994","full_name":"straight-shoota/sanitize","owner":"straight-shoota","description":"Crystal library for transforming HTML/XML trees to sanitize HTML from untrusted sources","archived":false,"fork":false,"pushed_at":"2024-10-15T11:52:22.000Z","size":116,"stargazers_count":23,"open_issues_count":1,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-11-02T08:41:56.848Z","etag":null,"topics":["crystal","html","html-traverse","sanitization","sanitize-html","sanitize-url","striptags","xml-transformation","xss-filter"],"latest_commit_sha":null,"homepage":"","language":"Crystal","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/straight-shoota.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-05-19T22:24:03.000Z","updated_at":"2024-10-15T11:52:25.000Z","dependencies_parsed_at":"2022-09-16T11:32:04.275Z","dependency_job_id":null,"html_url":"https://github.com/straight-shoota/sanitize","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/straight-shoota%2Fsanitize","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/straight-shoota%2Fsanitize/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/straight-shoota%2Fsanitize/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/straight-shoota%2Fsanitize/manifests","owner_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/owners/straight-shoota","download_url":"https://codeload.github.com/straight-shoota/sanitize/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223806487,"owners_count":17205982,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crystal","html","html-traverse","sanitization","sanitize-html","sanitize-url","striptags","xml-transformation","xss-filter"],"created_at":"2024-10-23T14:35:13.573Z","updated_at":"2024-11-09T09:05:36.361Z","avatar_url":"https://github.com/straight-shoota.png","language":"Crystal","funding_links":[],"categories":[],"sub_categories":[],"readme":"# sanitize\n\n`sanitize` is a Crystal library for transforming HTML/XML trees. It's primarily\nused to sanitize HTML from untrusted sources in order to prevent\n[XSS attacks](http://en.wikipedia.org/wiki/Cross-site_scripting) and other\nadversities.\n\nIt builds on stdlib's [`XML`](https://crystal-lang.org/api/XML.html) module to\nparse HTML/XML. 
Based on [libxml2](http://xmlsoft.org/) it's a solid parser and\nturns malformed and malicious input into valid and safe markup.\n\n* Code: [https://github.com/straight-shoota/sanitize](https://github.com/straight-shoota/sanitize)\n* API docs: [https://straight-shoota.github.io/sanitize/api/latest/](https://straight-shoota.github.io/sanitize/api/latest/)\n* Issue tracker: [https://github.com/straight-shoota/sanitize/issues](https://github.com/straight-shoota/sanitize/issues)\n* Shardbox: [https://shardbox.org/shards/sanitize](https://shardbox.org/shards/sanitize)\n\n## Installation\n\n1. Add the dependency to your `shard.yml`:\n\n   ```yaml\n   dependencies:\n     sanitize:\n       github: straight-shoota/sanitize\n   ```\n\n2. Run `shards install`\n\n## Sanitization Features\n\nThe `Sanitize::Policy::HTMLSanitizer` policy applies the following sanitization steps. Except\nfor the first one (which is essential to the entire process), all can be disabled\nor configured.\n\n* Turns malformed and malicious HTML into valid and safe markup.\n* Strips HTML elements and attributes not included in the safe list.\n* Sanitizes URL attributes (like `href` or `src`) with customizable sanitization\n  policy.\n* Adds `rel=\"nofollow\"` to all links and `rel=\"noopener\"` to links with `target`.\n* Validates values of accepted attributes `align`, `width` and `height`.\n* Filters `class` attributes based on a whitelist (by default all classes are\n  rejected).\n\n## Usage\n\nTransformation is based on rules defined by `Sanitize::Policy` implementations.\n\nThe recommended standard policy for HTML sanitization is `Sanitize::Policy::HTMLSanitizer.common`\nwhich represents good defaults for most use cases.\nIt sanitizes user input against a known safe list of accepted elements and their\nattributes.\n\n```crystal\nrequire \"sanitize\"\n\nsanitizer = Sanitize::Policy::HTMLSanitizer.common\nsanitizer.process(%(\u003ca href=\"javascript:alert('foo')\"\u003efoo\u003c/a\u003e)) # =\u003e 
%(foo)\nsanitizer.process(%(\u003cp\u003e\u003ca href=\"foo\"\u003efoo\u003c/a\u003e\u003c/p\u003e)) # =\u003e %(\u003cp\u003e\u003ca href=\"foo\" rel=\"nofollow\"\u003efoo\u003c/a\u003e\u003c/p\u003e)\nsanitizer.process(%(\u003cimg src=\"foo.jpg\"\u003e)) # =\u003e %(\u003cimg src=\"foo.jpg\"\u003e)\nsanitizer.process(%(\u003ctable\u003e\u003ctr\u003e\u003ctd\u003efoo\u003c/td\u003e\u003ctd\u003ebar\u003c/td\u003e\u003c/tr\u003e\u003c/table\u003e)) # =\u003e %(\u003ctable\u003e\u003ctr\u003e\u003ctd\u003efoo\u003c/td\u003e\u003ctd\u003ebar\u003c/td\u003e\u003c/tr\u003e\u003c/table\u003e)\n```\n\nSanitization should always run after any other processing (for example rendering\nMarkdown) and is a must when including HTML from untrusted sources into a web\npage.\n\n### With Markd\n\nA typical format for user-generated content is `Markdown`. Even though it has\nonly a very limited feature set compared to HTML, it can still produce\npotentially harmful HTML and it is usually possible to embed raw HTML directly.\nSo sanitization is necessary.\n\nThe most common Markdown renderer is [markd](https://shardbox.org/shards/markd),\nso here is an example of how to use it with `sanitize`:\n\n````crystal\nrequire \"markd\"\nrequire \"sanitize\"\n\nsanitizer = Sanitize::Policy::HTMLSanitizer.common\n# Allow classes with `language-` prefix which are used for syntax highlighting.\nsanitizer.valid_classes \u003c\u003c /language-.+/\n\nmarkdown = \u003c\u003c-MD\n  Sanitization with [https://shardbox.org/shards/sanitize](sanitize) is not that\n  **difficult**.\n  ```cr\n  puts \"Hello World!\"\n  ```\n  \u003cp\u003e\u003ca href=\"javascript:alert(\"XSS attack!\")\"\u003eHello world!\u003c/a\u003e\u003c/p\u003e\n  MD\n\nhtml = Markd.to_html(markdown)\nsanitized = sanitizer.process(html)\nputs sanitized\n````\n\nThe result:\n\n```html\n\u003cp\u003eSanitization with \u003ca href=\"sanitize\" rel=\"nofollow\"\u003ehttps://shardbox.org/shards/sanitize\u003c/a\u003e is not 
that\n\u003cstrong\u003edifficult\u003c/strong\u003e.\u003c/p\u003e\n\u003cpre\u003e\u003ccode class=\"language-cr\"\u003eputs \u0026quot;Hello World!\u0026quot;\n\u003c/code\u003e\u003c/pre\u003e\n\u003cp\u003eHello world!\u003c/p\u003e\n```\n\n## Limitations\n\nSanitizing CSS is not supported. Thus `style` attributes can't be accepted in a\nsafe way.\nCSS sanitization features may be added when a CSS parsing library is available.\n\n## Security\n\nIf you want to privately disclose security issues, please contact\n[straightshoota](https://keybase.io/straightshoota) on Keybase or\n[straightshoota@gmail.com](mailto:straightshoota@gmail.com) (PGP: `DF2D C9E9 FFB9 6AE0 2070 D5BC F0F3 4963 7AC5 087A`).\n\n## Contributing\n\n1. Fork it ([https://github.com/straight-shoota/sanitize/fork](https://github.com/straight-shoota/sanitize/fork))\n2. Create your feature branch (`git checkout -b my-new-feature`)\n3. Commit your changes (`git commit -am 'Add some feature'`)\n4. Push to the branch (`git push origin my-new-feature`)\n5. Create a new Pull Request\n\n## Contributors\n\n- [Johannes Müller](https://github.com/straight-shoota) - creator and maintainer\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstraight-shoota%2Fsanitize","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstraight-shoota%2Fsanitize","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstraight-shoota%2Fsanitize/lists"}