https://github.com/atgreen/cl-sanitize-html

A Common Lisp library for sanitizing HTML using OWASP-style policies

Host: GitHub
URL: https://github.com/atgreen/cl-sanitize-html
Owner: atgreen
Created: 2025-11-02T11:40:30.000Z (6 months ago)
Default Branch: master
Last Pushed: 2026-01-04T16:37:34.000Z (3 months ago)
Last Synced: 2026-01-18T16:16:27.024Z (3 months ago)
Language: Common Lisp
Size: 21.5 KB
Stars: 9
Watchers: 1
Forks: 2
Open Issues: 1
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome-cl - cl-sanitize-html - OWASP-style HTML sanitization library for Common Lisp, designed for safely rendering untrusted HTML content (like HTML emails or user-generated content). MIT. (Interfaces to other package managers / Isomorphic web frameworks)

README

# cl-sanitize-html

OWASP-style HTML sanitization library for Common Lisp, designed for safely rendering untrusted HTML content (like HTML emails or user-generated content).

## Features

- **Whitelist-based sanitization** - Only explicitly allowed tags and attributes pass through

- **Multiple security policies** - Default, Strict, and Email policies included

- **XSS prevention** - Blocks script tags, event handlers, javascript: URLs, and other attack vectors

- **CSS sanitization** - Optional CSS property filtering for email content

- **Safe defaults** - Automatically adds `rel="noopener noreferrer"` and `target="_blank"` to links

- **Plump-based** - Built on the robust Plump HTML parser

- **Well-tested** - Comprehensive test suite covering OWASP attack vectors

## Quick Start

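```lisp
(use-package :sanitize-html)

;; The markup in these calls is illustrative.

;; Basic usage with default policy
(sanitize "<script>alert('XSS')</script><p>Hello</p>")
;; => "<p>Hello</p>"

;; Remove event handlers
(sanitize "<span onclick=\"alert('XSS')\">Click me</span>")
;; => "<span>Click me</span>"

;; Use email policy for HTML emails
(sanitize "<table bgcolor=\"#ffffff\"><tr><td>Cell</td></tr></table>" *email-policy*)
;; => "<table bgcolor=\"#ffffff\"><tr><td>Cell</td></tr></table>"
```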

## Security Policies

### Default Policy (`*default-policy*`)

Balanced security and usability for general web content:

- **Allowed tags**: Common formatting and semantic tags (p, div, span, a, strong, em, lists, tables, etc.)

- **Allowed protocols**: http, https, mailto, ftp

- **Inline styles**: Blocked

- **Comments**: Removed

### Strict Policy (`*strict-policy*`)

Maximum security with minimal formatting:

- **Allowed tags**: Only basic formatting (a, b, em, strong, ul, ol, li, p, br, code, pre)

- **Allowed protocols**: https, mailto only

- **Very limited attributes**: Only href, title, and class

### Email Policy (`*email-policy*`)

Designed for HTML emails with legacy formatting:

- **Allowed tags**: All email-safe tags including tables, font, center

- **Allowed protocols**: http, https, mailto, cid (inline images), data (base64)

- **Inline styles**: Allowed with filtered CSS properties

- **Table attributes**: bgcolor, cellpadding, cellspacing, etc.
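
To see how the three built-in policies differ, it can help to run the same input through each of them. The following is a minimal sketch, assuming the `sanitize-html` system is installed where ASDF can find it; `*sample*` and `compare-policies` are hypothetical names introduced only for this illustration. Going by the descriptions above, one would expect the `style` attribute to survive only under the email policy and the `http` link to be rejected under the strict policy, but the sketch prints the results rather than assuming them.

```lisp
;; Assumes the system is installed where ASDF can find it.
(asdf:load-system :sanitize-html)
(use-package :sanitize-html)

;; Hypothetical sample input, not taken from the library's test suite.
(defparameter *sample*
  "<p style=\"color: red\">See <a href=\"http://example.com\">this page</a></p>")

(defun compare-policies (html)
  "Return an alist of (policy-name . sanitized-html) for the built-in policies."
  (list (cons "default" (sanitize html *default-policy*))
        (cons "strict"  (sanitize html *strict-policy*))
        (cons "email"   (sanitize html *email-policy*))))

;; Print one line per policy so the differences are easy to compare.
(loop for (name . clean) in (compare-policies *sample*)
      do (format t "~&~7a ~a~%" name clean))
```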

## API

### Main Functions

```lisp
(sanitize html-string &optional policy)
(sanitize-html html-string &optional policy)
```

Sanitizes an HTML string according to the given policy and returns the sanitized HTML string.

**Parameters:**

- `html-string` - String containing HTML to sanitize

- `policy` - Security policy to apply (defaults to `*default-policy*`)

**Returns:** Sanitized HTML string

**Example:**

```lisp
;; The markup in this call is illustrative.
(sanitize "<script>bad</script><p>good</p>")
;; => "<p>good</p>"
```

### Utility Functions

```lisp
(safe-url-p url &optional policy)
```

Check if URL uses a safe protocol according to policy.

```lisp
(sanitize-url url &optional policy)
```

Return URL if safe, nil otherwise.
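
The two utilities can also be used on their own, for example when building links outside of an HTML string. The sketch below assumes the `sanitize-html` system is loadable through ASDF; `*candidate-urls*`, `safe-urls`, and `url-or-fallback` are hypothetical helpers written for this illustration, not part of the library.

```lisp
(asdf:load-system :sanitize-html)
(use-package :sanitize-html)

;; Hypothetical candidate URLs for illustration.
(defparameter *candidate-urls*
  '("https://example.com/page" "javascript:alert(1)" "mailto:user@example.com"))

(defun safe-urls (urls &optional (policy *default-policy*))
  "Return the members of URLS that POLICY accepts."
  (remove-if-not (lambda (url) (safe-url-p url policy)) urls))

(defun url-or-fallback (url &optional (fallback "#"))
  "Return URL when the default policy accepts it, otherwise FALLBACK."
  ;; SANITIZE-URL is documented to return NIL for unsafe URLs,
  ;; so OR supplies the fallback in that case.
  (or (sanitize-url url) fallback))

(format t "~&accepted: ~s~%" (safe-urls *candidate-urls* *strict-policy*))
(format t "~&link target: ~s~%" (url-or-fallback "javascript:alert(1)"))
```

Because `sanitize-url` is documented to return `nil` for unsafe input, it composes naturally with `or`, which keeps call sites free of explicit conditionals.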

### Custom Policies

```lisp
(make-policy &key allowed-tags allowed-attributes allowed-protocols
                  allowed-css-properties remove-comments escape-cdata)
```

Create a custom security policy.

**Example:**

```lisp
(defparameter *my-policy*
  (make-policy
   :allowed-tags '("p" "br" "a" "strong" "em")
   :allowed-attributes '(("a" . ("href" "title")))
   :allowed-protocols '("https")
   :remove-comments t))

(sanitize html-string *my-policy*)
```

## Security Features

### XSS Prevention

- ✅ Script tags removed

- ✅ Event handlers (onclick, onload, etc.) removed

- ✅ javascript: protocol blocked

- ✅ data: protocol blocked (except in email policy with validation)

- ✅ Inline styles blocked (except in email policy with CSS filtering)

- ✅ Form elements blocked

- ✅ iframe/object/embed blocked

- ✅ meta/link/style/base blocked

### CSS Injection Prevention

- CSS properties filtered by whitelist (email policy only)

- `javascript:`, `expression()`, `@import` blocked in CSS values

- `behavior:` property blocked (IE-specific XSS vector)

### Safe Defaults

- Links automatically get `rel="noopener noreferrer"` (prevents tabnabbing)

- Links automatically get `target="_blank"` (open in new tab)

- Comments removed by default

- CDATA sections escaped by default
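
A quick way to check these guarantees in your own environment is to feed a few attack strings through `sanitize` and look for fragments that should not survive. This is a rough smoke test, not a replacement for the library's test suite; `*attack-samples*`, `*forbidden-fragments*`, and `leaked-fragments` are hypothetical names used only here, and the sketch assumes the system loads via ASDF.

```lisp
(asdf:load-system :sanitize-html)
(use-package :sanitize-html)

;; Hypothetical attack strings chosen to match the list above.
(defparameter *attack-samples*
  '("<script>alert(1)</script>"
    "<img src=x onerror=alert(1)>"
    "<a href=\"javascript:alert(1)\">x</a>"
    "<iframe src=\"https://example.com\"></iframe>"))

;; Substrings that should not appear in sanitized output.
(defparameter *forbidden-fragments*
  '("<script" "onerror" "javascript:" "<iframe"))

(defun leaked-fragments (html)
  "Return the forbidden fragments still present after sanitizing HTML."
  (let ((clean (sanitize html)))
    (remove-if-not (lambda (fragment)
                     ;; Case-insensitive substring search.
                     (search fragment clean :test #'char-equal))
                   *forbidden-fragments*)))

;; Report per sample; an empty list means nothing leaked.
(dolist (sample *attack-samples*)
  (format t "~&~s -> leaked: ~s~%" sample (leaked-fragments sample)))
```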

## Email HTML Example

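```lisp
(defun render-email-html (email-html-body)
  "Safely render HTML email content"
  (sanitize-html email-html-body *email-policy*))

;; Typical email HTML with inline styles and tables
;; (the markup below is an illustrative stand-in)
(render-email-html "
  <table bgcolor=\"#ffffff\" cellpadding=\"10\">
    <tr>
      <td style=\"color: #333333\">
        <font size=\"4\">
          Welcome to our newsletter!
        </font>
      </td>
    </tr>
  </table>
  <p>
    <a href=\"https://example.com\">Visit our site</a>
  </p>
")
```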

## Running Tests

```lisp
(asdf:test-system :sanitize-html)
```

Or manually:

```lisp
(asdf:load-system :sanitize-html/tests)
(fiveam:run! :sanitize-html-tests)
```

## Dependencies

- **plump** - Lenient HTML/XML parser

- **lquery** - DOM manipulation

- **cl-ppcre** - Regular expressions for CSS parsing

- **alexandria** - Utilities library

**Test dependencies:**

- **fiveam** - Unit testing framework

## Architecture

1. **Parser** - Uses Plump to parse HTML into a DOM tree

2. **Tree Walker** - Recursively visits each node in the DOM

3. **Policy Enforcer** - Checks each element/attribute against whitelist

4. **Sanitizer** - Removes or modifies unsafe content

5. **Serializer** - Converts sanitized DOM back to HTML string
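
The five stages above can be sketched with Plump alone. The code below is a toy illustration of the parse, walk, enforce, and serialize flow, not the library's actual implementation: `*toy-allowed-tags*`, `*toy-allowed-attributes*`, `strip-attributes`, `scrub`, and `toy-sanitize` are hypothetical names, the allow-lists are made up, and it performs no URL or CSS checks. It assumes Plump is installed where ASDF can find it.

```lisp
(asdf:load-system :plump)

;; Hypothetical allow-lists for this sketch only.
(defparameter *toy-allowed-tags* '("p" "a" "strong" "em"))
(defparameter *toy-allowed-attributes* '("href" "title"))

(defun strip-attributes (element)
  "Remove every attribute of ELEMENT that is not on the allow-list."
  (let ((attrs (plump:attributes element)))
    ;; Collect the keys first so the table is not modified while iterating.
    (dolist (name (loop for key being the hash-keys of attrs collect key))
      (unless (member name *toy-allowed-attributes* :test #'string-equal)
        (remhash name attrs)))))

(defun scrub (node)
  "Walk the tree below NODE, dropping anything that is not allowed."
  (when (plump:nesting-node-p node)
    ;; Copy the child vector to a list because REMOVE-CHILD mutates it.
    (dolist (child (coerce (plump:children node) 'list))
      (cond ((plump:text-node-p child))            ; plain text is kept as-is
            ((and (plump:element-p child)
                  (member (plump:tag-name child) *toy-allowed-tags*
                          :test #'string-equal))
             (strip-attributes child)
             (scrub child))
            ;; Comments, disallowed elements, and everything else are removed.
            (t (plump:remove-child child)))))
  node)

(defun toy-sanitize (html)
  "Parse HTML, scrub the tree, and serialize it back to a string."
  (plump:serialize (scrub (plump:parse html)) nil))

(format t "~&~a~%"
        (toy-sanitize "<script>alert(1)</script><p onclick=\"x()\">ok<!-- note --></p>"))
```

Note that this sketch drops a disallowed element together with its contents, which is the simplest choice for a toy walker; a real sanitizer may choose to keep or escape the inner text instead.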

## Comparison with Other Libraries

| Feature | sanitize-html | bluemonday (Go) | ammonia (Rust) | bleach (Python) |
|---------|---------------|-----------------|----------------|-----------------|
| Whitelist-based | ✅ | ✅ | ✅ | ✅ |
| Multiple policies | ✅ | ✅ | ✅ | ❌ |
| CSS sanitization | ✅ | ✅ | ✅ | ✅ |
| URL validation | ✅ | ✅ | ✅ | ✅ |
| Link safety | ✅ | ✅ | ❌ | ❌ |
| OWASP-aligned | ✅ | ✅ | ✅ | ✅ |

## References

- [OWASP XSS Prevention Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html)

- [HTML5 Security Cheat Sheet](https://html5sec.org/)

- [Plump Documentation](https://shinmera.github.io/plump/)

## Author and License

``sanitize-html`` was written by [Anthony Green](https://github.com/atgreen) and is distributed under the terms of the MIT license.
