https://github.com/oleksiyrudenko/s10n

A lightweight user input sanitization library
https://github.com/oleksiyrudenko/s10n
sanitization semantic-sanitizers user-input
Last synced: 19 days ago
JSON representation
A lightweight user input sanitization library
Host: GitHub
URL: https://github.com/oleksiyrudenko/s10n
Owner: OleksiyRudenko
License: mit
Created: 2019-11-10T21:14:33.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2023-01-06T15:15:06.000Z (over 2 years ago)
Last Synced: 2025-03-11T11:44:50.228Z (about 1 month ago)
Topics: sanitization, semantic-sanitizers, user-input
Language: JavaScript
Homepage: https://oleksiyrudenko.github.io/s10n/sandbox/
Size: 841 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 22
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project

README

        # s10n

> **s10n** stands for _"sanitization"_.

> Just like _l10n_ stands for _"localization"_.

> See also [i18n, l10n et al](https://blog.mozilla.org/l10n/2011/12/14/i18n-vs-l10n-whats-the-diff/)

A library to make basic user input sanitization

and subsequent validation an easier job.

## Table of Contents

- [Use cases](#use-cases)

  - [Example 1. Username](#example-1-username)

  - [Example 2. Arbitrary text](#example-2-arbitrary-text)

- [Installation and Usage](#installation-and-usage)

- [API](#api)

  - [Modifiers](#modifiers)

    - [Treating line break characters](#treating-line-break-characters)

    - [Line break character](#line-break-character)

  - [Elementary transformers](#elementary-transformers)

    - [Transform whitespaces](#transform-whitespaces)

    - [Handle line breaks](#handle-line-breaks)

    - [Keep/Remove/Replace](#keepremovereplace)

    - [Other transformations](#other-transformations)

  - [Compound transformers](#compound-transformers)

  - [Semantic sanitizers](#semantic-sanitizers)

  - [Custom transformations](#custom-transformations)

  - [Getting sanitized value](#getting-sanitized-value)

  - [Utility methods](#utility-methods)

- [Development and Publishing](#development-and-publishing)

## Use cases

Sanitization is **NOT** validation, but

it can help make validation an easier job

and/or help to suggest to a user an input variation

that better matches input expectations or requirements.

As with validation sanitization, if in place, should

be applied on both frontend and backend, since a user

can bypass sanitization and validation on the frontend and

send input directly to a backend endpoint.

### Example 1. Username

Let's assume the following scenario of a username input.

The rule is that only a-z, A-Z, numbers, underscore and dash

are only expected in valid input.

A user submits a string of `#UsEr #$%"' NaMe 5_6-9`.

Input gets invalidated, the rule gets presented to the user,

and the user expected to remove all invalid characters.

The input then becomes a valid string of `UsErNaMe5_6-9`.

Alternatively an app might have suggested (or enforced)

a valid input. Examples below are demonstration of

default and tuned behaviour of a relevant semantic sanitizer

(spaces get replaced with underscores).

```javascript

let input = "  UsEr #$%' NaMe  5_6-9  ";

s10n(input).keepUsername().value; // "UsErNaMe5_6-9"

s10n(input).keepUsernameLC().value; // "username5_6-9"

s10n(input).keepUsername("_").value; // "UsEr_NaMe_5_6-9"

s10n(input).keepUsernameLC("_").value; // "user_name_5_6-9"

```

Semantic sanitizers applied are a combination of elementary and compound

transformers with an optional parameter to replace spaces

(in this particular use case).

### Example 2. Arbitrary text

Let's assume the input received from a user is

`" \n\r\n \u200B\u200C\u200D\u2060 \t\uFEFF\xA0 Sensible text \n Line 2 \n\r\n\r\r "`

Here are some issues worth attention and optimization:

- it contains problematic whitespaces

- it contains sequences of 2 or more whitespaces

- it contains leading and trailing whitespaces

- there is a variety of line break characters,

  potentially hazardous (CRLF injection)

- there are leading and trailing empty lines

- line break characters are invalid in a one line input

Any of the above can be considered as some unnecessarily

contaminated data.

Having all issues fixed the above input would have been:

- `"Sensible text\nLine 2"` for a multiline input

- `"Sensible text Line 2"` for a simple string input

```javascript

let input = "  \n\r\n \u200B\u200C\u200D\u2060 \t\uFEFF\xA0 Sensible   text  \n Line 2  \n\r\n\r\r  ";

s10n(input).minimizeWhitespaces().value; // "Sensible text Line 2"

s10n(input)

  .preserveLineBreaks() // modifier for subsequent methods to preserve line breaks

  .minimizeWhitespaces().value; // "Sensible text\nLine 2"

```

`minimizeWhitespaces` does the following:

- normalizes line break characters, i.e.

  CRLF (`\n\r`) and individual CR (`\r`)

  are converted into LF (`\n`) (default behaviour)

- normalizes whitespaces into standard space character (`\x20`)

- merges continuous whitespaces into a single space character

- normalizes lines in a multiline input

  (strips leading and trailing spaces in each line of a multiline input)

- trims leading and trailing whitespaces

- trims leading and trailing line breaks

Explore [sandbox](https://oleksiyrudenko.github.io/s10n/sandbox/) for more use cases.

[ [^^ Back to TOC ^^](#table-of-contents) ]

## Installation and Usage

### Option1. Install as a project dependency

Run `npm i s10n` to add **s10n** as a dependency to your project.

In your app import s10n by either of the methods:

- `const s10n = require("s10n")` -- node style

- `import s10n from "s10n"` -- module import style

### Option 2. Link directly to the html file

Pick an appropriate version on

[jsdelivr CDN](https://www.jsdelivr.com/package/npm/s10n)

and add to the html file. Example:

```html

```

### Usage

Check the examples across this documentation for the use cases.

Use [sandbox](https://oleksiyrudenko.github.io/s10n/sandbox/) to play around.

[ [^^ Back to TOC ^^](#table-of-contents) ]

## API

`s10n` offers a number of elementary, compound and semantic

transformers and sanitizers as well as a method to apply

an arbitrary sanitizer.

Below are the usage examples to give a general impression

of the API.

```javascript

s10n("  Some text  \n Yet basically valid \n\n  ")

  .preserveLineBreaks()

  .minimizeWhitespaces().value; // "Some text\nYet basically valid"

s10n("  My User Name  ").keepUsernameLC("_").value; // "my_user_name"

let input = "  Some   arbitrary \t \xA0 text  ";

s10n(input)

  .normalizeWhitespaces()

  .trim().value; // "Some   arbitrary     text"

s10n(input)

  .mergeWhitespaces()

  .trim().value; // "Some arbitrary text"

```

[ [^^ Back to TOC ^^](#table-of-contents) ]

### Modifiers

Modifiers affect behaviour of subsequent transformers.

#### Treating line break characters

Defines whether to preserve or disregard line break

characters when applying transformers.

Default behaviour is to disregard line break characters.

This setting doesn't affect some transformers (e.g. **`trimLineBreaks()`**).

These are marked correspondingly.

```javascript

let input = " \n\n\n ";

s10n(input).trim().value; // ""

s10n(input)

  .preserveLineBreaks()

  .trim().value; // "\n\n\n"

```

Call `disregardLineBreaks` when subsequent sanitizers should

disregard line breaks after any preceding transformations

has been affected by `preserveLineBreaks`.

#### Line break character

By default, whenever any sanitizer affects line break characters

a `\n` is considered as a valid or target line break character.

This behaviour can be changed for subsequent sanitizers

(e.g. `setLineBreakCharacter('\r')`).

Whenever line breaks in a string get normalized

CRLF (`\r\n`) is converted into a single line break character

(`\n` by default, or a value assigned by `setLineBreakCharacter` method).

```javascript

let input = "\r\n\n\n\r\r";

s10n(input).normalizeLineBreaks().value; // "\n\n\n\n\n"

s10n(input)

  .setLineBreakCharacter("\r")

  .normalizeLineBreaks().value; // "\r\r\r\r\r"

```

[ [^^ Back to TOC ^^](#table-of-contents) ]

### Elementary transformers

Elementary transformers have a pretty limited scope of responsibility.

Normally used for basic transformations

and as building blocks by compound transformers,

semantic sanitizers and custom transformers/sanitizers.

#### Transform whitespaces

**s10n** treats an extended set of characters, including

`\x20\u200B\u200C\u200D\u2060\uFEFF\xA0` as whitespaces.

Characters `\n` and `\r` are not considered whitespaces

when `preserveLineBreaks` modifier applied.

- **`trim()`** - removes leading and trailing whitespaces

- **`trimLineBreaks()`** - always removes leading and trailing

  line break characters, disregarding the **LineBreak** modifier setting

- **`mergeLineBreaks()`** - normalizes and merges consequent line breaks

  disregarding the **LineBreak** modifier setting

- **`normalizeWhitespaces()`** - all whitespaces are

  converted into space characters (`\x20`)

- **`mergeWhitespaces()`** - merges continuous clusters of whitespaces

  into a single space character (`\x20`)

- **`stripWhitespaces()`** - strips all whitespaces from input

Examples:

```javascript

let input = "\n  Z\tY \x0A \n X W\uFEFFV \n\n  \n";

s10n(input).trim().value; // "Z\tY \x0A \n X W\uFEFFV"

s10n(input)

  .preserveLineBreaks()

  .trim().value; // 

s10n(input).trimLineBreaks().value; // "  Z\tY \x0A \n X W\uFEFFV \n\n  "

s10n(input)

  .preserveLineBreaks()

  .trimLineBreaks().value; // 

s10n("\n\r\r\r\n\nfoo\n\r\nbar\n\r\r\r\n\n").mergeLineBreaks().value; // "\nfoo\nbar\n"

s10n("\n \r\r\r\n\nfoo\n \r\nbar\n\r\r \r\n\n").mergeLineBreaks().value; // "\n \nfoo\n \nbar\n \n"

s10n(input).normalizeWhitespaces().value; // "   Z Y     X W V      "

s10n(input)

  .preserveLineBreaks()

  .normalizeWhitespaces().value; // "\n  Z Y   \n X W V \n\n  \n"

s10n(input).mergeWhitespaces().value; // " Z Y X W V "

s10n(input)

  .preserveLineBreaks()

  .mergeWhitespaces().value; // "\n Z Y \n X W V \n\n \n"

s10n(input).stripWhitespaces().value; // "ZYXWV"

s10n(input)

  .preserveLineBreaks()

  .stripWhitespaces()

  .value(); // "\nZY\nXWV\n\n\n"

s10n(input)

  .preserveLineBreaks()

  .normalizeWhitespaces()

  .mergeWhitespaces()

  .trimLineBreaks()

  .trim().value; // "Z Y \n X W V \n\n"

```

See also [`normalizeLineBreaks()`](#handle-line-breaks)

[ [^^ Back to TOC ^^](#table-of-contents) ]

#### Handle line breaks

- **`normalizeLineBreaks(lineBreakCharacter = undefined)`** -

  transforms CRLF, CR, LF into a line break character defined following the rules below:

  - as specified by `lineBreakCharacter` argument

  - if param `lineBreakCharacter` is undefined, then as set by `setLineBreakCharacter()`

  - if `setLineBreakCharacter()` wasn't applied, then defaults to LF (`'\n'`)

- **`normalizeMultiline()`** -

  strips whitespaces that immediately precede

  or follow line break characters;

  ignores **LineBreak** modifier setting

Examples:

```javascript

let input = "\r\n\r  abc  \r\n  def   \r \t   ghi   \n \t\t  \n \r\n\n\r\n\n\r\r  \r\r\n";

s10n(input).normalizeLineBreaks().value; // "\n\n  abc  \n  def   \n \t   ghi   \n \t\t  \n \n\n\n\n\n\n  \n\n"

s10n(input).normalizeMultiline().value; // "\r\n\rabc\r\ndef\rghi\n\n\r\n\n\r\n\n\r\r\r\r\n"

s10n(input)

  .normalizeLineBreaks()

  .normalizeMultiline().value; // "\n\nabc\ndef\nghi\n\n\n\n\n\n\n\n\n\n"

```

See also [`minimizeWhitespaces()`](#compound-transformers)

[ [^^ Back to TOC ^^](#table-of-contents) ]

#### Keep/Remove/Replace

These methods' behaviour is **NOT** affected

by **LineBreak** modifier

(disregarded by default, i.e. `\s` RegExp token comprises `\r` and `\n`).

Specify `\n` and/or `\r`

explicitly whenever those should be kept or removed.

Method argument should follow RegExp character class

specification.

- **`keepOnlyCharset(allowedChars = "-A-Za-z0-9_\\x20.,}{\\]\\[)(", regexpFlags)`** -

  keep listed characters only

- **`keepOnlyRegExp(regexp, regexpFlags)`** - keep characters as per RegExp

  (RegExp object or regexp body as a string)

- **`remove(disallowedChars, regexpFlags)`** - remove listed characters

- **`replace(needle, replacement = "", regexpFlags)`** -

  replaces a needle (which is a string, or a RegExp object)

  with the replacement string

`regexpFlags` in the methods above is an optional parameter and

defaults to the flags as specified in [`_regexp`](#utility-methods) ("gu").

Examples:

```javascript

let input1 = "ABCDabcd01239 _-.,(abcd){defg}[hijk]";

s10n(input1).keepOnlyCharset("}{][)(").value; // "(){}[]"

s10n(input1).keepOnlyRegexp(/\{.*?\}|\[.*?\]|\(.*?\)/gu).value; // "(abcd){defg}[hijk]"

let input2 = "ABCDEFGHabcdefghABCDEFGHabcdefgh";

s10n(input2).remove("ABCD").value; // "EFGHabcdefghEFGHabcdefgh"

s10n(input2).remove("ABCD", "giu").value; // "EFGHefghEFGHefgh"

s10n(input2).remove(/ABCD/).value; // "EFGHabcdefghABCDEFGHabcdefgh"

s10n(input2).remove(/ABCD/giu).value; // "EFGHefghEFGHefgh"

s10n(input2).remove(/ABCD/, "giu").value; // "EFGHefghEFGHefgh"

```

[ [^^ Back to TOC ^^](#table-of-contents) ]

#### Other transformations

- **`toLowerCase()`** - converts to lower case

- **`toUpperCase()`** - converts to upper case

Examples:

```javascript

let input = "aBcD01";

s10n(input).toLowerCase().value; // "abcd01"

s10n(input).toUpperCase().value; // "ABCD01"

```

[ [^^ Back to TOC ^^](#table-of-contents) ]

### Compound transformers

Compound transformers implement complex

transformation rules applying multiple transformations,

often using elementary transformers.

- **`keepBase10Digits()`** - strips out anything but `0-9`

- **`keepBase16Digits()`** - (alias: **`keepHexDigits()`**) -

  strips out anything but `0-9a-fA-F`;

  best combined chained with `toLowerCase()` or `toUpperCase()`

  for consistent result

- **`minimizeWhitespaces()`** - removes leading, trailing

  and continuous clusters of whitespaces and line breaks;

  when preceded with `preserveLineBreaks()` treats input as

  a multiline string and thus trims spaces in every line

Examples:

```javascript

let input1 = "  XYZ 20fE\n\n  ";

s10n(input1).keepBase10Digits().value; // 20

s10n(input1).keepBase16Digits().value; // 20fE

s10n(input1)

  .keepBase16Digits()

  .toLowerCase().value; // 20fe

s10n(input1)

  .keepHexDigits()

  .toLowerCase().value; // 20fe

let input2 = "  Some text  \n Yet basically valid \n\n  ";

s10n(input2).minimizeWhitespaces().value; // "Some text Yet basically valid"

s10n(input2)

  .preserveLineBreaks()

  .minimizeWhitespaces().value; // "Some text\nYet basically valid"

```

[ [^^ Back to TOC ^^](#table-of-contents) ]

### Semantic sanitizers

Semantic sanitizers implement semantically meaningful

yet heavily opinionated sanitization rules for particular use cases.

- **`keepOnlyEmailPopularCharset()`** - keeps

  only `[email protected]`

- **`keepOnlyEmailExtendedCharset()`** - keeps

  only `A-Za-z0-9_@.+)(-`

- **`keepOnlyEmailRfcCharset()`** - keeps

  only charset as per rfc ( `` A-Za-z0-9_\\-@.+)( \":;<>\\\\,\\[\\]}{!#$%&'*/=?^`|~ ``)

- **`keepUsername(whiteSpaceReplacement = "")`** - keeps

  only `a-zA-Z0-9_-`, whitespaces are stripped or

  are merged and replaced with `whiteSpaceReplacement` if any

  specified

- **`keepUsernameLC(whiteSpaceReplacement = "")`** - same as

  `keepUsername` but the result is converted to lower case

Examples:

```javascript

let input = "  UsEr   #$%\"' NaMe +  (5_6-9) @ .Co.Uk  ";

s10n(input).keepOnlyEmailPopularCharset().value; // "[email protected]"

s10n(input).keepOnlyEmailExtendedCharset().value; // "UsErNaMe+(5_6-9)@.Co.Uk"

s10n(input).keepOnlyEmailRfcCharset().value; // "UsEr#$%\"'NaMe+(5_6-9)@.Co.Uk"

s10n(input)

  .keepOnlyEmailPopularCharset()

  .toLowerCase().value; // "[email protected]"

s10n(input).keepUsername().value; // "UsErNaMe5_6-9CoUk"

s10n(input).keepUsernameLC().value; // "username5_6-9couk"

s10n(input).keepUsername("_").value; // "UsEr_NaMe_5_6-9_CoUk"

s10n(input).keepUsernameLC("_").value; // "user_name_5_6-9_couk"

```

Note: sanitized email input is still invalid but (arguably)

yet easier to double-check and fix.

What if those semantic sanitizers do not fit my needs?

Consider implementing a [customized transformer](#custom-transformations).

[ [^^ Back to TOC ^^](#table-of-contents) ]

### Custom transformations

A custom transformer is a method to apply complex sanitization

logic using elementary or compound transformers, semantic sanitizers

or applying a completely unique rule set.

- **`apply(callback, ...arguments)`** - callback will receive

  current value, calling context (reference to current s10n object as `this`),

  and any extra arguments passed

- **`extend(methodName, method)`** -

  registers a re-usable custom transformation method

  - `extend` should be called on `s10n` object itself rather than in

    a sanitization chain

  - the method is accessible at every sanitization chain once registered

  - the method should transform `this.value` and/or call other

    built-in or registered custom transformers/sanitizers

  - the method should return `this` to make it chainable

  - do not define the method as an arrow function

Example:

```javascript

s10n("c00l").apply(

  (value, context, needle, replacement) => value.replace(context._regexp(needle), replacement),

  "0",

  "o"

).value; // "cool"

s10n.extend("makeCool", function() {

  // replaces 'o' and 'O' followed with whitespaces (extended set) with a single '0'

  this.replace(this._regexp("o\\s+", "gi"), "0");

  return this;

});

s10n("coO\x0A o\t l").makeCool().value; // "co00l"

```

[ [^^ Back to TOC ^^](#table-of-contents) ]

### Getting sanitized value

Getting sanitized value (as a string)

is as simple as terminating

transformation chain with `.value`.

E.g. `s10n(" my User Name ").usernameLC().value`.

In string context `.value` is optional as a string

is being returned by default.

E.g. `` `Username: ${s10n(" my User Name ").usernameLC()}` ``

or `s10n(" my User Name ").usernameLC() + ''`

Explicit value access methods:

- **`value`** - value as is

- **`toString()`** - same as `.value`

- **`toNumber()`** - converts sanitized string

  into a Number. Use with caution as it will return `NaN`

  if sanitized string contains anything else but a valid

  [Number literal](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number).

Examples:

```javascript

let input = "65";

s10n(input).value; // "65"

s10n(input).toString(); // "65"

`${s10n(input)}`; // "65"

s10n(input) + ""; // "65"

s10n(input).toNumber(); // 65

```

[ [^^ Back to TOC ^^](#table-of-contents) ]

### Utility methods

- **`_regexp(patternString, flags = "gu")`** -

  using this utility will ensure that `\s` entities

  in pattern string are replaced with an extended set

  of whitespaces. Recommended for use in `apply` callback.

Example:

```javascript

s10n("\t \xA0 ABC\n\t \uFEFF").apply((value, context) =>

  // replaces extended set of whitespaces with dashes

  value.replace(context._regexp("\\s"), "-")

).value; // "----ABC----"

```

[ [^^ Back to TOC ^^](#table-of-contents) ]

## Development and Publishing

Refer to [CONTRIBUTING.md](./CONTRIBUTING.md) for details.

[ [^^ Back to TOC ^^](#table-of-contents) ]
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/oleksiyrudenko/s10n

Awesome Lists containing this project

README