https://github.com/oleksiyrudenko/s10n
A lightweight user input sanitization library
- Host: GitHub
- URL: https://github.com/oleksiyrudenko/s10n
- Owner: OleksiyRudenko
- License: mit
- Created: 2019-11-10T21:14:33.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2023-01-06T15:15:06.000Z (about 2 years ago)
- Last Synced: 2024-11-29T21:45:26.349Z (2 months ago)
- Topics: sanitization, semantic-sanitizers, user-input
- Language: JavaScript
- Homepage: https://oleksiyrudenko.github.io/s10n/sandbox/
- Size: 841 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 22
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# s10n
> **s10n** stands for _"sanitization"_.
> Just like _l10n_ stands for _"localization"_.
> See also [i18n, l10n et al](https://blog.mozilla.org/l10n/2011/12/14/i18n-vs-l10n-whats-the-diff/)

A library to make basic user input sanitization
and subsequent validation an easier job.

## Table of Contents
- [Use cases](#use-cases)
  - [Example 1. Username](#example-1-username)
  - [Example 2. Arbitrary text](#example-2-arbitrary-text)
- [Installation and Usage](#installation-and-usage)
- [API](#api)
  - [Modifiers](#modifiers)
    - [Treating line break characters](#treating-line-break-characters)
    - [Line break character](#line-break-character)
  - [Elementary transformers](#elementary-transformers)
    - [Transform whitespaces](#transform-whitespaces)
    - [Handle line breaks](#handle-line-breaks)
    - [Keep/Remove/Replace](#keepremovereplace)
    - [Other transformations](#other-transformations)
  - [Compound transformers](#compound-transformers)
  - [Semantic sanitizers](#semantic-sanitizers)
  - [Custom transformations](#custom-transformations)
  - [Getting sanitized value](#getting-sanitized-value)
  - [Utility methods](#utility-methods)
- [Development and Publishing](#development-and-publishing)

## Use cases
Sanitization is **NOT** validation, but
it can help make validation an easier job
and/or help to suggest to a user an input variation
that better matches input expectations or requirements.

As with validation, sanitization, if in place, should
be applied on both the frontend and the backend, since a user
can bypass sanitization and validation on the frontend and
send input directly to a backend endpoint.

### Example 1. Username
Let's assume the following scenario of a username input.
The rule is that only a-z, A-Z, digits, underscore and dash
are expected in valid input.

A user submits a string of `#UsEr #$%"' NaMe 5_6-9`.
The input gets rejected as invalid, the rule gets presented to the user,
and the user is expected to remove all invalid characters.
The input then becomes a valid string of `UsErNaMe5_6-9`.

Alternatively, an app might suggest (or enforce)
a valid input. The examples below demonstrate the
default and tuned behaviour of the relevant semantic sanitizer
(in the tuned variant, spaces get replaced with underscores).

```javascript
let input = " UsEr #$%' NaMe 5_6-9 ";
s10n(input).keepUsername().value; // "UsErNaMe5_6-9"
s10n(input).keepUsernameLC().value; // "username5_6-9"
s10n(input).keepUsername("_").value; // "UsEr_NaMe_5_6-9"
s10n(input).keepUsernameLC("_").value; // "user_name_5_6-9"
```

Semantic sanitizers applied are a combination of elementary and compound
transformers with an optional parameter to replace spaces
(in this particular use case).

### Example 2. Arbitrary text
Let's assume the input received from a user is
`" \n\r\n \u200B\u200C\u200D\u2060 \t\uFEFF\xA0 Sensible text \n Line 2 \n\r\n\r\r "`

Here are some issues worth attention and optimization:
- it contains problematic whitespaces
- it contains sequences of 2 or more whitespaces
- it contains leading and trailing whitespaces
- there is a variety of line break characters,
potentially hazardous (CRLF injection)
- there are leading and trailing empty lines
- line break characters are invalid in a one line input

Any of the above can be considered unnecessary
contamination of the data.

With all issues fixed, the above input would become:
- `"Sensible text\nLine 2"` for a multiline input
- `"Sensible text Line 2"` for a simple string input

```javascript
let input = " \n\r\n \u200B\u200C\u200D\u2060 \t\uFEFF\xA0 Sensible text \n Line 2 \n\r\n\r\r ";

s10n(input).minimizeWhitespaces().value; // "Sensible text Line 2"
s10n(input)
.preserveLineBreaks() // modifier for subsequent methods to preserve line breaks
.minimizeWhitespaces().value; // "Sensible text\nLine 2"
```

`minimizeWhitespaces` does the following:
- normalizes line break characters, i.e.
CRLF (`\r\n`) and individual CR (`\r`)
are converted into LF (`\n`) (default behaviour)
- normalizes whitespaces into standard space character (`\x20`)
- merges continuous whitespaces into a single space character
- normalizes lines in a multiline input
(strips leading and trailing spaces in each line of a multiline input)
- trims leading and trailing whitespaces
- trims leading and trailing line breaks

Explore [sandbox](https://oleksiyrudenko.github.io/s10n/sandbox/) for more use cases.
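
The steps listed for `minimizeWhitespaces` can be approximated in plain JavaScript. What follows is a minimal sketch and not the library's actual implementation: the function name `minimizeWhitespacesSketch`, its `preserveLineBreaks` flag and the `EXTENDED_WS` character class are assumptions made for illustration only.

```javascript
// Hypothetical stand-in for the extended whitespace set described in this README
// (tab added; line break characters are handled separately).
const EXTENDED_WS = "[\\x20\\t\\u200B\\u200C\\u200D\\u2060\\uFEFF\\xA0]";

// Sketch only: mimics the documented steps, not s10n's real code.
function minimizeWhitespacesSketch(input, preserveLineBreaks = false) {
  let value = input
    .replace(/\r\n|\r/g, "\n") // normalize CRLF and lone CR into LF
    .replace(new RegExp(EXTENDED_WS, "g"), " "); // normalize whitespaces into \x20
  if (!preserveLineBreaks) {
    value = value.replace(/\n/g, " "); // one-line input: line breaks count as whitespace
  }
  return value
    .replace(/ {2,}/g, " ") // merge runs of spaces
    .replace(/ ?\n ?/g, "\n") // strip spaces around each line break
    .replace(/^[ \n]+|[ \n]+$/g, ""); // trim leading/trailing spaces and line breaks
}

const raw = " \n\r\n \u200B\u200C\u200D\u2060 \t\uFEFF\xA0 Sensible text \n Line 2 \n\r\n\r\r ";
console.log(JSON.stringify(minimizeWhitespacesSketch(raw))); // "Sensible text Line 2"
console.log(JSON.stringify(minimizeWhitespacesSketch(raw, true))); // "Sensible text\nLine 2"
```

The order matters in this sketch: line breaks are normalized before whitespace is merged, so a `\r\n` pair is never counted as two separate breaks.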
[ [^^ Back to TOC ^^](#table-of-contents) ]
## Installation and Usage
### Option 1. Install as a project dependency
Run `npm i s10n` to add **s10n** as a dependency to your project.
In your app, import s10n using either of the methods:
- `const s10n = require("s10n")` -- node style
- `import s10n from "s10n"` -- module import style

### Option 2. Link directly to the html file
Pick an appropriate version on
[jsdelivr CDN](https://www.jsdelivr.com/package/npm/s10n)
and add it to the html file. Example:

```html
```
### Usage
Check the examples across this documentation for the use cases.
Use [sandbox](https://oleksiyrudenko.github.io/s10n/sandbox/) to play around.
[ [^^ Back to TOC ^^](#table-of-contents) ]
## API
`s10n` offers a number of elementary, compound and semantic
transformers and sanitizers as well as a method to apply
an arbitrary sanitizer.

Below are the usage examples to give a general impression
of the API.

```javascript
s10n(" Some text \n Yet basically valid \n\n ")
.preserveLineBreaks()
.minimizeWhitespaces().value; // "Some text\nYet basically valid"

s10n(" My User Name ").keepUsernameLC("_").value; // "my_user_name"

let input = " Some arbitrary \t \xA0 text ";
s10n(input)
.normalizeWhitespaces()
.trim().value; // "Some arbitrary text"

s10n(input)
.mergeWhitespaces()
.trim().value; // "Some arbitrary text"
```

[ [^^ Back to TOC ^^](#table-of-contents) ]
### Modifiers
Modifiers affect behaviour of subsequent transformers.
#### Treating line break characters
Defines whether to preserve or disregard line break
characters when applying transformers.

Default behaviour is to disregard line break characters.
This setting doesn't affect some transformers (e.g. **`trimLineBreaks()`**).
These are marked correspondingly.

```javascript
let input = " \n\n\n ";
s10n(input).trim().value; // ""
s10n(input)
.preserveLineBreaks()
.trim().value; // "\n\n\n"
```

Call `disregardLineBreaks` when subsequent sanitizers should
disregard line breaks after any preceding transformations
have been affected by `preserveLineBreaks`.

#### Line break character
By default, whenever any sanitizer affects line break characters
a `\n` is considered as a valid or target line break character.

This behaviour can be changed for subsequent sanitizers
(e.g. `setLineBreakCharacter('\r')`).
Whenever line breaks in a string get normalized,
CRLF (`\r\n`) is converted into a single line break character
(`\n` by default, or a value assigned by `setLineBreakCharacter` method).

```javascript
let input = "\r\n\n\n\r\r";

s10n(input).normalizeLineBreaks().value; // "\n\n\n\n\n"
s10n(input)
.setLineBreakCharacter("\r")
.normalizeLineBreaks().value; // "\r\r\r\r\r"
```

[ [^^ Back to TOC ^^](#table-of-contents) ]
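
To make the line break rule from this section concrete, here is a minimal standalone sketch of the idea: CRLF is treated as one line break and every break is rewritten to the target character. The helper name `normalizeLineBreaksSketch` is hypothetical and the code is an assumption about the behaviour described above, not the library's source.

```javascript
// Sketch only: a plain-JS approximation of the documented normalization rule.
function normalizeLineBreaksSketch(input, lineBreakCharacter = "\n") {
  // "\r\n" comes first in the alternation so a CRLF pair counts as ONE line break
  return input.replace(/\r\n|\r|\n/g, lineBreakCharacter);
}

const breaks = "\r\n\n\n\r\r"; // CRLF, LF, LF, CR, CR: five line breaks
console.log(JSON.stringify(normalizeLineBreaksSketch(breaks))); // "\n\n\n\n\n"
console.log(JSON.stringify(normalizeLineBreaksSketch(breaks, "\r"))); // "\r\r\r\r\r"
```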
### Elementary transformers
Elementary transformers have a narrow scope of responsibility.
They are normally used for basic transformations
and as building blocks by compound transformers,
semantic sanitizers and custom transformers/sanitizers.

#### Transform whitespaces
**s10n** treats an extended set of characters, including
`\x20\u200B\u200C\u200D\u2060\uFEFF\xA0` as whitespaces.
Characters `\n` and `\r` are not considered whitespaces
when `preserveLineBreaks` modifier is applied.

- **`trim()`** - removes leading and trailing whitespaces
- **`trimLineBreaks()`** - always removes leading and trailing
  line break characters, disregarding the **LineBreak** modifier setting
- **`mergeLineBreaks()`** - normalizes and merges consecutive line breaks
  disregarding the **LineBreak** modifier setting
- **`normalizeWhitespaces()`** - all whitespaces are
  converted into space characters (`\x20`)
- **`mergeWhitespaces()`** - merges continuous clusters of whitespaces
  into a single space character (`\x20`)
- **`stripWhitespaces()`** - strips all whitespaces from input

Examples:
```javascript
let input = "\n Z\tY \x0A \n X W\uFEFFV \n\n \n";

s10n(input).trim().value; // "Z\tY \x0A \n X W\uFEFFV"
s10n(input)
.preserveLineBreaks()
.trim().value; //

s10n(input).trimLineBreaks().value; // " Z\tY \x0A \n X W\uFEFFV \n\n "
s10n(input)
.preserveLineBreaks()
.trimLineBreaks().value; //

s10n("\n\r\r\r\n\nfoo\n\r\nbar\n\r\r\r\n\n").mergeLineBreaks().value; // "\nfoo\nbar\n"
s10n("\n \r\r\r\n\nfoo\n \r\nbar\n\r\r \r\n\n").mergeLineBreaks().value; // "\n \nfoo\n \nbar\n \n"

s10n(input).normalizeWhitespaces().value; // " Z Y X W V "
s10n(input)
.preserveLineBreaks()
.normalizeWhitespaces().value; // "\n Z Y \n X W V \n\n \n"

s10n(input).mergeWhitespaces().value; // " Z Y X W V "
s10n(input)
.preserveLineBreaks()
.mergeWhitespaces().value; // "\n Z Y \n X W V \n\n \n"

s10n(input).stripWhitespaces().value; // "ZYXWV"
s10n(input)
.preserveLineBreaks()
.stripWhitespaces().value; // "\nZY\nXWV\n\n\n"

s10n(input)
.preserveLineBreaks()
.normalizeWhitespaces()
.mergeWhitespaces()
.trimLineBreaks()
.trim().value; // "Z Y \n X W V \n\n"
```

See also [`normalizeLineBreaks()`](#handle-line-breaks)
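
The interplay between the extended whitespace set and the line break modifier can be illustrated without the library. The sketch below assumes a character class built from the set listed in this section (plus tab) and a hypothetical `trimSketch` helper. It only shows the idea of switching `\n` and `\r` in and out of the class, and it is not s10n's code.

```javascript
// Assumed whitespace set: the characters listed in this README plus tab.
const WS_SET = "\\x20\\t\\u200B\\u200C\\u200D\\u2060\\uFEFF\\xA0";

// Sketch only: line breaks are part of the class unless they are to be preserved.
function trimSketch(input, preserveLineBreaks = false) {
  const cls = preserveLineBreaks ? `[${WS_SET}]` : `[${WS_SET}\\n\\r]`;
  return input.replace(new RegExp(`^${cls}+|${cls}+$`, "g"), "");
}

console.log(JSON.stringify(trimSketch(" \n\n\n "))); // ""
console.log(JSON.stringify(trimSketch(" \n\n\n ", true))); // "\n\n\n"
console.log(JSON.stringify(trimSketch("\uFEFF\xA0 abc \t"))); // "abc"
```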
[ [^^ Back to TOC ^^](#table-of-contents) ]
#### Handle line breaks
- **`normalizeLineBreaks(lineBreakCharacter = undefined)`** -
  transforms CRLF, CR, LF into a line break character defined following the rules below:
  - as specified by `lineBreakCharacter` argument
  - if param `lineBreakCharacter` is undefined, then as set by `setLineBreakCharacter()`
  - if `setLineBreakCharacter()` wasn't applied, then defaults to LF (`'\n'`)
- **`normalizeMultiline()`** -
  strips whitespaces that immediately precede
  or follow line break characters;
  ignores **LineBreak** modifier setting

Examples:
```javascript
let input = "\r\n\r abc \r\n def \r \t ghi \n \t\t \n \r\n\n\r\n\n\r\r \r\r\n";
s10n(input).normalizeLineBreaks().value; // "\n\n abc \n def \n \t ghi \n \t\t \n \n\n\n\n\n\n \n\n"
s10n(input).normalizeMultiline().value; // "\r\n\rabc\r\ndef\rghi\n\n\r\n\n\r\n\n\r\r\r\r\n"
s10n(input)
.normalizeLineBreaks()
.normalizeMultiline().value; // "\n\nabc\ndef\nghi\n\n\n\n\n\n\n\n\n\n"
```

See also [`minimizeWhitespaces()`](#compound-transformers)
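
As a rough illustration of what "strips whitespaces that immediately precede or follow line break characters" means, here is a plain-JavaScript sketch. The helper name `normalizeMultilineSketch` and the `INLINE_WS` class are assumptions. The line break characters themselves are left untouched, which matches the documented behaviour of `normalizeMultiline()` when it is used without `normalizeLineBreaks()`.

```javascript
// Assumed whitespace set (from this README, plus tab); line breaks are excluded on purpose.
const INLINE_WS = "[\\x20\\t\\u200B\\u200C\\u200D\\u2060\\uFEFF\\xA0]*";

// Sketch only: drops whitespace directly before and after each line break,
// keeping whichever line break characters were there.
function normalizeMultilineSketch(input) {
  const pattern = new RegExp(`${INLINE_WS}(\\r\\n|\\r|\\n)${INLINE_WS}`, "g");
  return input.replace(pattern, "$1");
}

console.log(JSON.stringify(normalizeMultilineSketch(" abc \r\n def \n\t ghi ")));
// " abc\r\ndef\nghi "
```

Leading and trailing spaces of the whole string survive here because they are not adjacent to a line break. Trimming them is a separate step.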
[ [^^ Back to TOC ^^](#table-of-contents) ]
#### Keep/Remove/Replace
These methods' behaviour is **NOT** affected
by **LineBreak** modifier
(disregarded by default, i.e. `\s` RegExp token comprises `\r` and `\n`).
Specify `\n` and/or `\r`
explicitly whenever those should be kept or removed.

Method argument should follow RegExp character class
specification.

- **`keepOnlyCharset(allowedChars = "-A-Za-z0-9_\\x20.,}{\\]\\[)(", regexpFlags)`** -
  keep listed characters only
- **`keepOnlyRegExp(regexp, regexpFlags)`** - keep characters as per RegExp
  (RegExp object or regexp body as a string)
- **`remove(disallowedChars, regexpFlags)`** - remove listed characters
- **`replace(needle, replacement = "", regexpFlags)`** -
  replaces a needle (which is a string, or a RegExp object)
  with the replacement string

`regexpFlags` in the methods above is an optional parameter and
defaults to the flags as specified in [`_regexp`](#utility-methods) ("gu").

Examples:
```javascript
let input1 = "ABCDabcd01239 _-.,(abcd){defg}[hijk]";
s10n(input1).keepOnlyCharset("}{][)(").value; // "(){}[]"
s10n(input1).keepOnlyRegexp(/\{.*?\}|\[.*?\]|\(.*?\)/gu).value; // "(abcd){defg}[hijk]"

let input2 = "ABCDEFGHabcdefghABCDEFGHabcdefgh";
s10n(input2).remove("ABCD").value; // "EFGHabcdefghEFGHabcdefgh"
s10n(input2).remove("ABCD", "giu").value; // "EFGHefghEFGHefgh"
s10n(input2).remove(/ABCD/).value; // "EFGHabcdefghABCDEFGHabcdefgh"
s10n(input2).remove(/ABCD/giu).value; // "EFGHefghEFGHefgh"
s10n(input2).remove(/ABCD/, "giu").value; // "EFGHefghEFGHefgh"
```

[ [^^ Back to TOC ^^](#table-of-contents) ]
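
Since the charset arguments follow the RegExp character class specification, one way to picture the keep/remove methods is as a character class wrapped around the argument. The helpers below (`keepOnlyCharsetSketch`, `removeSketch`) are hypothetical and only assume that the string is interpolated into a class as-is. The library may well escape or pre-process its arguments differently.

```javascript
// Sketch only: the charset string is dropped into a negated character class.
function keepOnlyCharsetSketch(input, allowedChars, flags = "gu") {
  return input.replace(new RegExp(`[^${allowedChars}]`, flags), "");
}

// Sketch only: the charset string is dropped into a plain character class.
function removeSketch(input, disallowedChars, flags = "gu") {
  return input.replace(new RegExp(`[${disallowedChars}]`, flags), "");
}

const sample = "ABCDabcd01239 _-.,(abcd){defg}[hijk]";
// With direct interpolation, "]" and "[" have to be escaped inside the class.
console.log(keepOnlyCharsetSketch(sample, "}{\\]\\[)(")); // "(){}[]"

const letters = "ABCDEFGHabcdefghABCDEFGHabcdefgh";
console.log(removeSketch(letters, "ABCD")); // "EFGHabcdefghEFGHabcdefgh"
console.log(removeSketch(letters, "ABCD", "giu")); // "EFGHefghEFGHefgh"
```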
#### Other transformations
- **`toLowerCase()`** - converts to lower case
- **`toUpperCase()`** - converts to upper case

Examples:
```javascript
let input = "aBcD01";
s10n(input).toLowerCase().value; // "abcd01"
s10n(input).toUpperCase().value; // "ABCD01"
```

[ [^^ Back to TOC ^^](#table-of-contents) ]
### Compound transformers
Compound transformers implement complex
transformation rules applying multiple transformations,
often using elementary transformers.

- **`keepBase10Digits()`** - strips out anything but `0-9`
- **`keepBase16Digits()`** (alias: **`keepHexDigits()`**) -
  strips out anything but `0-9a-fA-F`;
  best chained with `toLowerCase()` or `toUpperCase()`
  for a consistent result
- **`minimizeWhitespaces()`** - removes leading and trailing
  whitespaces and line breaks, and merges continuous clusters of them;
  when preceded with `preserveLineBreaks()` treats input as
  a multiline string and thus trims spaces in every line

Examples:
```javascript
let input1 = " XYZ 20fE\n\n ";

s10n(input1).keepBase10Digits().value; // "20"
s10n(input1).keepBase16Digits().value; // "20fE"
s10n(input1)
.keepBase16Digits()
.toLowerCase().value; // "20fe"
s10n(input1)
.keepHexDigits()
.toLowerCase().value; // "20fe"

let input2 = " Some text \n Yet basically valid \n\n ";
s10n(input2).minimizeWhitespaces().value; // "Some text Yet basically valid"
s10n(input2)
.preserveLineBreaks()
.minimizeWhitespaces().value; // "Some text\nYet basically valid"
```

[ [^^ Back to TOC ^^](#table-of-contents) ]
### Semantic sanitizers
Semantic sanitizers implement semantically meaningful
yet heavily opinionated sanitization rules for particular use cases.

- **`keepOnlyEmailPopularCharset()`** - keeps
only `[email protected]`
- **`keepOnlyEmailExtendedCharset()`** - keeps
only `A-Za-z0-9_@.+)(-`
- **`keepOnlyEmailRfcCharset()`** - keeps
only charset as per rfc ( `` A-Za-z0-9_\\-@.+)( \":;<>\\\\,\\[\\]}{!#$%&'*/=?^`|~ ``)
- **`keepUsername(whiteSpaceReplacement = "")`** - keeps
only `a-zA-Z0-9_-`, whitespaces are stripped or
are merged and replaced with `whiteSpaceReplacement` if any
specified
- **`keepUsernameLC(whiteSpaceReplacement = "")`** - same as
`keepUsername` but the result is converted to lower case

Examples:
```javascript
let input = " UsEr #$%\"' NaMe + (5_6-9) @ .Co.Uk ";

s10n(input).keepOnlyEmailPopularCharset().value; // "[email protected]"
s10n(input).keepOnlyEmailExtendedCharset().value; // "UsErNaMe+(5_6-9)@.Co.Uk"
s10n(input).keepOnlyEmailRfcCharset().value; // "UsEr#$%\"'NaMe+(5_6-9)@.Co.Uk"
s10n(input)
.keepOnlyEmailPopularCharset()
.toLowerCase().value; // "[email protected]"

s10n(input).keepUsername().value; // "UsErNaMe5_6-9CoUk"
s10n(input).keepUsernameLC().value; // "username5_6-9couk"
s10n(input).keepUsername("_").value; // "UsEr_NaMe_5_6-9_CoUk"
s10n(input).keepUsernameLC("_").value; // "user_name_5_6-9_couk"
```

Note: sanitized email input may still be invalid, but it is (arguably)
easier to double-check and fix.

What if those semantic sanitizers do not fit my needs?
Consider implementing a [customized transformer](#custom-transformations).

[ [^^ Back to TOC ^^](#table-of-contents) ]
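
For readers who want to see what the username rule amounts to, here is a small plain-JavaScript sketch. The function `keepUsernameSketch` is hypothetical and relies on the built-in `\s` class rather than the extended whitespace set, so treat it as an approximation of the documented rule (keep `a-zA-Z0-9_-`, then strip or replace whitespace runs) rather than the library's implementation.

```javascript
// Sketch only: approximates the documented keepUsername() rule.
function keepUsernameSketch(input, whiteSpaceReplacement = "") {
  return input
    .replace(/[^-A-Za-z0-9_\s]/g, "") // drop everything except the allowed charset and whitespace
    .trim() // no replacement characters at the edges
    .replace(/\s+/g, whiteSpaceReplacement); // strip whitespace runs or swap each for the replacement
}

const rawName = " UsEr #$%\"' NaMe + (5_6-9) @ .Co.Uk ";
console.log(keepUsernameSketch(rawName)); // "UsErNaMe5_6-9CoUk"
console.log(keepUsernameSketch(rawName, "_")); // "UsEr_NaMe_5_6-9_CoUk"
console.log(keepUsernameSketch(rawName, "_").toLowerCase()); // "user_name_5_6-9_couk"
```

Removing disallowed characters before merging whitespace is what keeps `NaMe + (5_6-9)` from producing doubled underscores in this sketch.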
### Custom transformations
A custom transformer is a method to apply complex sanitization
logic using elementary or compound transformers, semantic sanitizers
or applying a completely unique rule set.

- **`apply(callback, ...arguments)`** - callback will receive
  current value, calling context (reference to current s10n object as `this`),
  and any extra arguments passed
- **`extend(methodName, method)`** -
  registers a re-usable custom transformation method
  - `extend` should be called on `s10n` object itself rather than in
    a sanitization chain
  - the method is accessible at every sanitization chain once registered
  - the method should transform `this.value` and/or call other
    built-in or registered custom transformers/sanitizers
  - the method should return `this` to make it chainable
  - do not define the method as an arrow function

Example:
```javascript
s10n("c00l").apply(
(value, context, needle, replacement) => value.replace(context._regexp(needle), replacement),
"0",
"o"
).value; // "cool"

s10n.extend("makeCool", function() {
// replaces 'o' and 'O' followed with whitespaces (extended set) with a single '0'
this.replace(this._regexp("o\\s+", "gi"), "0");
return this;
});
s10n("coO\x0A o\t l").makeCool().value; // "co00l"
```

[ [^^ Back to TOC ^^](#table-of-contents) ]
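
The rules for `extend` (return `this`, no arrow functions) follow from how chainable objects generally work in JavaScript. The sketch below uses a hypothetical `Chain` class and `chain` factory to show the mechanism. It is an assumption about the general pattern, not a description of s10n's internals.

```javascript
// Minimal chainable wrapper; `Chain` and `chain` are hypothetical names.
class Chain {
  constructor(value) {
    this.value = String(value);
  }
  toLowerCase() {
    this.value = this.value.toLowerCase();
    return this; // returning `this` keeps the chain going
  }
}

const chain = (input) => new Chain(input);

// Registers a method on the shared prototype so every later chain can use it.
chain.extend = (methodName, method) => {
  Chain.prototype[methodName] = method;
};

// A regular function receives the chain object as `this` at call time;
// an arrow function would capture the outer `this` instead and could not reach `value`.
chain.extend("stripDigits", function () {
  this.value = this.value.replace(/[0-9]/g, "");
  return this;
});

console.log(chain("A1B2c3").stripDigits().toLowerCase().value); // "abc"
```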
### Getting sanitized value
Getting the sanitized value (as a string)
is as simple as terminating
the transformation chain with `.value`.
E.g. `s10n(" my User Name ").keepUsernameLC().value`.
In a string context `.value` is optional, as a string
is returned by default.
E.g. `` `Username: ${s10n(" my User Name ").keepUsernameLC()}` ``
or `s10n(" my User Name ").keepUsernameLC() + ''`

Explicit value access methods:
- **`value`** - value as is
- **`toString()`** - same as `.value`
- **`toNumber()`** - converts sanitized string
into a Number. Use with caution as it will return `NaN`
if sanitized string contains anything else but a valid
[Number literal](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number).

Examples:
```javascript
let input = "65";
s10n(input).value; // "65"
s10n(input).toString(); // "65"
`${s10n(input)}`; // "65"
s10n(input) + ""; // "65"
s10n(input).toNumber(); // 65
```

[ [^^ Back to TOC ^^](#table-of-contents) ]
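
Why `.value` is optional in a string context can be shown with a few lines of plain JavaScript: any object that defines `toString()` is converted automatically in template literals and string concatenation. The `Sanitized` class below is a hypothetical stand-in, and its `toNumber()` assumes a plain `Number()` conversion, which may differ from what the library does.

```javascript
// Hypothetical stand-in for a sanitization chain object.
class Sanitized {
  constructor(value) {
    this.value = String(value);
  }
  toString() {
    return this.value; // used implicitly by template literals and `+ ""`
  }
  toNumber() {
    return Number(this.value); // assumption: plain Number() conversion
  }
}

const wrapped = new Sanitized("65");
console.log(`${wrapped}`); // "65"
console.log(wrapped + ""); // "65"
console.log(wrapped.toNumber()); // 65
console.log(Number.isNaN(new Sanitized("65px").toNumber())); // true
```

One caveat of a plain `Number()` conversion is that an empty string converts to `0` rather than `NaN`, so an emptied-out input is worth checking separately.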
### Utility methods
- **`_regexp(patternString, flags = "gu")`** -
using this utility will ensure that `\s` entities
in pattern string are replaced with an extended set
of whitespaces. Recommended for use in `apply` callback.

Example:
```javascript
s10n("\t \xA0 ABC\n\t \uFEFF").apply((value, context) =>
// replaces extended set of whitespaces with dashes
value.replace(context._regexp("\\s"), "-")
).value; // "----ABC----"
```

[ [^^ Back to TOC ^^](#table-of-contents) ]
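
The idea behind `_regexp` (expanding `\s` into an extended whitespace set) can be sketched as a small pattern pre-processor. The helper `regexpSketch` and the `EXTENDED_CLASS` constant are hypothetical. The sketch assumes `\s` only appears outside of character classes in the pattern string, and it is not the library's implementation.

```javascript
// The built-in \s already covers \t, \n, \r, \x20, \xA0 and \uFEFF;
// the zero-width characters from this README are added on top (assumed set).
const EXTENDED_CLASS = "[\\s\\u200B\\u200C\\u200D\\u2060]";

// Sketch only: every "\s" in the pattern string becomes the extended class.
// Limitation: a "\s" that is already inside [...] would not be handled correctly.
function regexpSketch(patternString, flags = "gu") {
  return new RegExp(patternString.replace(/\\s/g, EXTENDED_CLASS), flags);
}

console.log("\t \xA0 ABC\n\t \uFEFF".replace(regexpSketch("\\s"), "-")); // "----ABC----"
console.log("a\u200B \u2060b".replace(regexpSketch("\\s+"), "-")); // "a-b"
```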
## Development and Publishing
Refer to [CONTRIBUTING.md](./CONTRIBUTING.md) for details.
[ [^^ Back to TOC ^^](#table-of-contents) ]