https://github.com/spencermountain/out-of-character

remove invisible unicode characters from a text

`npm install out-of-character`

by Spencer Kelly and Adam Tsiopani



Unicode has [a few dozen](https://character.construction/blanks) characters that *do not render anything*, on purpose.

This is *cool* for cultural idiosyncrasies in historical languages.
More often though, their use is unintentional *(or [nefarious!](https://330k.github.io/misc_tools/unicode_steganography.html))*, and these characters end up causing problems when parsing text formats.

• these are sometimes called *'zero-width'*, *'ignorable'*, or *'tag-characters'* •

This library helps spot and remove these funboys before they cause trouble.

Please remember that some text is meant to contain *Khmer vowels* or *Kaithi alphabet* characters, so removing them is not always the right call.
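
To get a feel for what these characters look like, here is a minimal sketch in plain JavaScript that makes a few of them visible by printing their code points. The `reveal` helper and its short character list are hypothetical and for illustration only; this is not how the library works internally, and the library's own list is far more complete.

```js
// A hand-picked, incomplete sample of invisible code points (illustration only).
const SAMPLE_INVISIBLES = new Set([
  0x00ad, // SOFT HYPHEN
  0x034f, // COMBINING GRAPHEME JOINER
  0x17b5, // KHMER VOWEL INHERENT AA
  0x180e, // MONGOLIAN VOWEL SEPARATOR
  0x200b, // ZERO WIDTH SPACE
])

// Replace each sampled invisible character with a visible <U+XXXX> marker.
function reveal(str) {
  let out = ''
  for (const ch of str) { // iterates by code point, not by UTF-16 unit
    const cp = ch.codePointAt(0)
    out += SAMPLE_INVISIBLES.has(cp)
      ? `<U+${cp.toString(16).toUpperCase().padStart(4, '0')}>`
      : ch
  }
  return out
}

console.log(reveal('zero\u200Bwidth'))
// zero<U+200B>width
```

Writing the character as an escape (`\u200B`) in source code keeps it visible to whoever reads the file next.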


## CLI

`npm install -g out-of-character`

detect invisible characters in all files in a directory
```bash
out-of-character ./path/to/dir
```

remove them from all files in a directory
```bash
out-of-character ./path/to/dir --replace
```

---

detect invisible characters in a file
```bash
out-of-character ./path/to/file.txt
```

remove invisible characters from a file
```bash
out-of-character ./path/to/file.txt --replace
```

## JavaScript API
```js
import {detect, replace} from 'out-of-character'

let str='noth­ing s͏neak឵y h᠎ere' //actually, there is.
console.log(detect(str))
/* 😮 😮 😮
[
  {
    name: 'KHMER VOWEL INHERENT AA',
    code: 'U+17B5',
    offset: 15,
    replacement: ''
  },
  {
    name: 'MONGOLIAN VOWEL SEPARATOR',
    code: 'U+180E',
    offset: 19,
    replacement: ''
  }
]*/

// get rid of them!
let after = replace(str)
console.log(str !== after)
// true
```
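
The `offset` values are easier to act on once they are mapped to a line and column. Below is a small sketch, assuming each finding carries the `name`, `code` and `offset` fields shown in the output above, and that `offset` is a zero-based index into the JavaScript string (an assumption, not something stated here). `locate` is a hypothetical helper, not part of the library.

```js
// Convert detect()-style findings into "line:column  CODE  NAME" strings.
// Assumes `offset` is a zero-based index into `text`.
function locate(text, findings) {
  return findings.map(({ name, code, offset }) => {
    const before = text.slice(0, offset)
    const line = before.split('\n').length           // 1-based line number
    const column = offset - before.lastIndexOf('\n') // 1-based column
    return `${line}:${column}  ${code}  ${name}`
  })
}

// Stand-in findings in the shape shown above; normally these would come from detect(text).
const sampleText = 'first line\nab\u180Ecd'
const sampleFindings = [
  { name: 'MONGOLIAN VOWEL SEPARATOR', code: 'U+180E', offset: 13, replacement: '' }
]
console.log(locate(sampleText, sampleFindings))
// [ '2:3  U+180E  MONGOLIAN VOWEL SEPARATOR' ]
```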

detecting and fixing invisible characters in files can be done like this:
```js
const fs = require('fs')
const {detect, replace} = require('out-of-character')

let text = fs.readFileSync('./some-file.txt').toString()
console.log(detect(text))
// yikes.

// ok, fix it
fs.writeFileSync('./some-file.txt', replace(text))

// ok, double-check it.
let goodNow = fs.readFileSync('./some-file.txt').toString()
console.log(detect(goodNow))
// phew.
```
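
When fixing many files, it can help to rewrite only the ones that actually change. Here is a sketch, assuming `detect` returns an array and `replace` returns a string, as in the examples above. `fixFile` is a hypothetical helper; the two functions are passed in as arguments so the sketch can be tried without the package installed.

```js
const fs = require('fs')

// Rewrite a file only if something was detected; returns the number of findings.
// `detect` and `replace` are passed in, e.g. from require('out-of-character').
function fixFile(path, { detect, replace }) {
  const text = fs.readFileSync(path, 'utf8')
  const found = detect(text)
  if (found.length > 0) {
    fs.writeFileSync(path, replace(text))
  }
  return found.length
}
```

With the package installed, a call might look like `fixFile('./some-file.txt', require('out-of-character'))`.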

*Thank you to [character.construction/blanks](https://character.construction/blanks) by [Jan Lelis](https://janlelis.com/)*

*and [a tale of characters in Unicode](https://www.contentful.com/blog/2016/12/06/unicode-javascript-and-the-emoji-family/) by [Stefan Judis](https://github.com/stefanjudis)*

### See also
* [printable-characters](https://github.com/xpl/printable-characters) - by Vit Gordon
* [unzalgo](https://github.com/kdex/unzalgo) - by kdex

MIT