https://github.com/spencermountain/out-of-character
remove invisible unicode characters
https://github.com/spencermountain/out-of-character
invisible-characters unicode zero-width-space
Last synced: 3 months ago
JSON representation
remove invisible unicode characters
- Host: GitHub
- URL: https://github.com/spencermountain/out-of-character
- Owner: spencermountain
- License: mit
- Created: 2021-02-11T15:13:47.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2024-03-02T15:58:24.000Z (over 1 year ago)
- Last Synced: 2025-03-18T11:11:29.403Z (3 months ago)
- Topics: invisible-characters, unicode, zero-width-space
- Language: JavaScript
- Homepage:
- Size: 372 KB
- Stars: 17
- Watchers: 3
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: changelog.md
- License: LICENSE
Awesome Lists containing this project
README
Unicode has [a few-dozen](https://character.construction/blanks) characters that *do not render anything*, on purpose.
This is *cool* for cultural idiosyncracies in historical languages.
More often though, their use is unintentional *(or [nefarious!](https://330k.github.io/misc_tools/unicode_steganography.html))*, and these characters end-up causing problems parsing text formats.
• these are sometimes called *'zero-width'*, *'ignorable'*, or *'tag-characters'* •
This library helps spot and remove these funboys, before they cause some trouble.
Please remember that some text is meant to have *Khmer-vowels*, or *Kaithi-alphabet* characters.
## CLI
npm install -g out-of-character
detect invisible characters in all files in a directory
```bash
out-of-character ./path/to/dir
```remove them from all files in a directory
```bash
out-of-character ./path/to/dir --replace
```---
detect invisible characters in a file
```bash
out-of-character ./path/to/file.txt
```remove invisible characters from a file
```bash
out-of-character ./path/to/file.txt --replace
```
## Javascript API
```js
import {detect, replace} from 'out-of-character'let str='nothing s͏neak឵y here' //actually, there is.
console.log(detect(str))
/* 😮 😮 😮
[
{
name: 'KHMER VOWEL INHERENT AA',
code: 'U+17B5',
offset: 15,
replacement: ''
},
{
name: 'MONGOLIAN VOWEL SEPARATOR',
code: 'U+180E',
offset: 19,
replacement: ''
}
]*/// get rid of them!
let after = replace(str)
console.log(str !== after)
// true
```fixing/detecting in files can be done like:
```js
const fs = require('fs')
const {detect, replace} = require('out-of-character')let text = fs.readFileSync('./some-file.txt').toString()
console.log(detect(text))
// yikes.// ok, fix it
fs.writeFileSync('./some-file.txt', replace(text))// ok, double-check it.
let goodNow = fs.readFileSync('./some-file.txt').toString()
console.log(detect(goodNow))
// fhew.```
*Thank you to [character.construction/blanks](https://character.construction/blanks) by [Jan Lelis](https://janlelis.com/)*
*and [a tale of characters in Unicode](https://www.contentful.com/blog/2016/12/06/unicode-javascript-and-the-emoji-family/) by [Stefan Judis](https://github.com/stefanjudis)*
### See also
* [printable-characters](https://github.com/xpl/printable-characters) - by Vit Gordon
* [unzalgo](https://github.com/kdex/unzalgo) - by kdex
MIT