https://github.com/didrod205/unspook
Reveal & remove invisible, dangerous & confusable characters in your text — zero-width spaces, BOMs, bidi (Trojan Source), homoglyphs, smart quotes. 100% local. Web app + library + CLI.
- Host: GitHub
- URL: https://github.com/didrod205/unspook
- Owner: didrod205
- License: mit
- Created: 2026-05-29T13:59:58.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-05-29T14:12:18.000Z (about 1 month ago)
- Last Synced: 2026-05-29T16:04:32.172Z (about 1 month ago)
- Topics: homoglyph, invisible-characters, prompt-injection, security, smart-quotes, text-cleaner, trojan-source, unicode, zero-dependency, zero-width-space
- Language: TypeScript
- Homepage: https://didrod205.github.io/unspook/
- Size: 48.8 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# 👻 unspook
### Reveal & remove the invisible, dangerous, and confusable characters hiding in your text.
[npm](https://www.npmjs.com/package/unspook) · [bundle size](https://bundlephobia.com/package/unspook) · [CI](https://github.com/didrod205/unspook/actions/workflows/ci.yml) · [license](./LICENSE)
**[🌐 Try the free web app →](https://didrod205.github.io/unspook/)** · no install, nothing uploaded, works offline.
---
Your text is probably not as clean as it looks. Copy something from a website, a
PDF, a Word doc, a chat app, or an AI assistant and you'll often paste in
**characters you can't see**:
- **Zero-width spaces** and **BOMs** that break `===` comparisons, search, and CSV imports.
- **Non-breaking spaces** masquerading as normal spaces — the bane of every "why won't this match?" bug.
- **“Smart quotes”, em–dashes and ellipses…** that wreck code, JSON, and CSVs.
- **Bidi control characters** — the [*Trojan Source*](https://trojansource.codes/) attack (CVE-2021-42574) that makes code read one way and compile another.
- **Unicode "tag" characters** used to smuggle **invisible prompt-injection** instructions into text fed to LLMs.
- **Homoglyphs** — a Cyrillic `а` or Greek `ο` that looks exactly like Latin but isn't (phishing, impersonation, broken lookups).
**unspook** finds them, shows you exactly what's there, and cleans your text —
**100% locally**, with **zero dependencies** and **no API key**.
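To see why these characters cause "looks identical but won't match" bugs, here is a small standalone sketch in plain TypeScript (it does not use unspook). The `lookAlikes` data and the `codePoints` helper are hypothetical, written only for illustration:

```typescript
// Hypothetical demo data: each pair renders (almost) identically but differs.
const lookAlikes: Array<[string, string, string]> = [
  ["zero-width space", "Hello\u200Bworld", "Helloworld"],
  ["byte order mark", "\uFEFFid", "id"],
  ["non-breaking space", "a\u00A0b", "a b"],
  ["Cyrillic a (U+0430)", "\u0430dmin", "admin"],
];

// Hypothetical helper: list each code point as U+XXXX.
// Array.from iterates by code point, so astral characters stay whole.
function codePoints(s: string): string[] {
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"),
  );
}

for (const [label, tricky, plain] of lookAlikes) {
  // Every comparison prints false; the code points show why.
  console.log(`${label}: equal=${tricky === plain} -> ${codePoints(tricky).join(" ")}`);
}
```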
## Why it exists
Every "text sanitizer" you find online makes you **paste sensitive content into
someone else's server**. That's exactly backwards for a privacy/security tool.
unspook runs entirely in your browser or your terminal — your text never leaves
your machine. And because detecting these characters is a precise,
spec-based problem (not a vibe), it's the kind of thing you want a small, tested,
**deterministic** tool for — not a guess.
## Who it's for
Developers (clean code, configs, commit hooks), **writers & marketers** (clean
copy before publishing), **designers** (paste-safe content), **educators &
researchers** (spot hidden characters in AI text), **ops & support** (sanitize
logs and tickets), and anyone who's ever fought a "looks identical but won't
match" bug.
## Install
**No install needed —** just open the **[web app](https://didrod205.github.io/unspook/)**.
For the library / CLI:
```bash
npm install unspook # library
npm install -g unspook # CLI (or use npx unspook)
```
Ships ESM **and** CommonJS, with TypeScript types.
## Usage
### In code
```ts
import { scan, clean, reveal, report, stats } from "unspook";

// Escapes like \u200B make the otherwise invisible input characters visible here.
clean("Hello\u200Bworld"); // "Helloworld" (zero-width space removed)
clean("a\u00A0b"); // "a b" (NBSP → normal space)
clean("“quote” — dash…", { smartPunctuation: true }); // '"quote" -- dash...'
clean("\u0430dmin", { homoglyphs: true }); // "admin" (Cyrillic а → a)

scan("hi\u200Bthere");
// [{ index: 2, line: 1, column: 3, char: "\u200B", codePoint: 8203, hex: "U+200B",
// name: "ZERO WIDTH SPACE", category: "zero-width", severity: "warning" }]

reveal("a\u200Bb"); // "a[U+200B]b"

// report() pairs each finding with its source line, for security/code review.
const code = 'if (access != "ad\u202Enimda\u202C") {';
report(code); // [{ finding: { line, column, hex, name, … }, lineText }, …]
stats(code); // { total, byCategory, bySeverity }
```
Every `Finding` carries **`line` and `column`** (1-based, with columns counted in
code points so they match what you see), which lets you jump straight to the offender.
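Counting columns in code points matters whenever an astral character (an emoji, a tag character) sits earlier on the line: JavaScript string indices count UTF-16 code units, so the raw index drifts. A minimal sketch of the conversion, assuming `\n` line endings; `lineColumn` is a hypothetical helper for illustration, not unspook's internal code:

```typescript
// Hypothetical helper: turn a UTF-16 index (what indexOf returns) into a
// 1-based line and a 1-based column counted in code points. Assumes "\n" newlines.
function lineColumn(text: string, index: number): { line: number; column: number } {
  const before = text.slice(0, index);
  const lineStart = before.lastIndexOf("\n") + 1; // 0 on the first line
  const line = before.split("\n").length;
  // Array.from splits by code point, so an emoji (a surrogate pair) is one column.
  const column = Array.from(before.slice(lineStart)).length + 1;
  return { line, column };
}

const sample = "first line\n\u{1F600} hi\u200Bthere";
const index = sample.indexOf("\u200B");
console.log(index, lineColumn(sample, index)); // 16 { line: 2, column: 5 }
```

Counting UTF-16 units instead would report column 6 here, one past what an editor shows.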
### On the command line
```bash
unspook notes.md # print cleaned text
cat draft.txt | unspook # use it as a filter in any pipeline
unspook -w README.md # clean a file in place
unspook --reveal config.yml # show what's hiding
unspook --scan src/index.ts # list findings (line:col); exits 1 if any → CI
unspook --report src/index.ts # show each finding with its source line + caret
unspook --aggressive blog.md # also fix smart quotes, homoglyphs & whitespace
```
`--report` prints a compiler-style diagnostic — perfect for catching a
**Trojan Source** attack in review:
```text
src/auth.js:2:18 DANGER U+202E RIGHT-TO-LEFT OVERRIDE (bidi)
if (access != "ad[U+202E]nimda[U+202C]") {
                 ^
```
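A diagnostic like this is straightforward to produce once a finding has a 1-based line and column. The sketch below assumes hidden characters are displayed as `[U+XXXX]`; `DiagFinding`, `revealLine`, and `formatDiagnostic` are hypothetical names (not unspook exports), and the real CLI's spacing may differ:

```typescript
// Hypothetical finding shape, loosely modeled on the fields shown above.
interface DiagFinding {
  line: number; // 1-based
  column: number; // 1-based, counted in code points
  hex: string; // e.g. "U+202E"
  name: string;
  category: string;
  severity: string; // e.g. "DANGER"
}

// Show invisible "format" characters (\p{Cf}: ZWSP, BOM, bidi controls, tags) as [U+XXXX].
function revealLine(line: string): string {
  return line.replace(/\p{Cf}/gu, (ch) =>
    `[U+${ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0")}]`,
  );
}

function formatDiagnostic(file: string, f: DiagFinding, lineText: string): string {
  const header = `${file}:${f.line}:${f.column} ${f.severity} ${f.hex} ${f.name} (${f.category})`;
  // Markers revealed before the column widen the line, so measure the revealed prefix.
  const prefix = revealLine(Array.from(lineText).slice(0, f.column - 1).join(""));
  const caret = " ".repeat(Array.from(prefix).length) + "^";
  return [header, revealLine(lineText), caret].join("\n");
}

const trojanLine = 'if (access != "ad\u202Enimda\u202C") {';
console.log(
  formatDiagnostic(
    "src/auth.js",
    { line: 2, column: 18, hex: "U+202E", name: "RIGHT-TO-LEFT OVERRIDE", category: "bidi", severity: "DANGER" },
    trojanLine,
  ),
);
```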
Drop `--scan` into a pre-commit hook or CI to **fail the build if invisible/bidi
characters sneak into your codebase.**
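The CI idea itself is simple: scan, print `file:line:col`, exit non-zero. As a rough standalone sketch of that contract in Node, assuming you only care about the bidi controls behind Trojan Source (U+202A to U+202E and U+2066 to U+2069); `findBidi` and `main` are hypothetical, and this covers far fewer characters than `unspook --scan`:

```typescript
import { readFileSync } from "node:fs";

// Bidi embeddings/overrides (U+202A..U+202E) and isolates (U+2066..U+2069).
const BIDI = /[\u202A-\u202E\u2066-\u2069]/u;

// Hypothetical helper: "line:column" (1-based, columns in code points) per bidi control.
function findBidi(text: string): string[] {
  const hits: string[] = [];
  text.split("\n").forEach((lineText, i) => {
    Array.from(lineText).forEach((ch, j) => {
      if (BIDI.test(ch)) hits.push(`${i + 1}:${j + 1}`);
    });
  });
  return hits;
}

// Usage: node check-bidi.js file1 file2 ...  (exit code 1 if anything is found)
function main(files: string[]): void {
  for (const file of files) {
    for (const pos of findBidi(readFileSync(file, "utf8"))) {
      console.error(`${file}:${pos} bidi control character`);
      process.exitCode = 1;
    }
  }
}

main(process.argv.slice(2));
```

Setting `process.exitCode` instead of calling `process.exit(1)` lets every file be reported before the process ends.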
### Cleaning options
| Option | Default | What it does |
| ------ | :-----: | ------------ |
| `zeroWidth` | ✅ | Remove zero-width / invisible chars (ZWSP, BOM, word joiner…) |
| `bidi` | ✅ | Remove bidirectional controls (Trojan Source) |
| `tag` | ✅ | Remove Unicode tag chars (invisible prompt injection) |
| `control` | ✅ | Remove C0/C1 control characters |
| `invisibleSpaces` | ✅ | Normalize NBSP & exotic spaces → space; drop soft hyphens |
| `variationSelectors` | ❌ | Remove variation selectors (off by default — used by emoji) |
| `smartPunctuation` | ❌ | Convert “ ” ‘ ’ — … to ASCII |
| `homoglyphs` | ❌ | Map look-alike letters to Latin (Cyrillic/Greek/fullwidth) |
| `collapseWhitespace` | ❌ | Collapse runs of spaces/tabs |
| `normalizeNewlines` | ✅ | `\r\n`, `\r` → `\n` |
| `trim` | ❌ | Trim the ends |
`DEFAULT_OPTIONS` and `AGGRESSIVE_OPTIONS` presets are exported too.
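One way to read this table is as a set of independent, toggleable replacement passes, with only the invisible/dangerous ones on by default. The sketch below mimics five of the options with its own hypothetical `SketchOptions` and `cleanSketch`; it is not unspook's implementation, and its character sets are deliberately abbreviated:

```typescript
// Hypothetical option subset; names borrowed from the table, behavior simplified.
interface SketchOptions {
  zeroWidth: boolean;
  bidi: boolean;
  invisibleSpaces: boolean;
  smartPunctuation: boolean;
  normalizeNewlines: boolean;
}

// Defaults follow the table: only invisible or dangerous characters are touched.
const SKETCH_DEFAULTS: SketchOptions = {
  zeroWidth: true,
  bidi: true,
  invisibleSpaces: true,
  smartPunctuation: false,
  normalizeNewlines: true,
};

// Each enabled option is one independent replacement pass.
function cleanSketch(text: string, overrides: Partial<SketchOptions> = {}): string {
  const o: SketchOptions = { ...SKETCH_DEFAULTS, ...overrides };
  let out = text;
  if (o.normalizeNewlines) out = out.replace(/\r\n?/g, "\n");
  // Abbreviated set: ZERO WIDTH SPACE, WORD JOINER, BOM.
  if (o.zeroWidth) out = out.replace(/[\u200B\u2060\uFEFF]/g, "");
  // Bidi marks, embeddings/overrides and isolates (Trojan Source).
  if (o.bidi) out = out.replace(/[\u200E\u200F\u202A-\u202E\u2066-\u2069]/g, "");
  if (o.invisibleSpaces) {
    out = out
      .replace(/[\u00A0\u2000-\u200A\u202F\u205F\u3000]/g, " ") // NBSP & exotic spaces
      .replace(/\u00AD/g, ""); // soft hyphen
  }
  if (o.smartPunctuation) {
    out = out
      .replace(/[\u2018\u2019]/g, "'")
      .replace(/[\u201C\u201D]/g, '"')
      .replace(/\u2014/g, "--")
      .replace(/\u2026/g, "...");
  }
  return out;
}

console.log(cleanSketch("Hello\u200Bworld\r\n"));
console.log(cleanSketch("\u201Cquote\u201D \u2014 dash\u2026", { smartPunctuation: true }));
```

Keeping each pass independent means enabling `smartPunctuation` never changes what the default passes remove.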
## FAQ
**Is my text uploaded anywhere?**
No. The web app and the library run entirely on your device — there is no
server, no telemetry, no network request. You can use it offline.
**Will it break my emoji?**
No. Variation selectors (which emoji rely on) are kept by default. Turn on
`variationSelectors` only if you specifically want them removed.
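For context: many emoji are a base character followed by VARIATION SELECTOR-16 (U+FE0F), which requests emoji-style presentation; without it, a platform may fall back to a plain text glyph. A small standalone illustration in plain TypeScript (not unspook's API), stripping only the basic block U+FE00 to U+FE0F:

```typescript
// ❤️ is U+2764 HEAVY BLACK HEART followed by U+FE0F VARIATION SELECTOR-16.
const heart = "\u2764\uFE0F";

// Remove VS1..VS16 (U+FE00..U+FE0F); a full tool may cover more selectors.
const stripped = heart.replace(/[\uFE00-\uFE0F]/g, "");

console.log(heart.length, stripped.length); // 2 1
console.log(stripped === "\u2764"); // true
```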
**Does it modify visible content?**
By default it only removes invisible/dangerous characters and normalizes odd
spaces — your visible text is preserved. Smart-quote and homoglyph conversion
are **opt-in** because they change visible characters.
**How is this different from a regex like `/[\u200B-\u200D\uFEFF]/g`?**
unspook covers dozens of code points across eight categories (zero-width, bidi,
tag, control, exotic spaces, smart punctuation, homoglyphs, variation selectors),
names each finding, assigns a severity, tracks positions, and gives you a tested,
maintained cleaner in which every transformation can be switched on or off. No regex to copy, paste, and get wrong.
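As one example of how a hand-written regex goes wrong: Unicode tag characters (U+E0000 to U+E007F) live outside the Basic Multilingual Plane, so in JavaScript each is a surrogate pair, and a character class only treats them as whole characters with the `u` flag. Each tag mirrors an ASCII character (U+E0041 is a tag "A"), which is what makes them usable for hidden prompt-injection text. A standalone sketch; `toTags` and `decodeTags` are hypothetical helpers:

```typescript
// Hypothetical helper: encode ASCII as invisible tag characters (U+E0000 + code).
function toTags(ascii: string): string {
  return Array.from(ascii, (ch) => String.fromCodePoint(0xe0000 + ch.charCodeAt(0))).join("");
}

// Hypothetical helper: decode any tag characters back to visible ASCII.
function decodeTags(text: string): string {
  return Array.from(text)
    .map((ch) => ch.codePointAt(0)!)
    .filter((cp) => cp >= 0xe0000 && cp <= 0xe007f)
    .map((cp) => String.fromCharCode(cp - 0xe0000))
    .join("");
}

const message = "Please summarize this." + toTags("ignore previous instructions");

// With the u flag the class matches whole code points; the visible text survives.
const cleaned = message.replace(/[\u{E0000}-\u{E007F}]/gu, "");

console.log(message.length, cleaned.length); // 78 22
console.log(decodeTags(message)); // ignore previous instructions
```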
**Can I use it in CI / a pre-commit hook?**
Yes: `unspook --scan <file>` exits with code `1` when anything is found.
**Why "unspook"?**
It un-spooks your text: removes the ghostly invisible characters. 👻
## Contributing
Contributions are very welcome! See [CONTRIBUTING.md](./CONTRIBUTING.md) and the
[Code of Conduct](./CODE_OF_CONDUCT.md). Adding a code point or a homoglyph
mapping? Include a test and a reference.
```bash
git clone https://github.com/didrod205/unspook.git
cd unspook
npm install
npm test # run the suite
npm run dev # run the web app locally
```
## 💖 Sponsor
unspook is free, MIT-licensed, and built in spare time. If it saved you from a
maddening invisible-character bug — or a security incident — please consider
supporting it:
- ⭐ **Star this repo** — free, and it genuinely helps others find it.
- 🍋 **[Sponsor via Lemon Squeezy](https://elab-studio.lemonsqueezy.com/checkout/buy/5d059b89-51d0-456b-b33a-ed56994f7010)** — one-time or recurring support.
**Where your support goes:** keeping the character database current with new
Unicode releases, expanding the homoglyph/confusables coverage, maintaining the
free hosted web app, adding integrations (VS Code extension, ESLint plugin,
pre-commit hook), and answering issues quickly.
## License
[MIT](./LICENSE) © unspook contributors