https://github.com/jacopodl/dirtytext
Searches for [ab]using of Unicode glyphs.
- Host: GitHub
- URL: https://github.com/jacopodl/dirtytext
- Owner: jacopodl
- License: gpl-3.0
- Created: 2018-05-29T12:31:52.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2018-06-25T13:37:42.000Z (over 6 years ago)
- Last Synced: 2024-11-11T17:29:38.428Z (about 1 month ago)
- Topics: dirty, glyphs, text, tool, unicode, utf-8
- Language: Python
- Homepage:
- Size: 141 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# DirtyText #
Searches for [ab]use of Unicode glyphs.

## Installation ##
The DirtyText package can be installed through pip :snake::
```text
$ pip install dirtytext
```
or downloaded from [GitHub](https://github.com/jacopodl/dirtytext).
# Quick tour: #
## Common options: ##
- Read from file: `-f <file>`
- Save modified text: `-s <file>`
- Text filter: `--filter`
- Pipeline mode: `-p`

### :mag_right: Looks for ZERO-WIDTH characters: ###
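Conceptually, this check is a scan of every code point against a set of known zero-width characters. The following is a minimal Python sketch of that idea, not DirtyText's implementation: the `ZERO_WIDTH` set is a hand-picked assumption, and `find_zero_width` / `strip_zero_width` are hypothetical helpers that emit records shaped like the `-v` JSON (`idx`, `char`, `cval`, `infos`).

```python
import json

# Assumption: a hand-picked set of common zero-width code points.
# DirtyText's own list may differ.
ZERO_WIDTH = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}


def find_zero_width(text):
    """Return one record per zero-width character, shaped like the -v JSON."""
    return [
        {"idx": idx, "char": ch, "cval": f"{ord(ch):04X}", "infos": None}
        for idx, ch in enumerate(text)
        if ch in ZERO_WIDTH
    ]


def strip_zero_width(text):
    """Remove every character that belongs to ZERO_WIDTH."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


if __name__ == "__main__":
    # Hypothetical sample: a BOM at index 0 and two ZWNJs after "This text".
    sample = "\ufeffThis text\u200c\u200c contains zero-width chars"
    hits = find_zero_width(sample)
    print("Contains zero-width characters:", bool(hits))
    print(json.dumps(hits))
    print(strip_zero_width(sample))
```

With this sample the reported indices are 0, 10 and 11, the same positions as in the example output for this section; `json.dumps` escapes the invisible characters (`"\ufeff"`, `"\u200c"`) so they stay readable.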
```text
$> echo "This text contains zero-width chars" | dirtytext --zero -v
```
will produce the following output (the quoted string contains invisible zero-width characters):
```text
Contains zero-width characters: True
JSON:
[{"idx": 0, "char": "\ufeff", "cval": "FEFF", "infos": null},
{"idx": 10, "char": "\u200c", "cval": "200C", "infos": null},
{"idx": 11, "char": "\u200c", "cval": "200C", "infos": null}, ...]
```

### :mag_right: Looks for CONFUSABLE characters: ###
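One way to answer "which characters of another script look like this one?" is a reverse index over Unicode's confusables.txt (`source ; target ; type # comment`), keyed by the target code point. The sketch below is an assumption about the approach, not DirtyText's code: `SAMPLE_CONFUSABLES` holds only three illustrative lines matching the mappings in this section's output, and it filters by the Unicode character-name prefix (e.g. `"GREEK"`) as a stand-in for the script data DirtyText keeps in `categories.json`. `parse_confusables`, `find_confusables` and `script_prefix` are hypothetical names.

```python
import json
import unicodedata
from collections import defaultdict

# Illustrative lines in the "source ; target ; type # comment" layout of
# Unicode's confusables.txt. A real build would parse the whole downloaded
# file (opened with encoding="utf-8-sig" to drop its leading BOM).
SAMPLE_CONFUSABLES = """\
0399 ;\t006C ;\tMA\t# GREEK CAPITAL LETTER IOTA -> LATIN SMALL LETTER L
03BF ;\t006F ;\tMA\t# GREEK SMALL LETTER OMICRON -> LATIN SMALL LETTER O
03C3 ;\t006F ;\tMA\t# GREEK SMALL LETTER SIGMA -> LATIN SMALL LETTER O
"""


def parse_confusables(text):
    """Map each single-code-point target to the source code points confusable with it."""
    index = defaultdict(list)
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # cut the comment; skip empty lines
        if not data:
            continue
        source, target = (field.strip() for field in data.split(";")[:2])
        if " " in target:  # this sketch ignores multi-code-point targets
            continue
        index[int(target, 16)].append(int(source, 16))
    return index


def find_confusables(text, index, script_prefix="GREEK"):
    """Report characters with look-alikes whose Unicode name starts with script_prefix."""
    results = []
    for idx, ch in enumerate(text):
        infos = []
        for cp in index.get(ord(ch), []):
            name = unicodedata.name(chr(cp), "")
            if name.startswith(script_prefix):
                infos.append({"target": f"{cp:04X}", "description": name})
        if infos:
            results.append({"idx": idx, "char": ch, "cval": f"{ord(ch):04X}", "infos": infos})
    return results


if __name__ == "__main__":
    hits = find_confusables("hello", parse_confusables(SAMPLE_CONFUSABLES))
    print("Contains confusable characters:", bool(hits))
    print(json.dumps(hits))
```

For `"hello"` this reports indices 2, 3 and 4 with the same targets (0399, 03BF, 03C3) as the example output; `h` and `e` are absent only because the sample data is so small.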
```text
$> echo "hello" | dirtytext --confusables greek -v
```
will produce the following output:
```text
Contains confusables characters: True
JSON:
[{"idx": 2, "char": "l", "cval": "006C", "infos": [{"target": "0399", "description": "GREEK CAPITAL LETTER IOTA"}]},
{"idx": 3, "char": "l", "cval": "006C", "infos": [{"target": "0399", "description": "GREEK CAPITAL LETTER IOTA"}]},
{"idx": 4, "char": "o", "cval": "006F", "infos": [{"target": "03BF", "description": "GREEK SMALL LETTER OMICRON"},
{"target": "03C3", "description": "GREEK SMALL LETTER SIGMA"}]}]
```

### :mag_right: Looks for and filters anomalies in LATIN text: ###
`example.txt`:
```text
It ⅽan be argueⅾ that the ⅽomputer ⅰs humanⅰty’s attempt to repⅼⅰⅽate the human brain.
This ⅰs perhaps an unattainable goal.
However, unattainable goals often lead to outstanding accomplishment.
```
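The sample text mixes Latin letters with look-alike small Roman numeral characters (for example U+217D SMALL ROMAN NUMERAL ONE HUNDRED in place of `c`). How `--lsubs` detects these is not described here; as a stand-in, the sketch below uses NFKC compatibility normalization, which folds many such characters to a plain ASCII letter. `latin_substitutes` and `filter_latin` are hypothetical helpers, and the technique may catch a different set of characters than DirtyText does.

```python
import unicodedata


def latin_substitutes(text):
    """Yield (idx, char, replacement) for non-ASCII chars whose NFKC form is one ASCII letter."""
    for idx, ch in enumerate(text):
        if ch.isascii():
            continue
        folded = unicodedata.normalize("NFKC", ch)
        # Keep only single-letter results; ligatures such as "ﬁ" fold to two letters.
        if len(folded) == 1 and folded.isascii() and folded.isalpha():
            yield idx, ch, folded


def filter_latin(text):
    """Replace each detected look-alike with the ASCII letter it normalizes to."""
    chars = list(text)
    for idx, _, replacement in latin_substitutes(text):
        chars[idx] = replacement
    return "".join(chars)


if __name__ == "__main__":
    dirty = "It \u217dan be argue\u217e that the \u217domputer \u2170s human\u2170ty\u2019s"
    print(filter_latin(dirty))
```

Legitimate non-ASCII punctuation such as the right single quotation mark in "humanity’s" has no compatibility mapping, so it is left untouched, which matches the `out.txt` result shown in this section.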
```text
$> dirtytext -f example.txt --lsubs --filter -s out.txt
```
will save the filtered text to `out.txt`:
```text
It can be argued that the computer is humanity’s attempt to replicate the human brain.
This is perhaps an unattainable goal.
However, unattainable goals often lead to outstanding accomplishment.
```

# UnicodeDB #
The Unicode data that makes up the DirtyText database is extracted from files published by the Unicode Consortium. In particular, there are two database files in the `dirtytext/data` directory:

- `categories.json`: built from data extracted from [here](https://unicode.org/Public/UNIDATA/Scripts.txt)
- `confusables.json`: built from data extracted from [here](https://unicode.org/Public/security/latest/confusables.txt)

If `dirtytext/data` doesn't exist, DirtyText downloads and builds the database before performing the requested operations. Afterwards, you can force a database update by adding the `--update` option.

# License #
Released under GPL-3.0