https://github.com/curegit/unicodecheck
Simple tool to check if Unicode text files are Unicode-normalized
character-encoding text-normalization unicode
- Host: GitHub
- URL: https://github.com/curegit/unicodecheck
- Owner: curegit
- License: mit
- Created: 2023-10-20T00:22:01.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2024-05-31T16:26:11.000Z (6 months ago)
- Last Synced: 2024-10-05T03:49:29.648Z (about 1 month ago)
- Topics: character-encoding, text-normalization, unicode
- Language: Python
- Homepage: https://pypi.org/project/unicodecheck/
- Size: 45.9 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Unicodecheck
Simple tool to check if Unicode text files are Unicode-normalized
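To make "Unicode-normalized" concrete: a text is normalized in a given form when applying that normalization leaves it unchanged. The sketch below illustrates the idea with Python's standard `unicodedata` module. The helper name `is_normalized_text` is hypothetical, and this is not necessarily how `unicodecheck` itself is implemented.

```python
import unicodedata


def is_normalized_text(text: str, form: str = "NFC") -> bool:
    """Return True if `text` is unchanged by normalization to `form`.

    Hypothetical helper for illustration; not part of unicodecheck's API.
    """
    return unicodedata.normalize(form, text) == text


composed = "caf\u00e9"     # "é" as a single code point (U+00E9)
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent (U+0301)

print(is_normalized_text(composed, "NFC"))    # True
print(is_normalized_text(decomposed, "NFC"))  # False
print(is_normalized_text(decomposed, "NFD"))  # True
```

Both strings render identically, which is why mixed normalization forms are easy to miss without a dedicated check.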
## Install
```sh
pip3 install unicodecheck
```
## Usage
### Quickstart
```sh
unicodecheck -iv SPAM.txt
```
To check files in a directory recursively:
```sh
unicodecheck -ivr Ham/Eggs/
```
### Synopsis
The main program can be invoked either through the `unicodecheck` command or through the Python main module option `python3 -m unicodecheck`.
```txt
usage: unicodecheck [-h] [-V] [-m {NFC,NFD,NFKC,NFKD}] [-d] [-u [NUMBER]] [-r] [-i] [-v]
                    PATH [PATH ...]
```
### Options
```txt
positional arguments:
  PATH                  describe input file or directory (pass '-' to specify stdin)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -m {NFC,NFD,NFKC,NFKD}, --mode {NFC,NFD,NFKC,NFKD}
                        target Unicode normalization (default: NFC)
  -d, --diff            show diffs between the original and normalized (default: False)
  -u [NUMBER], -U [NUMBER], --unified [NUMBER]
                        show unified diffs with NUMBER lines of context [NUMBER=3] (default: False)
  -r, --recursive       follow the directory tree rooted in each PATH argument (default: False)
  -i, --include-hidden  include hidden files and directories (default: False)
  -b PATTERN [PATTERN ...], --blacklist PATTERN [PATTERN ...]
                        notify if having PATTERN (case-sensitive) (default: None)
  -e, --error           return non-zero exit code on detection (default: False)
  -v, --verbose         report non-essential logs (default: False)
```
## Tips
### Check whether filenames are normalized
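To check filenames rather than file contents, the `convmv` command is a good alternative to this application. Without `--notest`, `convmv` only reports what it would rename.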
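Where `convmv` is not available, a similar report can be sketched with Python's standard library. This is a minimal illustration, not a feature of `unicodecheck`. The function name `find_unnormalized_names` is hypothetical, and the sketch assumes the file system returns names as stored (some file systems normalize names themselves).

```python
import os
import unicodedata


def find_unnormalized_names(root: str, form: str = "NFC") -> list[str]:
    """Return paths under `root` whose final component is not in `form`.

    Hypothetical helper for illustration; it only reports and never renames.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Check directory names as well as file names.
        for name in dirnames + filenames:
            if unicodedata.normalize(form, name) != name:
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Calling `find_unnormalized_names("./")` covers roughly the same ground as the recursive `convmv` invocations in this section, with the target form passed as `"NFC"` or `"NFD"`.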
#### NFC
```sh
convmv -f utf8 -t utf8 --nfc -r ./
```
#### NFD
```sh
convmv -f utf8 -t utf8 --nfd -r ./
```
## Notes
- This tool does not normalize files in place (it never writes to them), because Unicode normalization does not guarantee content equivalence. For example, NFKC replaces the ligature `ﬁ` (U+FB01) with the two letters `fi`.
- Binary file detection follows Git's algorithm.
## License
MIT