{"id":20812224,"url":"https://github.com/curegit/unicodecheck","last_synced_at":"2025-10-11T07:31:35.646Z","repository":{"id":227182101,"uuid":"707458253","full_name":"curegit/unicodecheck","owner":"curegit","description":"Simple tool to check if Unicode text files are Unicode-normalized","archived":false,"fork":false,"pushed_at":"2024-10-26T08:03:31.000Z","size":53,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-23T02:37:04.345Z","etag":null,"topics":["character-encoding","text-normalization","unicode"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/unicodecheck/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/curegit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-20T00:22:01.000Z","updated_at":"2024-10-26T08:00:49.000Z","dependencies_parsed_at":"2024-04-09T07:28:14.884Z","dependency_job_id":"77c5d740-11da-4b00-986d-539e16690fdc","html_url":"https://github.com/curegit/unicodecheck","commit_stats":null,"previous_names":["curegit/unicodecheck"],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/curegit%2Funicodecheck","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/curegit%2Funicodecheck/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/curegit%2Funicodecheck/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/curegit%2Funicodecheck/ma
nifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/curegit","download_url":"https://codeload.github.com/curegit/unicodecheck/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":236062481,"owners_count":19088983,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["character-encoding","text-normalization","unicode"],"created_at":"2024-11-17T20:51:20.239Z","updated_at":"2025-10-11T07:31:30.339Z","avatar_url":"https://github.com/curegit.png","language":"Python","readme":"# Unicodecheck\n\nSimple tool to check if Unicode text files are Unicode-normalized\n\n## Install\n\n```sh\npip3 install unicodecheck\n```\n\n## Usage\n\n### Quickstart\n\n```sh\nunicodecheck -iv SPAM.txt\n```\n\nTo check files in a directory recursively:\n\n```sh\nunicodecheck -ivr Ham/Eggs/\n```\n\n### Synopsis\n\nThe main program can be invoked either through the `unicodecheck` command or through the Python main module option `python3 -m unicodecheck`.\n\n```txt\nusage: unicodecheck [-h] [-V] [-m {NFC,NFD,NFKC,NFKD}] [-d] [-u [NUMBER]] [-r] [-i] [-v]\n                    PATH [PATH ...]\n```\n\n### Options\n\n```txt\npositional arguments:\n  PATH                  describe input file or directory (pass '-' to specify stdin)\n\noptions:\n  -h, --help            show this help message and exit\n  -V, --version         show program's version number and exit\n  -m {NFC,NFD,NFKC,NFKD}, --mode {NFC,NFD,NFKC,NFKD}\n                        target Unicode normalization (default: NFC)\n  -d, --diff            show diffs between the 
original and normalized (default: False)\n  -u [NUMBER], -U [NUMBER], --unified [NUMBER]\n                        show unified diffs with NUMBER lines of context [NUMBER=3] (default: False)\n  -r, --recursive       follow the directory tree rooted in each PATH argument (default: False)\n  -i, --include-hidden  include hidden files and directories (default: False)\n  -b PATTERN [PATTERN ...], --blacklist PATTERN [PATTERN ...]\n                        notify if having PATTERN (case-sensitive) (default: None)\n  -e, --error           return non-zero exit code on detection (default: False)\n  -v, --verbose         report non-essential logs (default: False)\n```\n\n## Tips\n\n### Check whether filenames are normalized\n\nTo check filenames rather than file contents, the `convmv` command is a good alternative to this application.\n\n#### NFC\n\n```sh\nconvmv -f utf8 -t utf8 --nfc -r ./\n```\n\n#### NFD\n\n```sh\nconvmv -f utf8 -t utf8 --nfd -r ./\n```\n\n## Notes\n\n- This tool does not offer automatic in-place normalization (rewriting files) because Unicode normalization does not guarantee that the content stays equivalent.\n- The procedure for detecting binary files is based on Git's algorithm.\n\n## License\n\nMIT\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcuregit%2Funicodecheck","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcuregit%2Funicodecheck","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcuregit%2Funicodecheck/lists"}