{"id":28626195,"url":"https://github.com/commoncrawl/ccf-git-github-filesystem-unicode-test","last_synced_at":"2025-10-27T01:04:31.215Z","repository":{"id":258706497,"uuid":"874451489","full_name":"commoncrawl/ccf-git-github-filesystem-unicode-test","owner":"commoncrawl","description":"Test files to diagnose git and filesystem problems with unicode normalization","archived":false,"fork":false,"pushed_at":"2024-10-18T00:28:00.000Z","size":4,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-10-20T08:05:21.588Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/commoncrawl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-17T21:11:47.000Z","updated_at":"2024-10-18T00:28:04.000Z","dependencies_parsed_at":"2024-10-20T08:05:26.179Z","dependency_job_id":"91ed2649-9f27-4a17-bfa8-a82f4c52f1b3","html_url":"https://github.com/commoncrawl/ccf-git-github-filesystem-unicode-test","commit_stats":null,"previous_names":["commoncrawl/ccf-git-github-filesystem-unicode-test"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/commoncrawl/ccf-git-github-filesystem-unicode-test","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/commoncrawl%2Fccf-git-github-filesystem-unicode-test","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/commoncrawl%2Fccf-git-github-filesystem-unicode-test/tags","releases_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/commoncrawl%2Fccf-git-github-filesystem-unicode-test/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/commoncrawl%2Fccf-git-github-filesystem-unicode-test/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/commoncrawl","download_url":"https://codeload.github.com/commoncrawl/ccf-git-github-filesystem-unicode-test/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/commoncrawl%2Fccf-git-github-filesystem-unicode-test/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259432321,"owners_count":22856726,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-12T08:41:08.603Z","updated_at":"2025-10-18T00:17:31.480Z","avatar_url":"https://github.com/commoncrawl.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ccf-git-github-filesystem-unicode-test\n\nIs your life not dangerous enough? 
Try this on for size.\n\n```\npython ./generate.py\n\ndoing string daatsʼíin:\nLATIN SMALL LETTER D, LATIN SMALL LETTER A, LATIN SMALL LETTER A, LATIN SMALL LETTER T, LATIN SMALL LETTER S, MODIFIER LETTER APOSTROPHE, LATIN SMALL LETTER I WITH ACUTE, LATIN SMALL LETTER I, LATIN SMALL LETTER N\noriginal string daatsʼíin appears to have already been in form NFC\noriginal string daatsʼíin appears to have already been in form NFKC\nin NFD: LATIN SMALL LETTER D, LATIN SMALL LETTER A, LATIN SMALL LETTER A, LATIN SMALL LETTER T, LATIN SMALL LETTER S, MODIFIER LETTER APOSTROPHE, LATIN SMALL LETTER I, COMBINING ACUTE ACCENT, LATIN SMALL LETTER I, LATIN SMALL LETTER N\nin NFKD: LATIN SMALL LETTER D, LATIN SMALL LETTER A, LATIN SMALL LETTER A, LATIN SMALL LETTER T, LATIN SMALL LETTER S, MODIFIER LETTER APOSTROPHE, LATIN SMALL LETTER I, COMBINING ACUTE ACCENT, LATIN SMALL LETTER I, LATIN SMALL LETTER N\nstring daatsʼíin has equivalent forms ['NFC', 'NFKC']\nstring daatsʼíin has equivalent forms ['NFD', 'NFKD']\npath NFC,NFKC-daatsʼíin already exists, not creating\npath NFD,NFKD-daatsʼíin already exists, not creating\n\ndoing string dũya:\nLATIN SMALL LETTER D, LATIN SMALL LETTER U WITH TILDE, LATIN SMALL LETTER Y, LATIN SMALL LETTER A\noriginal string dũya appears to have already been in form NFC\noriginal string dũya appears to have already been in form NFKC\nin NFD: LATIN SMALL LETTER D, LATIN SMALL LETTER U, COMBINING TILDE, LATIN SMALL LETTER Y, LATIN SMALL LETTER A\nin NFKD: LATIN SMALL LETTER D, LATIN SMALL LETTER U, COMBINING TILDE, LATIN SMALL LETTER Y, LATIN SMALL LETTER A\nstring dũya has equivalent forms ['NFC', 'NFKC']\nstring dũya has equivalent forms ['NFD', 'NFKD']\npath NFC,NFKC-dũya already exists, not creating\npath NFD,NFKD-dũya already exists, not creating\n```\n\nNow that you have these files, try checking them in. Tar, untar, zip, unzip, knock\nyourself out.\n\nWe're looking for different behavior in different situations. 
Linux is usually\nclean, but it can sometimes have problems.\n\n## macOS\n\nafter git checkout:\n\n```\nUntracked files:\n  (use \"git add \u003cfile\u003e...\" to include in what will be committed)\n        \"NFD,NFKD-daats\\312\\274\\303\\255in\"\n        \"NFD,NFKD-d\\305\\251ya\"\n```\n\nAnd then\n\n```\n$ python3 generate.py\n\ndoing string daatsʼíin:\nLATIN SMALL LETTER D, LATIN SMALL LETTER A, LATIN SMALL LETTER A, LATIN SMALL LETTER T, LATIN SMALL LETTER S, MODIFIER LETTER APOSTROPHE, LATIN SMALL LETTER I WITH ACUTE, LATIN SMALL LETTER I, LATIN SMALL LETTER N\noriginal string daatsʼíin appears to have been in form NFC\noriginal string daatsʼíin appears to have been in form NFKC\nin NFD: LATIN SMALL LETTER D, LATIN SMALL LETTER A, LATIN SMALL LETTER A, LATIN SMALL LETTER T, LATIN SMALL LETTER S, MODIFIER LETTER APOSTROPHE, LATIN SMALL LETTER I, COMBINING ACUTE ACCENT, LATIN SMALL LETTER I, LATIN SMALL LETTER N\nin NFKD: LATIN SMALL LETTER D, LATIN SMALL LETTER A, LATIN SMALL LETTER A, LATIN SMALL LETTER T, LATIN SMALL LETTER S, MODIFIER LETTER APOSTROPHE, LATIN SMALL LETTER I, COMBINING ACUTE ACCENT, LATIN SMALL LETTER I, LATIN SMALL LETTER N\nstring daatsʼíin has equivalent forms ['NFC', 'NFKC']\nstring daatsʼíin has equivalent forms ['NFD', 'NFKD']\npath NFC,NFKC-daatsʼíin already exists, not creating\npath NFD,NFKD-daatsʼíin already exists, not creating\n\ndoing string dũya:\nLATIN SMALL LETTER D, LATIN SMALL LETTER U WITH TILDE, LATIN SMALL LETTER Y, LATIN SMALL LETTER A\nstring dũya appears to have been in form NFC\nstring dũya appears to have been in form NFKC\nin NFD: LATIN SMALL LETTER D, LATIN SMALL LETTER U, COMBINING TILDE, LATIN SMALL LETTER Y, LATIN SMALL LETTER A\nin NFKD: LATIN SMALL LETTER D, LATIN SMALL LETTER U, COMBINING TILDE, LATIN SMALL LETTER Y, LATIN SMALL LETTER A\nstring dũya has equivalent forms ['NFC', 'NFKC']\nstring dũya has equivalent forms ['NFD', 'NFKD']\npath NFC,NFKC-dũya already exists, not creating\npath 
NFD,NFKD-dũya already exists, not creating\n\n$ git status\nOn branch main\nYour branch is up to date with 'origin/main'.\n\nUntracked files:\n  (use \"git add \u003cfile\u003e...\" to include in what will be committed)\n        \"NFD,NFKD-daats\\312\\274\\303\\255in\"\n        \"NFD,NFKD-d\\305\\251ya\"\n\nnothing added to commit but untracked files present (use \"git add\" to track)\n```\n\nSo those 2 files exist, are untracked, and yet are not... missing.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcommoncrawl%2Fccf-git-github-filesystem-unicode-test","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcommoncrawl%2Fccf-git-github-filesystem-unicode-test","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcommoncrawl%2Fccf-git-github-filesystem-unicode-test/lists"}