{"id":16866741,"url":"https://github.com/b4n/wtf8tools","last_synced_at":"2025-08-02T13:09:38.751Z","repository":{"id":142098236,"uuid":"68630139","full_name":"b4n/wtf8tools","owner":"b4n","description":"WTF-8 conversion tools","archived":false,"fork":false,"pushed_at":"2016-09-19T18:12:32.000Z","size":19,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-18T17:58:05.187Z","etag":null,"topics":["encoding-convertors","unicode","utf-16","utf-32","utf-8","wtf-8"],"latest_commit_sha":null,"homepage":null,"language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/b4n.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-09-19T17:38:22.000Z","updated_at":"2017-09-18T03:50:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"2e1530b8-fc21-49ab-b5a2-92a879a96a4d","html_url":"https://github.com/b4n/wtf8tools","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/b4n/wtf8tools","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/b4n%2Fwtf8tools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/b4n%2Fwtf8tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/b4n%2Fwtf8tools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/b4n%2Fwtf8tools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/b4n","download_url":"https://codeload.github.com/b4n/wtf8too
ls/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/b4n%2Fwtf8tools/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268393943,"owners_count":24243320,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-02T02:00:12.353Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["encoding-convertors","unicode","utf-16","utf-32","utf-8","wtf-8"],"created_at":"2024-10-13T14:51:33.307Z","updated_at":"2025-08-02T13:09:38.725Z","avatar_url":"https://github.com/b4n.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# WTF-8 conversion tools\n\nA set of naive tools to convert between broken UTF-16 and WTF-8.\nSee https://en.wikipedia.org/wiki/UTF-8#WTF-8\n\nThe only purpose of these tools is to convert to and from broken UTF-16 (that\nis, with unpaired surrogates), which Windows seems to happily generate.\n\nBasically, all they do is happily read or write unpaired surrogate halves.\n\n## (Broken) UTF-16 to WTF-8\n\n`wtf162wtf8` reads UTF-16 code units, and tries to read code points.  If that\nsucceeds, it writes the code point it read as UTF-8.  If it doesn't succeed, i.e. 
if\nit is a high or low surrogate without its other half, it writes the surrogate half\nas UTF-8 (which makes it WTF-8).\n\nThe result is WTF-8, and even UTF-8 if the input is valid UTF-16.\n\n## WTF-8 to (broken) UTF-16\n\n`wtf82utf16` does the reverse conversion: given WTF-8 input, it reconstructs\nthe possibly broken UTF-16 data.  All it actually does is write every code\npoint below `0x10000` as a plain UTF-16 unit, even surrogate halves.\n\n## UTF-32 support\n\nAs a proof of concept, there is also support for broken UTF-32.  Just like\nWTF-8 and broken UTF-16, it allows reserved code points to appear and encodes\nand decodes them happily.  Only WTF-8/UTF-32 pairs are provided, but they can\nbe streamed together to convert directly between UTF-16 and UTF-32, using e.g.\n`wtf162wtf8 \u003c input | wtf82utf32 \u003e output`.\n\n## Regarding Endianness\n\nThese tools are naive, and don't actually do anything about endianness.  The\nresult is that if they are run on a Big Endian machine, they read and write\nUTF-16BE, and if they are run on a Little Endian machine (far more common),\nthey read and write UTF-16LE.\n\nAs those tools are typically useful with UTF-16LE, and most machines are\nLittle Endian, it should generally work fine.  Hopefully.\n\n## Usage\n\nTo convert from (broken) UTF-16 to WTF-8, use `wtf162wtf8 \u003c input \u003e output`.\nSimilarly, to convert from WTF-8 to (broken) UTF-16, use\n`wtf82utf16 \u003c input \u003e output`.\n\nYou can control the verbosity through the `VERBOSE` environment variable: set\nit to a positive integer to get verbose/debugging output on `stderr`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fb4n%2Fwtf8tools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fb4n%2Fwtf8tools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fb4n%2Fwtf8tools/lists"}