# [Uuuu](https://www.youtube.com/watch?v=jjD9WzW6dK4)

Uhuhuhuhuhuh! `uuuu` (Universal Unifier to Unicode *Un* OCaml) is a small library that normalizes ISO-8859 input to Unicode code points. It uses the mapping tables provided by the Unicode Consortium:

[Unicode tables](https://ftp.unicode.org/Public/MAPPINGS/ISO8859/)

This project converts these tables to OCaml code. It then provides a non-blocking, *best-effort* decoder that translates ISO-8859 characters into Unicode code points, which can then be encoded as UTF-8.

## How to use it?

`uuuu` has a _dbuenzli_-style interface (the same style as [uutf][uutf]), so it should be easy to use and to build on. Its goal is simple: offer a general way to decode ISO-8859 input and normalize it to Unicode code points. The decoder must let the user control memory consumption and must never block. Finally, an error should not stop the decoding process.

Here is a small example that uses [uutf][uutf] to translate Latin-1 input to UTF-8:

```ocaml
let trans ic oc =
  let decoder = Uuuu.decoder (Uuuu.encoding_of_string "latin1") (`Channel ic) in
  let encoder = Uutf.encoder `UTF_8 (`Channel oc) in
  let rec go () = match Uuuu.decode decoder with
    | `Await -> assert false (* XXX(dinosaure): impossible with a `String or `Channel source. *)
    | `Uchar _ as uchar -> ignore @@ Uutf.encode encoder uchar ; go ()
    | `End -> ignore @@ Uutf.encode encoder `End (* flush the encoder *)
    | `Malformed err -> failwith err in
  go ()

let () = trans stdin stdout
```

### About `encoding_of_string`

`encoding_of_string` accepts the aliases listed in the IANA character sets database:
https://www.iana.org/assignments/character-sets.xhtml

Any other name raises an exception. The lookup is case-insensitive.

### About translation tables

`uuuu` embeds the translation tables provided by the Unicode Consortium. Since these tables are not expected to change, they are stored statically as `int array`s.

### About encoding

`uuuu` only supports decoding to Unicode code points. Encoding back to ISO-8859 is not planned: new data should use Unicode.

### A larger decoder

`uuuu` is part of a larger project, [rosetta][rosetta], which decodes several other encodings. If you need to handle more encodings than ISO-8859, have a look at that higher-level library.

### Distribution

`uuuu` ships a small binary, `uuuu.to_utf8`, that translates an ISO-8859 stream to UTF-8. It is provided as an example of how to use `uuuu` with `uutf`.

[uutf]: https://github.com/dbuenzli/uutf.git
[rosetta]: https://github.com/mirage/rosetta.git
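The pipeline described above can be illustrated without any dependency. The steps are a case-insensitive alias lookup, a static `int array` translation table per encoding, and best-effort handling of bytes that have no mapping. What follows is a minimal, hypothetical sketch, not `uuuu`'s implementation or API. The names `resolve_encoding`, `decode_to_utf8` and the `Latin1 | Latin9` type are illustrative, and the alias list is a small subset of the IANA registry. It assumes OCaml >= 4.06 for `Buffer.add_utf_8_uchar`.

```ocaml
(* Hypothetical sketch: these names are illustrative, not part of uuuu. *)
type encoding = Latin1 | Latin9

(* A small subset of IANA aliases, stored lowercase so that the lookup
   below is case-insensitive. *)
let aliases =
  [ ("iso-8859-1", Latin1); ("iso_8859-1", Latin1); ("latin1", Latin1);
    ("l1", Latin1); ("iso-ir-100", Latin1); ("csisolatin1", Latin1);
    ("iso-8859-15", Latin9); ("iso_8859-15", Latin9); ("latin-9", Latin9) ]

(* Unknown names raise, mirroring the behaviour described for
   encoding_of_string. *)
let resolve_encoding name =
  match List.assoc_opt (String.lowercase_ascii name) aliases with
  | Some e -> e
  | None -> invalid_arg ("unknown encoding: " ^ name)

(* 256-entry tables mapping a byte to a code point. A value of -1 would
   mark an unmapped byte (ISO-8859-1 and ISO-8859-15 have none). *)
let latin1_table = Array.init 256 (fun i -> i)

(* ISO-8859-15 differs from ISO-8859-1 at exactly eight positions. *)
let latin9_table =
  let t = Array.copy latin1_table in
  List.iter (fun (byte, cp) -> t.(byte) <- cp)
    [ (0xA4, 0x20AC); (0xA6, 0x0160); (0xA8, 0x0161); (0xB4, 0x017D);
      (0xB8, 0x017E); (0xBC, 0x0152); (0xBD, 0x0153); (0xBE, 0x0178) ];
  t

let table_of = function Latin1 -> latin1_table | Latin9 -> latin9_table

(* Best-effort decoding: an unmapped byte becomes U+FFFD instead of
   aborting, and every code point is written to the buffer as UTF-8. *)
let decode_to_utf8 encoding input =
  let table = table_of encoding in
  let buf = Buffer.create (String.length input) in
  String.iter
    (fun c ->
      let cp = table.(Char.code c) in
      let u = if cp < 0 then Uchar.rep else Uchar.of_int cp in
      Buffer.add_utf_8_uchar buf u)
    input;
  Buffer.contents buf
```

For example, `decode_to_utf8 (resolve_encoding "LATIN1") "caf\xe9"` yields the UTF-8 string `"café"`. Under ISO-8859-15, the byte `0xA4` decodes to `€` (U+20AC) instead of `¤` (U+00A4). This sketch converts a whole string at once. In contrast, the `Uuuu.decode` loop above yields one code point per call, which is what keeps memory use under the caller's control.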