{"id":22494199,"url":"https://github.com/fcevado/unidecode","last_synced_at":"2025-09-10T14:05:06.503Z","repository":{"id":57556490,"uuid":"93981286","full_name":"fcevado/unidecode","owner":"fcevado","description":"Elixir package to transliterate Unicode to ASCII","archived":false,"fork":false,"pushed_at":"2021-05-18T00:11:07.000Z","size":590,"stargazers_count":27,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-01-31T04:11:22.309Z","etag":null,"topics":["ascii","elixir","transliteration","unicode","unidecode"],"latest_commit_sha":null,"homepage":"https://hex.pm/packages/unidecode","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fcevado.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-06-11T04:20:46.000Z","updated_at":"2024-11-29T22:31:45.000Z","dependencies_parsed_at":"2022-09-14T12:22:04.868Z","dependency_job_id":null,"html_url":"https://github.com/fcevado/unidecode","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fcevado%2Funidecode","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fcevado%2Funidecode/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fcevado%2Funidecode/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fcevado%2Funidecode/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fcevado","download_url":"https://codeload.github.com/fcevado/unidecode/tar.gz/refs/head
s/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241465057,"owners_count":19967243,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ascii","elixir","transliteration","unicode","unidecode"],"created_at":"2024-12-06T19:02:05.541Z","updated_at":"2025-03-02T14:26:13.262Z","avatar_url":"https://github.com/fcevado.png","language":"Elixir","funding_links":[],"categories":["Elixir"],"sub_categories":[],"readme":"# Unidecode\n\nAn Elixir implementation of [Text::Unidecode](http://search.cpan.org/~sburke/Text-Unidecode-1.30/), a Perl module that transliterates Unicode characters to US-ASCII.\n\nIt doesn't change the encoding: like every string in Elixir, the results are still UTF-8 encoded.\nBut they are easy to convert to ASCII. 
Let's say you have the word `código`, the Portuguese word for code, and try to convert it to a charlist.\n\n```elixir\niex\u003e to_charlist(\"código\")\n[99, 243, 100, 105, 103, 111]\n```\n\nUnidecode is made to give you better results for this kind of operation.\n\n```elixir\niex\u003e \"código\" |\u003e Unidecode.decode() |\u003e to_charlist()\n'codigo'\n```\n\nThese aren't the exact characters, but the result is readable and intelligible to anyone who speaks Portuguese.\n\n## Design Philosophy (taken from the original Unidecode Perl library)\n\nUnidecode's ability to transliterate from a given language is limited by two factors:\n\n- The amount and quality of data in the written form of the original language\n  So if you have Hebrew data that has no vowel points in it, then Unidecode cannot guess what vowels should appear in a pronunciation.\n  S f y hv n vwls n th npt, y wn't gt ny vwls n th tpt.\n  (This is a specific application of the general principle of \"Garbage In, Garbage Out\".)\n\n- Basic limitations in the Unidecode design\n  Writing a real and clever transliteration algorithm for any single language usually requires a lot of time, and at least a passable knowledge of the language involved.\n  But Unicode text can convey more languages than I could possibly learn (much less create a transliterator for) in the entire rest of my lifetime.\n  So I put a cap on how intelligent Unidecode could be, by insisting that it support only context-insensitive transliteration.\n  That means missing the finer details of any given writing system, while still hopefully being useful.\n\nUnidecode, in other words, is quick and dirty.\nSometimes the output is not so dirty at all: Russian and Greek seem to work passably; and while Thaana (Divehi, AKA Maldivian) is a definitely non-Western writing system, setting up a mapping from it to Roman letters seems to work pretty well.\nBut sometimes the output is very dirty: Unidecode does quite badly on Japanese and Thai.\n\nIf you want a 
smarter transliteration for a particular language than Unidecode provides, then you should look for (or write) a transliteration algorithm specific to that language, and apply it instead of (or at least before) applying Unidecode.\n\nIn other words, Unidecode's approach is broad (knowing about dozens of writing systems), but shallow (not being meticulous about any of them).\n\n## Installation\n\nAdd `unidecode` to your dependencies in `mix.exs`:\n\n```elixir\ndef deps do\n  [{:unidecode, \"~\u003e 1.0.0\"}]\nend\n```\n\n## [Changelog](./CHANGELOG.md)\n\n## [Code of Conduct](./CODE_OF_CONDUCT.md)\n\n## [License](./LICENSE)\n\nUnidecode is released under the Apache v2.0 license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffcevado%2Funidecode","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffcevado%2Funidecode","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffcevado%2Funidecode/lists"}