{"id":13579042,"url":"https://github.com/JuliaStrings/utf8proc","last_synced_at":"2025-04-05T20:32:55.511Z","repository":{"id":18671939,"uuid":"21880375","full_name":"JuliaStrings/utf8proc","owner":"JuliaStrings","description":"a clean C library for processing UTF-8 Unicode data","archived":false,"fork":false,"pushed_at":"2024-04-08T17:46:39.000Z","size":5456,"stargazers_count":975,"open_issues_count":31,"forks_count":131,"subscribers_count":51,"default_branch":"master","last_synced_at":"2024-04-13T08:56:03.055Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://juliastrings.github.io/utf8proc/","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JuliaStrings.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2014-07-16T00:13:46.000Z","updated_at":"2024-04-29T18:45:52.162Z","dependencies_parsed_at":"2023-11-26T03:20:18.312Z","dependency_job_id":"08a8d0a9-c58d-4201-bd2e-842b534a2ff3","html_url":"https://github.com/JuliaStrings/utf8proc","commit_stats":{"total_commits":261,"total_committers":43,"mean_commits":6.069767441860465,"dds":0.6896551724137931,"last_synced_commit":"1cb28a66ca79a0845e99433fd1056257456cef8b"},"previous_names":["julialang/utf8proc"],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliaStrings%2Futf8proc","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliaStrings%2Futf8proc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliaStrings%2Futf
8proc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliaStrings%2Futf8proc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JuliaStrings","download_url":"https://codeload.github.com/JuliaStrings/utf8proc/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247399818,"owners_count":20932875,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T15:01:36.010Z","updated_at":"2025-04-05T20:32:55.492Z","avatar_url":"https://github.com/JuliaStrings.png","language":"C","readme":"# utf8proc\n[![CI](https://github.com/NanoComp/meep/actions/workflows/build-ci.yml/badge.svg)](https://github.com/JuliaStrings/utf8proc/actions/workflows/build-ci.yml)\n[![AppVeyor status](https://ci.appveyor.com/api/projects/status/ivaa0v6ikxrmm5r6?svg=true)](https://ci.appveyor.com/project/StevenGJohnson/utf8proc)\n\n[utf8proc](http://juliastrings.github.io/utf8proc/) is a small, clean C\nlibrary that provides Unicode normalization, case-folding, and other\noperations for data in the [UTF-8\nencoding](http://en.wikipedia.org/wiki/UTF-8).  It was [initially\ndeveloped](http://www.public-software-group.org/utf8proc) by Jan\nBehrens and the rest of the [Public Software\nGroup](http://www.public-software-group.org/), who deserve *nearly all\nof the credit* for this package.  
With the blessing of the Public\nSoftware Group, the [Julia developers](http://julialang.org/) have\ntaken over development of utf8proc, since the original developers have\nmoved to other projects.\n\n(utf8proc is used for basic Unicode\nsupport in the [Julia language](http://julialang.org/), and the Julia\ndevelopers became involved because they wanted to add Unicode 7 support and other features.)\n\n(The original utf8proc package also includes Ruby and PostgreSQL plug-ins.\nWe removed those from utf8proc in order to focus exclusively on the C\nlibrary.)\n\nThe utf8proc package is licensed under the\nfree/open-source [MIT \"expat\"\nlicense](http://opensource.org/licenses/MIT) (plus certain Unicode\ndata governed by the similarly permissive [Unicode data\nlicense](http://www.unicode.org/copyright.html#Exhibit1)); please see\nthe included `LICENSE.md` file for more detailed information.\n\n## Quick Start\n\nTypical users should download a [utf8proc release](http://juliastrings.github.io/utf8proc/releases/) rather than cloning directly from GitHub.\n\nTo compile the C library, run `make`.  You can also install the library and header file with `make install` (by default into `/usr/local/lib` and `/usr/local/include`, but this can be changed by `make prefix=/some/dir`).  `make check` runs some tests, and `make clean` deletes all of the generated files.\n\nAlternatively, you can compile with `cmake`, e.g. by\n```sh\nmkdir build\ncmake -S . 
-B build\ncmake --build build\n```\n\n### Using other compilers\nThe included `Makefile` supports GNU/Linux flavors and macOS with `gcc`-like compilers; Windows users will typically use `cmake`.\n\nFor other Unix-like systems and other compilers, you may need to pass modified settings to `make` in order to use the correct compilation flags for building shared libraries on your system.\n\nFor HP-UX with HP's `aCC` compiler and GNU Make (installed as `gmake`), you can compile with\n```\ngmake CC=/opt/aCC/bin/aCC CFLAGS=\"+O2\" PICFLAG=\"+z\" C99FLAG=\"-Ae\" WCFLAGS=\"+w\" LDFLAG_SHARED=\"-b\" SOFLAG=\"-Wl,+h\"\n```\nTo run `gmake install` you will need GNU coreutils for the `install` command, and you may want to pass `prefix=/opt libdir=/opt/lib/hpux32` or similar to change the installation location.\n\n## General Information\n\nThe C library is found in this directory after successful compilation\nand is named `libutf8proc.a` (for the static library) and\n`libutf8proc.so` (for the dynamic library).\n\nThe Unicode version supported is 16.0.0.\n\nFor Unicode normalizations, the following options (each prefixed with `UTF8PROC_` in `utf8proc.h`) are used:\n\n* Normalization Form C:  `STABLE`, `COMPOSE`\n* Normalization Form D:  `STABLE`, `DECOMPOSE`\n* Normalization Form KC: `STABLE`, `COMPOSE`, `COMPAT`\n* Normalization Form KD: `STABLE`, `DECOMPOSE`, `COMPAT`\n\n## C Library\n\nThe documentation for the C library is found in the `utf8proc.h` header file.\n`utf8proc_map` is the function you will most likely be using for mapping UTF-8\nstrings, unless you want to allocate memory yourself.\n\n## To Do\n\nSee the GitHub [issues list](https://github.com/JuliaLang/utf8proc/issues).\n\n## Contact\n\nBug reports, feature requests, and other queries can be filed at\nthe [utf8proc issues page on GitHub](https://github.com/JuliaLang/utf8proc/issues).\n\n## See also\n\nAn independent Lua translation of this library, [lua-mojibake](https://github.com/differentprogramming/lua-mojibake), is also available.\n\n## Examples\n\n### 
Convert codepoint to string\n```c\n// Convert codepoint `a` to UTF-8 string `str`\nutf8proc_int32_t a = 223;\nutf8proc_uint8_t str[16] = { 0 };\nutf8proc_encode_char(a, str);\nprintf(\"%s\\n\", str);\n// ß\n```\n\n### Convert string to codepoint\n```c\n// Decode the first character of string `str` into codepoint `a`\nutf8proc_uint8_t str[] = \"ß\";\nutf8proc_int32_t a;\nutf8proc_iterate(str, -1, \u0026a);\nprintf(\"%d\\n\", a);\n// 223\n```\n\n### Casefold\n\n```c\n// Convert \"ß\" (U+00DF) to its casefold variant \"ss\"\nutf8proc_uint8_t str[] = \"ß\";\nutf8proc_uint8_t *fold_str;\nutf8proc_map(str, 0, \u0026fold_str, UTF8PROC_NULLTERM | UTF8PROC_CASEFOLD);\nprintf(\"%s\\n\", fold_str);\n// ss\nfree(fold_str);\n```\n\n### Normalization Form C/D (NFC/NFD)\n```c\n// Decompose \"\\u00e4\\u00f6\\u00fc\" = \"äöü\" into \"a\\u0308o\\u0308u\\u0308\" (= \"äöü\" via combining char U+0308)\n// The input must be NUL-terminated, hence the trailing 0x00\nutf8proc_uint8_t input[] = {0xc3, 0xa4, 0xc3, 0xb6, 0xc3, 0xbc, 0x00}; // \"\\u00e4\\u00f6\\u00fc\" = \"äöü\" in UTF-8\nutf8proc_uint8_t *nfd = utf8proc_NFD(input); // = {0x61, 0xcc, 0x88, 0x6f, 0xcc, 0x88, 0x75, 0xcc, 0x88, 0x00}\n\n// Compose \"a\\u0308o\\u0308u\\u0308\" into \"\\u00e4\\u00f6\\u00fc\" (= \"äöü\" via precomposed characters)\nutf8proc_uint8_t *nfc = utf8proc_NFC(nfd);\n\nfree(nfd);\nfree(nfc);\n```\n","funding_links":[],"categories":["Internationalization","C","String Manipulation ##"],"sub_categories":["Web Frameworks ###"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJuliaStrings%2Futf8proc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FJuliaStrings%2Futf8proc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FJuliaStrings%2Futf8proc/lists"}