{"id":39462102,"url":"https://github.com/efmsoft/utf8","last_synced_at":"2026-01-18T04:45:08.555Z","repository":{"id":177496014,"uuid":"597512751","full_name":"efmsoft/utf8","owner":"efmsoft","description":"This library contains a set of classes for working with strings in utf8 format, as well as functions for converting strings in utf8, ANSI, utf16, utf32 formats.  The most commonly used format conversion operations are converting from ANSI encoding (on Windows), as well as from a Unicode string","archived":false,"fork":false,"pushed_at":"2024-08-05T11:44:37.000Z","size":199,"stargazers_count":3,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-08-05T13:31:38.762Z","etag":null,"topics":["ansi","conversion","utf-16","utf-32","utf-8","utf8"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/efmsoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-04T19:24:34.000Z","updated_at":"2024-08-05T11:44:41.000Z","dependencies_parsed_at":null,"dependency_job_id":"d6ed0beb-bd52-476a-82aa-7b8d5c878b7f","html_url":"https://github.com/efmsoft/utf8","commit_stats":null,"previous_names":["efmsoft/utf8"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/efmsoft/utf8","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/efmsoft%2Futf8","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/efmsoft%2Futf8/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/efmsoft%2Futf8/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/efmsoft%2Futf8/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/efmsoft","download_url":"https://codeload.github.com/efmsoft/utf8/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/efmsoft%2Futf8/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28530125,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-18T00:39:45.795Z","status":"online","status_checked_at":"2026-01-18T02:00:07.578Z","response_time":98,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ansi","conversion","utf-16","utf-32","utf-8","utf8"],"created_at":"2026-01-18T04:45:03.928Z","updated_at":"2026-01-18T04:45:08.536Z","avatar_url":"https://github.com/efmsoft.png","language":"C++","readme":"# utf8 library\n\nThis library contains a set of classes for working with strings in **UTF-8** format, as well as functions for converting strings in **utf8**, **ANSI**, **utf16**, **utf32** formats.\n\n# Table of Contents\n1. [Conversion from/to UTF8](#conversion-from-to-utf8)\n2. [utf8::String class](#utf8string-class)\n3. [utf8::Char class](#utf8char-class)\n4. 
[Utf8Ptr and AnsiPtr Classes](#utf8ptr-and-ansiptr-classes)\n\n## Conversion from/to UTF8\nThe most common conversions are from **ANSI** encoding (on Windows) and from a **Unicode** (wide-character) string, which is **UTF-16** encoded on Windows and **UTF-32** encoded on POSIX systems. The library handles each case with a single call to one of its conversion functions:\n```cpp\nconst wchar_t* unicode_str = L\"тЕкст1 王明 Mötley Crüe\";\n\n// Round trip: wide string to UTF-8 and back\nstd::string utf8_str = utf8::WstringToUtf8(unicode_str);\nstd::wstring unicode_str2 = utf8::Utf8ToWstring(utf8_str.c_str());\nassert(wcscmp(unicode_str, unicode_str2.c_str()) == 0);\n\n#ifdef _WIN32\n // File names returned by the ANSI (\"A\") API use the current ANSI code page\n const char* mask = \"*.*\"; // example search pattern\n WIN32_FIND_DATAA fd;\n HANDLE h = FindFirstFileA(mask, \u0026fd);\n if (h != INVALID_HANDLE_VALUE)\n {\n   std::string u8name = utf8::AnsiToUtf8(fd.cFileName); // file name as UTF-8\n   FindClose(h);\n }\n#endif\n```\n## utf8::String class\nIn a **UTF-8** string, a character is encoded with one to four bytes (https://en.wikipedia.org/wiki/UTF-8). As a result, a string's length in characters generally differs from its length in bytes. Standard classes such as std::string count and index bytes, so they are not suitable for a number of character-level operations (for example, searching and extracting substrings by character position). This library offers the **utf8::String** class for working with UTF-8 strings. It is similar to **std::string** in many ways, but its length and position-based operations count characters rather than bytes (size() still reports bytes).\n\n```cpp\nutf8::String u8str(u8\"Абв\");\nprintf(\"Number of characters: %zu\\n\", u8str.length()); // 3 characters\nprintf(\"Size in bytes: %zu\\n\", u8str.size()); // 6 bytes (2 per Cyrillic letter)\n\nutf8::Char ch(L'Ж');\nu8str.ReplaceAt(1, ch); // now u8str contains \"АЖв\"\n```\nCase conversion is a common problem when working with non-ASCII (national) characters. The utf8 library implements it for both **Windows** and **Linux**. 
The **ToLowerCase** and **ToUpperCase** methods of the utf8::String class therefore convert the case of such characters correctly without requiring any change to the C **locale**.\n\n## utf8::Char class\n**UTF-8** characters can occupy more than one byte, so the built-in C **char** type cannot hold every UTF-8 character. Because some **utf8::String** methods accept or return a single character, the library defines a dedicated type for storing one character: **utf8::Char**.\n\n## Utf8Ptr and AnsiPtr Classes\nSeveral encodings store text in 8-bit code units: **UTF-8**, **ANSI** (the Windows code pages) and **Latin**. In C code, strings in any of them are passed as __const char*__, so the pointer type alone does not tell which encoding the text uses. To let the methods of the **utf8::String** class tell these cases apart, the library introduces the helper classes **Utf8Ptr** and **AnsiPtr**.\n\n```cpp\n// \"text in ANSI encoding\"; assumes the source file is saved in the ANSI code page\nutf8::String u8str(AnsiPtr(\"текст в кодировке ANSI\"));\n```\nBased on the wrapper type of its argument, the **utf8::String** class performs the appropriate conversion.\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fefmsoft%2Futf8","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fefmsoft%2Futf8","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fefmsoft%2Futf8/lists"}