{"id":15492245,"url":"https://github.com/danielaparker/unicode_traits","last_synced_at":"2025-04-22T19:26:02.193Z","repository":{"id":142025960,"uuid":"78046557","full_name":"danielaparker/unicode_traits","owner":"danielaparker","description":"The C++ unicode_traits class template makes using unicode easier","archived":false,"fork":false,"pushed_at":"2020-08-06T02:04:17.000Z","size":310,"stargazers_count":8,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-29T18:11:21.306Z","etag":null,"topics":["cpp11","unicode","unicode-traits","utf-16","utf-32","utf-8"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/danielaparker.png","metadata":{"files":{"readme":"README.md","changelog":"ChangeLog.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-01-04T19:34:08.000Z","updated_at":"2024-10-06T03:29:34.000Z","dependencies_parsed_at":null,"dependency_job_id":"49ecd35f-1c26-4339-8093-8e61eec8f5ef","html_url":"https://github.com/danielaparker/unicode_traits","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielaparker%2Funicode_traits","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielaparker%2Funicode_traits/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielaparker%2Funicode_traits/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danielaparker%2Funicode_traits/manifest
s","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/danielaparker","download_url":"https://codeload.github.com/danielaparker/unicode_traits/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250307551,"owners_count":21409097,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp11","unicode","unicode-traits","utf-16","utf-32","utf-8"],"created_at":"2024-10-02T07:59:45.922Z","updated_at":"2025-04-22T19:26:02.184Z","avatar_url":"https://github.com/danielaparker.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# unicode_traits for C++\n\nThe C++ unicode_traits class template makes using unicode easier. \n\nAll you need to do is download one header file, [unicode_traits.hpp](https://raw.githubusercontent.com/danielaparker/unicode_traits/master/include/unicode_traits.hpp), and drop it somewhere in your include path.\n\nConsult the [unicode_traits reference](./doc/ref/index.md) for details.\n\n## Examples\n\nIn the examples below, the user's intentions for source and target encoding schemes are deduced from the character width, UTF-8 from 8 bit characters, UTF-16 from 16 bit characters, and UTF-32 from 32 bit characters. 
The character type may be any integral type, signed or unsigned, that is 8, 16, or 32 bits wide.\n\n### Convert UTF-8 to UTF-16 and UTF-32\n\n```c++\n#include \"unicode_traits.hpp\"\n#include \u003ccstdint\u003e\n#include \u003cvector\u003e\n#include \u003cstring\u003e\n#include \u003citerator\u003e\n\nint main()\n{\n    std::string source = \"Hello world \\xf0\\x9f\\x99\\x82\";\n\n    // Convert source to UTF-16\n    std::u16string target1;\n    auto result1 = unicons::convert(source.begin(),source.end(),\n                                    std::back_inserter(target1),\n                                    unicons::conv_flags::strict);\n\n    // Convert source to UTF-32\n    std::vector\u003cuint32_t\u003e target2;\n    auto result2 = unicons::convert(source.begin(),source.end(),\n                                    std::back_inserter(target2),\n                                    unicons::conv_flags::strict);\n\n    // Convert source to UTF-16 (if 16 bit wchar_t) or UTF-32 (if 32 bit wchar_t)\n    std::wstring target3;\n    auto result3 = unicons::convert(source.begin(),source.end(),\n                                    std::back_inserter(target3),\n                                    unicons::conv_flags::strict);\n}\n```\nHello World \u0026#128578;\n\n### Append codepoint to string\n```c++\nuint32_t cp = 0x1f642;\n\nstd::string target1 = \"Hello world \";\nstd::u16string target2 = u\"Hello world \";\nstd::u32string target3 = U\"Hello world \";\nstd::wstring target4 = L\"Hello world \";\n\nauto result1 = unicons::convert(\u0026cp,\u0026cp + 1,std::back_inserter(target1),\n                                unicons::conv_flags::strict);\nauto result2 = unicons::convert(\u0026cp,\u0026cp + 1,std::back_inserter(target2),\n                                unicons::conv_flags::strict);\nauto result3 = unicons::convert(\u0026cp,\u0026cp + 1,std::back_inserter(target3),\n                                unicons::conv_flags::strict);\nauto result4 = unicons::convert(\u0026cp,\u0026cp + 
1,std::back_inserter(target4), \n                                unicons::conv_flags::strict);\n```\nHello World \u0026#128578;\n\n### Codepoint iterator (without exceptions)\n\n```c++\nstd::string source = \"Hi \\xf0\\x9f\\x99\\x82\"; // U+1F642\n\nstd::error_code ec;\nauto it = unicons::make_codepoint_iterator(source.begin(),source.end(),ec);\nauto last = end(it);\n\nwhile (!ec \u0026\u0026 it != last)\n{\n    uint32_t codepoint = *it;\n    it.increment(ec);\n}\n```\n\nH   \ni   \n\n\u0026#128578;\n\n### Validate UTF-8 sequence\n\n```c++\nstd::string source = \"\\xE6\\x97\\xA5\\xD1\\x88\\xFA\";\nauto result = unicons::validate(source.begin(),source.end());\n\nif (result.ec)\n{\n    std::cout \u003c\u003c make_error_code(result.ec).message() \u003c\u003c std::endl;\n}\n```\nOutput:\n```\nPartial character in source, but hit end\n```\n\n### Validate UTF-16 sequence\n```c++\nstd::u16string source = u\"\\xD888\\x1234\";\nauto result = unicons::validate(source.begin(),source.end());\n\nif (result.ec)\n{\n    std::cout \u003c\u003c make_error_code(result.ec).message() \u003c\u003c std::endl;\n}\n```\nOutput:\n```\nUnpaired high surrogate UTF-16\n```\n\n## Supported compilers\n\n`unicode_traits` requires a C++11 compiler. 
It is tested in continuous integration on [AppVeyor](https://ci.appveyor.com/project/danielaparker/unicode_traits), [Travis](https://travis-ci.org/danielaparker/unicode_traits), and [doozer](https://doozer.io/).\n[UndefinedBehaviorSanitizer (UBSan)](http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html) diagnostics are enabled for selected gcc and clang builds.\n\n| Compiler                | Version                   |Architecture | Operating System  |\n|-------------------------|---------------------------|-------------|-------------------|\n| Microsoft Visual Studio | vs2015 (MSVC 19.0.24241.7)| x86,x64     | Windows 10        |\n|                         | vs2017                    | x86,x64     | Windows 10        |\n|                         | vs2019                    | x86,x64     | Windows 10        |\n| g++                     | 4.8 and above             | x64         | Ubuntu            |\n|                         | 4.8.5                     | x64         | CentOS 7.6        |\n|                         | 6.3.1 (Red Hat 6.3.1-1)   | x64         | Fedora release 24 |\n|                         | 4.9.2                     | i386        | Debian 8          |\n| clang                   | 3.8 and above             | x64         | Ubuntu            |\n| clang xcode             | 6.4 and above             | x64         | OSX               |\n\n## Resources\n\n- [The Unicode Consortium](http://unicode.org/)\n- [UTF-8 encoding table and Unicode characters](http://www.utf8-chartable.de/unicode-utf8-table.pl)\n- [Unicode code converter](https://r12a.github.io/apps/conversion/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanielaparker%2Funicode_traits","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdanielaparker%2Funicode_traits","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanielaparker%2Funicode_traits/lists"}