{"id":13424712,"url":"https://github.com/yhirose/cpp-unicodelib","last_synced_at":"2025-04-05T15:05:33.705Z","repository":{"id":15223549,"uuid":"51121249","full_name":"yhirose/cpp-unicodelib","owner":"yhirose","description":"A C++17 header-only Unicode library. (Unicode 16.0.0)","archived":false,"fork":false,"pushed_at":"2024-10-13T13:46:42.000Z","size":4647,"stargazers_count":111,"open_issues_count":0,"forks_count":19,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-29T14:06:34.794Z","etag":null,"topics":["cpp","cpp17","header-only","unicode"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yhirose.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-02-05T02:34:03.000Z","updated_at":"2025-03-26T15:10:12.000Z","dependencies_parsed_at":"2024-01-06T14:49:41.822Z","dependency_job_id":"363ce345-7d4b-45a7-8734-749167348424","html_url":"https://github.com/yhirose/cpp-unicodelib","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yhirose%2Fcpp-unicodelib","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yhirose%2Fcpp-unicodelib/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yhirose%2Fcpp-unicodelib/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yhirose%2Fcpp-unicodelib/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owner
s/yhirose","download_url":"https://codeload.github.com/yhirose/cpp-unicodelib/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247353731,"owners_count":20925329,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","cpp17","header-only","unicode"],"created_at":"2024-07-31T00:00:58.218Z","updated_at":"2025-04-05T15:05:33.685Z","avatar_url":"https://github.com/yhirose.png","language":"C++","readme":"cpp-unicodelib\n==============\n\n[![](https://github.com/yhirose/cpp-unicodelib/workflows/CMake/badge.svg)](https://github.com/yhirose/cpp-unicodelib/actions)\n\nA C++17 single-file header-only Unicode library. 
(Unicode 16.0.0)\n\nAPI\n---\n\n## Functions\n\n### Unicode Property\n\n#### General Category\n\n```cpp\nGeneralCategory general_category(char32_t cp);\n\nbool is_cased_letter_category(GeneralCategory gc);\nbool is_letter_category(GeneralCategory gc);\nbool is_mark_category(GeneralCategory gc);\nbool is_number_category(GeneralCategory gc);\nbool is_punctuation_category(GeneralCategory gc);\nbool is_symbol_category(GeneralCategory gc);\nbool is_separator_category(GeneralCategory gc);\nbool is_other_category(GeneralCategory gc);\n\nbool is_cased_letter(char32_t cp);\nbool is_letter(char32_t cp);\nbool is_mark(char32_t cp);\nbool is_number(char32_t cp);\nbool is_punctuation(char32_t cp);\nbool is_symbol(char32_t cp);\nbool is_separator(char32_t cp);\nbool is_other(char32_t cp);\n```\n\n#### Property\n\n```cpp\nbool is_white_space(char32_t cp);\nbool is_bidi_control(char32_t cp);\nbool is_join_control(char32_t cp);\nbool is_dash(char32_t cp);\nbool is_hyphen(char32_t cp);\nbool is_quotation_mark(char32_t cp);\nbool is_terminal_punctuation(char32_t cp);\nbool is_other_math(char32_t cp);\nbool is_hex_digit(char32_t cp);\nbool is_ascii_hex_digit(char32_t cp);\nbool is_other_alphabetic(char32_t cp);\nbool is_ideographic(char32_t cp);\nbool is_diacritic(char32_t cp);\nbool is_extender(char32_t cp);\nbool is_other_lowercase(char32_t cp);\nbool is_other_uppercase(char32_t cp);\nbool is_noncharacter_code_point(char32_t cp);\nbool is_other_grapheme_extend(char32_t cp);\nbool is_ids_binary_operator(char32_t cp);\nbool is_radical(char32_t cp);\nbool is_unified_ideograph(char32_t cp);\nbool is_other_default_ignorable_code_point(char32_t cp);\nbool is_deprecated(char32_t cp);\nbool is_soft_dotted(char32_t cp);\nbool is_logical_order_exception(char32_t cp);\nbool is_other_id_start(char32_t cp);\nbool is_other_id_continue(char32_t cp);\nbool is_sterm(char32_t cp);\nbool is_variation_selector(char32_t cp);\nbool is_pattern_white_space(char32_t cp);\nbool is_pattern_syntax(char32_t 
cp);\n```\n\n#### Derived Property\n\n```cpp\nbool is_math(char32_t cp);\nbool is_alphabetic(char32_t cp);\nbool is_lowercase(char32_t cp);\nbool is_uppercase(char32_t cp);\nbool is_cased(char32_t cp);\nbool is_case_ignorable(char32_t cp);\nbool is_changes_when_lowercased(char32_t cp);\nbool is_changes_when_uppercased(char32_t cp);\nbool is_changes_when_titlecased(char32_t cp);\nbool is_changes_when_casefolded(char32_t cp);\nbool is_changes_when_casemapped(char32_t cp);\nbool is_id_start(char32_t cp);\nbool is_id_continue(char32_t cp);\nbool is_xid_start(char32_t cp);\nbool is_xid_continue(char32_t cp);\nbool is_default_ignorable_code_point(char32_t cp);\nbool is_grapheme_extend(char32_t cp);\nbool is_grapheme_base(char32_t cp);\nbool is_grapheme_link(char32_t cp);\nbool is_indic_conjunct_break_linker(char32_t cp);\nbool is_indic_conjunct_break_consonant(char32_t cp);\nbool is_indic_conjunct_break_extend(char32_t cp);\n```\n\n### Case\n\n```cpp\nchar32_t simple_uppercase_mapping(char32_t cp);\nchar32_t simple_lowercase_mapping(char32_t cp);\nchar32_t simple_titlecase_mapping(char32_t cp);\nchar32_t simple_case_folding(char32_t cp);\n\nstd::u32string to_uppercase(const char32_t *s32, size_t l, const char *lang = nullptr);\nstd::u32string to_lowercase(const char32_t *s32, size_t l, const char *lang = nullptr);\nstd::u32string to_titlecase(const char32_t *s32, size_t l, const char *lang = nullptr);\nstd::u32string to_case_fold(const char32_t *s32, size_t l, bool special_case_for_uppercase_I_and_dotted_uppercase_I = false);\n\nbool is_uppercase(const char32_t *s32, size_t l);\nbool is_lowercase(const char32_t *s32, size_t l);\nbool is_titlecase(const char32_t *s32, size_t l);\nbool is_case_fold(const char32_t *s32, size_t l);\n\nbool caseless_match(const char32_t *s1, size_t l1, const char32_t *s2, size_t l2, bool special_case_for_uppercase_I_and_dotted_uppercase_I = false);\nbool canonical_caseless_match(const char32_t *s1, size_t l1, const char32_t *s2, size_t l2, 
bool special_case_for_uppercase_I_and_dotted_uppercase_I = false);\nbool compatibility_caseless_match(const char32_t *s1, size_t l1, const char32_t *s2, size_t l2, bool special_case_for_uppercase_I_and_dotted_uppercase_I = false);\n```\n\n### Code Block\n\n```cpp\nBlock block(char32_t cp);\n```\n\n### Script\n\n```cpp\nScript script(char32_t cp);\nbool is_script(Script sc, char32_t cp); // Script Extension support\n```\n\n### Normalization\n\n```cpp\nstd::u32string to_nfc(const char32_t *s32, size_t l);\nstd::u32string to_nfd(const char32_t *s32, size_t l);\nstd::u32string to_nfkc(const char32_t *s32, size_t l);\nstd::u32string to_nfkd(const char32_t *s32, size_t l);\n```\n\n### Combining Character Sequence\n\n```cpp\nbool is_graphic_character(char32_t cp);\nbool is_base_character(char32_t cp);\nbool is_combining_character(char32_t cp);\n\nsize_t combining_character_sequence_length(const char32_t* s32, size_t l);\nsize_t combining_character_sequence_count(const char32_t* s32, size_t l);\n\nsize_t extended_combining_character_sequence_length(const char32_t* s32, size_t l);\nsize_t extended_combining_character_sequence_count(const char32_t* s32, size_t l);\n```\n\n### Text Segmentation\n\n```cpp\nbool is_grapheme_boundary(const char32_t* s32, size_t l, size_t i);\nsize_t grapheme_length(const char32_t* s32, size_t l);\nsize_t grapheme_count(const char32_t* s32, size_t l);\n\nbool is_word_boundary(const char32_t *s32, size_t l, size_t i);\n\nbool is_sentence_boundary(const char32_t *s32, size_t l, size_t i);\n```\n\n### Encoding\n\n#### UTF8 Encoding\n\n```cpp\nnamespace utf8 {\n\nsize_t codepoint_length(char32_t uc);\nsize_t codepoint_length(const char* s8, size_t l);\nsize_t codepoint_count(const char* s8, size_t l);\n\nsize_t encode_codepoint(char32_t uc, std::string\u0026 out);\nvoid encode(const char32_t* s32, size_t l, std::string\u0026 out);\n\nsize_t decode_codepoint(const char* s8, size_t l, char32_t\u0026 out);\nvoid decode(const char* s8, size_t l, 
std::u32string\u0026 out);\n\n}\n```\n\n#### UTF16 Encoding\n\n```cpp\nnamespace utf16 {\n\nsize_t codepoint_length(char32_t uc);\nsize_t codepoint_length(const char16_t* s16, size_t l);\nsize_t codepoint_count(const char16_t* s16, size_t l);\n\nsize_t encode_codepoint(char32_t uc, std::u16string\u0026 out);\nvoid encode(const char32_t* s32, size_t l, std::u16string\u0026 out);\n\nsize_t decode_codepoint(const char16_t* s16, size_t l, char32_t\u0026 out);\nvoid decode(const char16_t* s16, size_t l, std::u32string\u0026 out);\n\n}\n```\n\n#### std::wstring Conversion\n\n```cpp\nstd::wstring to_wstring(const char *s8, size_t l);\nstd::wstring to_wstring(const char16_t *s16, size_t l);\nstd::wstring to_wstring(const char32_t *s32, size_t l);\nstd::string to_utf8(const wchar_t *sw, size_t l);\nstd::u16string to_utf16(const wchar_t *sw, size_t l);\nstd::u32string to_utf32(const wchar_t *sw, size_t l);\n```\n\nLicense\n-------\n\nMIT license (© 2023 Yuji Hirose)\n","funding_links":[],"categories":["Unicode"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyhirose%2Fcpp-unicodelib","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyhirose%2Fcpp-unicodelib","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyhirose%2Fcpp-unicodelib/lists"}