{"id":23318644,"url":"https://github.com/peelonet/peelo-unicode","last_synced_at":"2025-04-07T05:19:07.147Z","repository":{"id":80332159,"uuid":"137586217","full_name":"peelonet/peelo-unicode","owner":"peelonet","description":"Simple Unicode utilities for C++","archived":false,"fork":false,"pushed_at":"2025-02-01T10:59:01.000Z","size":631,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-13T09:32:11.611Z","etag":null,"topics":["cpp-library","header-only","unicode","unicode-support","utf-16","utf-32","utf-8"],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/peelonet.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-06-16T14:24:38.000Z","updated_at":"2025-02-01T10:57:25.000Z","dependencies_parsed_at":"2025-02-13T09:42:36.397Z","dependency_job_id":null,"html_url":"https://github.com/peelonet/peelo-unicode","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peelonet%2Fpeelo-unicode","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peelonet%2Fpeelo-unicode/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peelonet%2Fpeelo-unicode/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/peelonet%2Fpeelo-unicode/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/own
ers/peelonet","download_url":"https://codeload.github.com/peelonet/peelo-unicode/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247595335,"owners_count":20963943,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp-library","header-only","unicode","unicode-support","utf-16","utf-32","utf-8"],"created_at":"2024-12-20T17:17:49.010Z","updated_at":"2025-04-07T05:19:07.128Z","avatar_url":"https://github.com/peelonet.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# peelo-unicode\n\n![Build](https://github.com/peelonet/peelo-unicode/workflows/Build/badge.svg)\n\nCollection of simple to use [Unicode] utilities for C++17. 
Supports Unicode\n15.1.\n\n[Doxygen generated API documentation.][API]\n\n[Unicode]: https://en.wikipedia.org/wiki/Unicode\n[API]: https://peelonet.github.io/peelo-unicode/index.html\n\n## Character testing functions\n\nThe library ships with a Unicode version of the [ctype.h] header, containing\nthe following functions inside the `peelo::unicode::ctype` namespace:\n\n- `isalnum()`\n- `isalpha()`\n- `isblank()`\n- `iscntrl()`\n- `isdigit()`\n- `isgraph()`\n- `islower()`\n- `isprint()`\n- `ispunct()`\n- `isspace()`\n- `isupper()`\n- `isxdigit()`\n- `tolower()`\n- `toupper()`\n\nAdditional functions not found in `ctype.h` are:\n\n- `isvalid()` - Tests whether the given value is a valid Unicode codepoint.\n- `isemoji()` - Tests whether the given Unicode codepoint is an [emoji].\n\n[ctype.h]: https://en.cppreference.com/w/cpp/header/cctype\n[emoji]: https://en.wikipedia.org/wiki/Emoji\n\n### Example\n\n```cpp\n#include \u003ciostream\u003e\n#include \u003cpeelo/unicode/ctype.hpp\u003e\n\nint\nmain()\n{\n  using namespace peelo::unicode::ctype;\n\n  std::cout \u003c\u003c isalnum(U'Ä') \u003c\u003c std::endl;\n  std::cout \u003c\u003c isdigit(U'൧') \u003c\u003c std::endl;\n  std::cout \u003c\u003c isgraph(U'€') \u003c\u003c std::endl;\n  std::cout \u003c\u003c ispunct(U'\\u2001') \u003c\u003c std::endl;\n  std::cout \u003c\u003c std::hex;\n  std::cout \u003c\u003c tolower(U'Ä') \u003c\u003c std::endl;\n  std::cout \u003c\u003c toupper(U'ä') \u003c\u003c std::endl;\n}\n```\n\n## Character encodings\n\nThe library also provides functions for encoding to and decoding from Unicode character\nencodings. 
Both validating and non-validating (where all encoding/decoding\nerrors are ignored) functions are provided.\n\nSupported character encodings are:\n\n- [UTF-8]\n- [UTF-16BE][UTF-16]\n- [UTF-16LE][UTF-16]\n- [UTF-32BE][UTF-32]\n- [UTF-32LE][UTF-32]\n\n[UTF-8]: https://en.wikipedia.org/wiki/UTF-8\n[UTF-16]: https://en.wikipedia.org/wiki/UTF-16\n[UTF-32]: https://en.wikipedia.org/wiki/UTF-32\n\n### Example\n\n```cpp\n#include \u003cstring\u003e\n#include \u003cpeelo/unicode/encoding.hpp\u003e\n\nint\nmain()\n{\n  using namespace peelo::unicode::encoding;\n\n  // Decode UTF-8 input, ignoring any decoding errors.\n  std::u32string utf8_decoded = utf8::decode(\"\\xe2\\x82\\xac\");\n\n  // Encode it back to a byte string, ignoring any encoding errors.\n  std::string utf8_encoded = utf8::encode(utf8_decoded);\n\n  // Decode UTF-32BE input with validation. The length is passed explicitly\n  // because the input begins with NUL bytes.\n  std::u32string utf32be_decoded;\n  if (utf32be::decode_validate(std::string(\"\\x00\\x00 \\xac\", 4), utf32be_decoded))\n  {\n    // Given input is valid UTF-32BE.\n  } else {\n    // Given input is invalid UTF-32BE.\n  }\n\n  // Encode it back to a byte string, with validation.\n  std::string utf32be_encoded;\n  if (utf32be::encode_validate(utf32be_decoded, utf32be_encoded))\n  {\n    // Given input contained only valid Unicode code points.\n  } else {\n    // Given input contained invalid Unicode code points.\n  }\n}\n```\n\n## BOM detection\n\nThe library provides a function for detecting whether a byte string contains\na [byte order mark] and, if so, which character encoding it indicates. 
Even though the use\nof a BOM is rare these days, it can sometimes be useful to be able to detect it.\n\nThe detected character encodings are:\n\n- [UTF-8]\n- [UTF-16BE][UTF-16]\n- [UTF-16LE][UTF-16]\n- [UTF-32BE][UTF-32]\n- [UTF-32LE][UTF-32]\n- [UTF-7]\n- [UTF-1]\n- [UTF-EBCDIC]\n- [SCSU]\n- [BOCU-1]\n- [GB18030]\n\n[Byte order mark]: https://en.wikipedia.org/wiki/Byte_order_mark\n[UTF-7]: https://en.wikipedia.org/wiki/UTF-7\n[UTF-1]: https://en.wikipedia.org/wiki/UTF-1\n[UTF-EBCDIC]: https://en.wikipedia.org/wiki/UTF-EBCDIC\n[SCSU]: https://en.wikipedia.org/wiki/Standard_Compression_Scheme_for_Unicode\n[BOCU-1]: https://en.wikipedia.org/wiki/Binary_Ordered_Compression_for_Unicode\n[GB18030]: https://en.wikipedia.org/wiki/GB_18030\n\n### Example\n\n```cpp\n#include \u003cfstream\u003e\n#include \u003ciostream\u003e\n#include \u003cpeelo/unicode/bom.hpp\u003e\n\nint\nmain()\n{\n  char buffer[1024];\n  std::fstream f(\"file.txt\");\n  std::size_t length;\n\n  f.read(buffer, sizeof(buffer));\n  length = f.gcount();\n  f.close();\n\n  if (const auto bom = peelo::unicode::bom::detect(buffer, length))\n  {\n    if (*bom == peelo::unicode::bom::type::utf16_be)\n    {\n      std::cout \u003c\u003c \"File has UTF-16BE BOM.\" \u003c\u003c std::endl;\n    } else {\n      std::cout \u003c\u003c \"File has some other BOM.\" \u003c\u003c std::endl;\n    }\n  } else {\n    std::cout \u003c\u003c \"File does not contain BOM.\" \u003c\u003c std::endl;\n  }\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpeelonet%2Fpeelo-unicode","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpeelonet%2Fpeelo-unicode","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpeelonet%2Fpeelo-unicode/lists"}