{"id":24125644,"url":"https://github.com/railgunlabs/charisma","last_synced_at":"2025-02-28T23:15:34.056Z","repository":{"id":271322663,"uuid":"912206575","full_name":"railgunlabs/charisma","owner":"railgunlabs","description":"Secure Unicode® character decoders and encoders.","archived":false,"fork":false,"pushed_at":"2025-02-12T17:30:21.000Z","size":57,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-12T18:36:05.963Z","etag":null,"topics":["c99","misra-c","unicode","utf-16","utf-32","utf-8"],"latest_commit_sha":null,"homepage":"https://RailgunLabs.com/charisma","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/railgunlabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-04T22:49:45.000Z","updated_at":"2025-02-12T17:30:24.000Z","dependencies_parsed_at":"2025-02-02T17:26:16.488Z","dependency_job_id":"bccafe9d-03e6-4cfb-bca1-6f3e524de373","html_url":"https://github.com/railgunlabs/charisma","commit_stats":null,"previous_names":["railgunlabs/charisma"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/railgunlabs%2Fcharisma","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/railgunlabs%2Fcharisma/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/railgunlabs%2Fcharisma/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/railgunlabs%2Fcharisma/manifests","owner_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/railgunlabs","download_url":"https://codeload.github.com/railgunlabs/charisma/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241272618,"owners_count":19937091,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c99","misra-c","unicode","utf-16","utf-32","utf-8"],"created_at":"2025-01-11T15:25:52.030Z","updated_at":"2025-02-28T23:15:34.050Z","avatar_url":"https://github.com/railgunlabs.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cpicture\u003e\n  \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\".github/charisma-dark.svg\"\u003e\n  \u003csource media=\"(prefers-color-scheme: light)\" srcset=\".github/charisma.svg\"\u003e\n  \u003cimg alt=\"Charisma\" src=\".github/charisma.svg\" width=\"408px\"\u003e\n\u003c/picture\u003e\n\n**Char**isma is a Unicode® character decoder and encoder library that conforms to the MISRA C:2012 coding standard.\nIt provides functions for decoding and encoding characters _safely_ in UTF-8, UTF-16, and UTF-32 (big- or little-endian).\nIt can _recover_ from malformed characters, allowing decoding to continue.\n\n[![Build Status](https://github.com/railgunlabs/charisma/actions/workflows/build.yml/badge.svg)](https://github.com/railgunlabs/charisma/actions/workflows/build.yml)\n\n## Why?\n\nThere are many Unicode character decoders floating about, but most are **unsafe** and do not support recovering from malformed character sequences.\nAttempting to decode or incorrectly 
recover from malformed text with these decoders can lead to security vulnerabilities.\nIt's critical for software that processes external text to use a _robust_ character decoder that can detect malformed character sequences.\n\n## Features\n\n* Safely decode and encode Unicode characters\n* Safely recover from malformed character sequences\n* Supports UTF-8, UTF-16-BE, UTF-16-LE, UTF-32-BE, and UTF-32-LE\n* Supports both null-terminated and non-null-terminated strings\n* Reentrant implementation\n* Lightweight (\u003c 200 semicolons)\n* Extensively tested (see below)\n* No dependencies\n\n## MISRA C:2012 Compliance\n\nCharisma honors all Required, Mandatory, and Advisory rules defined by MISRA C:2012 and its four amendments.\nThe complete compliance table is [documented here](https://railgunlabs.com/charisma/manual/misra-compliance/).\n\n## Ultra Portable\n\nCharisma is _ultra portable_.\nIt's written in C99 and only requires a few features from libc, which are listed in the following table.\n\n| Header | Types | Macros |\n| --- | --- | --- |\n| **stdint.h** | `uint8_t`, `uint16_t`, \u003cbr/\u003e `int32_t`, `uint32_t` | |\n| **stdbool.h** | |  `bool`, `true`, `false` |\n| **assert.h** | |  `assert` |\n\n## How Charisma is Tested\n\n* 100% branch coverage\n* Unit tests\n* Fuzz tests\n* Static analysis\n* Valgrind analysis\n* Code sanitizers (UBSAN, ASAN, and MSAN)\n* Extensive use of assert() and run-time checks\n\n## Example\n\nThis code snippet demonstrates how to decode UTF-8 text.\n\n```c\nconst char8_t *string = \"The quick 갈색 🦊 กระโดด över the 怠け者 🐶.\";\nint32_t index = 0;\nfor (;;)\n{\n    uchar cp = 0x0;\n    int32_t r = utf8_decode(string, -1, \u0026index, \u0026cp);\n    if (r == 0)\n    {\n        break; // end of string\n    }\n    else if (r \u003c 0)\n    {\n        // malformed character sequence\n    }\n\n    // Malformed character sequences will be\n    // recovered from and returned as U+FFFD.\n    printf(\"U+%04X\\n\", (unsigned int)cp); // %X expects unsigned int\n}\n```\n\n## 
Building\n\nDownload the [latest release](https://github.com/railgunlabs/charisma/releases/) and build with\n\n```\n$ ./configure\n$ make\n$ make install\n```\n\nor build with [CMake](https://cmake.org/).\n\n## Related Work\n\nCharisma is focused on decoding and encoding Unicode characters.\nIf you need Unicode algorithms, like normalization or collation, then use [Unicorn](https://github.com/railgunlabs/unicorn).\n\n## License\n\nCharisma is dual-licensed under the GNU Lesser General Public License version 3 (LGPL v3) and a proprietary license, which can be purchased from [Railgun Labs](https://railgunlabs.com/charisma/license/).\n\nThe unit tests are **not** open source.\nAccess to them is granted exclusively to commercial licensees.\n\n_Unicode® is a registered trademark of Unicode, Inc. in the United States and other countries. This project is not in any way associated with or endorsed or sponsored by Unicode, Inc. (aka The Unicode Consortium)._\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frailgunlabs%2Fcharisma","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frailgunlabs%2Fcharisma","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frailgunlabs%2Fcharisma/lists"}