# GraphemeSplitter

A C# implementation of the Unicode grapheme cluster breaking algorithm.

## **Notes**

- This library implements the Unicode 10.0 version of the grapheme cluster boundary algorithm.
- In .NET 5.0, [`StringInfo.GetTextElementEnumerator`](https://docs.microsoft.com/en-us/dotnet/api/system.globalization.stringinfo.gettextelementenumerator) enumerates graphemes correctly using the Unicode 13.0 algorithm.

## NuGet package

https://www.nuget.org/packages/GraphemeSplitter/

```powershell
Install-Package GraphemeSplitter
```

## Sample

```cs
using GraphemeSplitter;
using static System.Console;
using static System.String;

public partial class Program
{
    // GetGraphemes() is the library's extension method on string.
    static string Split(string s) => Join(", ", s.GetGraphemes());

    static void Main()
    {
        WriteLine(Split("👨‍👨‍👧‍👦👩‍👩‍👧‍👦👨‍👨‍👧‍👦")); // 👨‍👨‍👧‍👦, 👩‍👩‍👧‍👦, 👨‍👨‍👧‍👦
    }
}
```

[Web Sample](tree/master/RazorPageSample):

![Razor Page Sample](doc/RazorPageSample.png)

## Implementation

This library implements the grapheme cluster boundary rules of http://unicode.org/reports/tr29/ (UAX #29, Unicode Text Segmentation), with the simplifications described below.

Examples:

type | text | split result
--- | --- | ---
diacritical marks | à̡̠́ḅ̢̂̃c̣̤̃̄d̥̦̅̆ | "à̡̠́", "ḅ̢̂̃", "c̣̤̃̄", "d̥̦̅̆"
variation selector | 葛葛󠄀葛󠄁 | "葛", "葛󠄀", "葛󠄁"
Hangul syllables | 안녕하세요 | "안", "녕", "하", "세", "요"
family emoji | 👨‍👨‍👧‍👦👩‍👩‍👧‍👦👨‍👨‍👧‍👦 | "👨‍👨‍👧‍👦", "👩‍👩‍👧‍👦", "👨‍👨‍👧‍👦"
emoji skin tone | 👩🏻👱🏼👧🏽👦🏾 | "👩🏻", "👱🏼", "👧🏽", "👦🏾"

However, for simplicity, the library relaxes rules GB10, GB12, and GB13 (EBG = E_Base_GAZ, RI = Regional_Indicator).

Original rules:

- GB10 … (E_Base | EBG) Extend* × E_Modifier
- GB12 … sot (RI RI)* RI × RI
- GB13 … [^RI] (RI RI)* RI × RI

Implemented rules:

- GB10 … (E_Base | EBG) × Extend
- GB10 … (E_Base | EBG | Extend) × E_Modifier
- GB12/GB13 … RI × RI

The results differ for sequences such as these:

sequence | original | implemented
--- | --- | ---
à🏻‍ (U+61, U+300, U+1F3FB) | × ÷ | × ×
🇯🇵🇺🇸 (U+1F1EF, U+1F1F5, U+1F1FA, U+1F1F8) | × ÷ × | × × ×

(÷ marks a boundary and × marks no boundary.)

In the first sequence, U+61 ("a") is not an emoji base, so the original GB10 allows a break before the skin-tone modifier. The relaxed rule attaches the modifier to the preceding Extend character (U+300). In the second sequence, the original rules pair the regional indicators into two flags (🇯🇵 and 🇺🇸), while RI × RI keeps all four together.

## Acknowledgements

This library was influenced by:

- https://github.com/devongovett/grapheme-breaker
- https://github.com/orling/grapheme-splitter
- https://github.com/unicode-rs/unicode-segmentation
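The Notes section says that on .NET 5.0, `StringInfo.GetTextElementEnumerator` enumerates graphemes correctly without this library. The sketch below shows that built-in route. It assumes .NET 5.0 or later, and `TextElementDemo` and `SplitTextElements` are hypothetical names that are not part of GraphemeSplitter.

```cs
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper (not part of GraphemeSplitter): collects the text elements
// that System.Globalization.StringInfo reports for a string. On .NET 5.0 and later
// these are extended grapheme clusters; older runtimes split differently.
public static class TextElementDemo
{
    public static List<string> SplitTextElements(string s)
    {
        var elements = new List<string>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(s);
        while (e.MoveNext())
        {
            // GetTextElement() returns the current element as a substring of s.
            elements.Add(e.GetTextElement());
        }
        return elements;
    }
}
```

For example, `string.Join(", ", TextElementDemo.SplitTextElements("안녕하세요"))` produces `안, 녕, 하, 세, 요`.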
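The Implementation section contrasts the original GB10 rule, `(E_Base | EBG) Extend* × E_Modifier`, with the relaxed `(E_Base | EBG | Extend) × E_Modifier`. The sketch below shows how the two decisions differ. It is a hypothetical illustration rather than the library's code. The classification functions are tiny stand-ins that cover only a few code points, and real code would use the Unicode property data.

```cs
using System;

// Hypothetical sketch contrasting UAX #29 GB10 with the relaxed variant.
// The classifiers are stand-ins covering only a few code points, NOT real property data.
public static class ModifierRuleDemo
{
    // Stand-in for E_Base | E_Base_GAZ: only U+1F466..U+1F469 (boy, girl, man, woman).
    static bool IsEmojiBase(int cp) => cp >= 0x1F466 && cp <= 0x1F469;

    // Stand-in for Extend: only the Combining Diacritical Marks block.
    static bool IsExtend(int cp) => cp >= 0x0300 && cp <= 0x036F;

    // E_Modifier: the five skin-tone modifiers U+1F3FB..U+1F3FF.
    public static bool IsEModifier(int cp) => cp >= 0x1F3FB && cp <= 0x1F3FF;

    // Returns true when there is no boundary (×) before the E_Modifier at index i.
    public static bool NoBreakBeforeModifier(int[] cps, int i, bool relaxed)
    {
        if (i <= 0 || i >= cps.Length || !IsEModifier(cps[i]))
            throw new ArgumentException("cps[i] must be an E_Modifier with a predecessor.");

        if (relaxed)
        {
            // Relaxed: (E_Base | EBG | Extend) × E_Modifier looks only one code point back.
            int prev = cps[i - 1];
            return IsEmojiBase(prev) || IsExtend(prev);
        }

        // Original GB10: (E_Base | EBG) Extend* × E_Modifier skips back over any Extend run.
        int j = i - 1;
        while (j >= 0 && IsExtend(cps[j])) j--;
        return j >= 0 && IsEmojiBase(cps[j]);
    }
}
```

For the `à🏻` row (U+61, U+300, U+1F3FB), `NoBreakBeforeModifier(new[] { 0x61, 0x300, 0x1F3FB }, 2, relaxed: false)` returns `false` (÷). With `relaxed: true` it returns `true` (×). A one-code-point lookback is cheaper, but it cannot tell whether the Extend run started at an emoji base.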
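A run of regional indicators on its own shows the difference between GB12/GB13 and the relaxed `RI × RI` rule. The sketch below uses a hypothetical `RegionalIndicatorDemo` class, not the library's code, and accepts only strings made entirely of regional indicators (U+1F1E6..U+1F1FF). `SplitPaired` pairs them as the original rules do, and `SplitRelaxed` keeps the whole run together as the relaxed rule does. It assumes .NET Core 3.0 or later for `System.Text.Rune`.

```cs
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: only handles strings consisting entirely of regional indicators.
public static class RegionalIndicatorDemo
{
    static bool IsRegionalIndicator(Rune r) => r.Value >= 0x1F1E6 && r.Value <= 0x1F1FF;

    static void RequireOnlyRegionalIndicators(string s)
    {
        foreach (Rune r in s.EnumerateRunes())
            if (!IsRegionalIndicator(r))
                throw new ArgumentException("This sketch only accepts regional indicators.");
    }

    // GB12/GB13: a boundary (÷) falls after every second RI in the run.
    public static List<string> SplitPaired(string s)
    {
        RequireOnlyRegionalIndicators(s);
        var clusters = new List<string>();
        var current = new StringBuilder();
        int riCount = 0;
        foreach (Rune r in s.EnumerateRunes())
        {
            current.Append(r.ToString());
            riCount++;
            if (riCount % 2 == 0) // a flag pair is complete
            {
                clusters.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0) clusters.Add(current.ToString()); // odd leftover RI
        return clusters;
    }

    // Relaxed RI × RI: no boundary between any two RIs, so the run is one cluster.
    public static List<string> SplitRelaxed(string s)
    {
        RequireOnlyRegionalIndicators(s);
        return s.Length == 0 ? new List<string>() : new List<string> { s };
    }
}
```

With the 🇯🇵🇺🇸 input from the table, `SplitPaired` returns two flags and `SplitRelaxed` returns one four-code-point cluster. These results match the `× ÷ ×` and `× × ×` columns.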