{"id":16832334,"url":"https://github.com/npillmayer/uax","last_synced_at":"2025-09-25T11:41:56.952Z","repository":{"id":43010556,"uuid":"318649208","full_name":"npillmayer/uax","owner":"npillmayer","description":"Unicode Text Segmentation Algorithms","archived":false,"fork":false,"pushed_at":"2022-07-12T12:56:26.000Z","size":1902,"stargazers_count":9,"open_issues_count":4,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-11T04:14:20.101Z","etag":null,"topics":["text-processing","text-segmentation","unicode"],"latest_commit_sha":null,"homepage":"http://npillmayer.github.io/UAX/","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/npillmayer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-12-04T22:28:07.000Z","updated_at":"2024-08-25T08:17:28.000Z","dependencies_parsed_at":"2022-09-09T13:40:26.140Z","dependency_job_id":null,"html_url":"https://github.com/npillmayer/uax","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/npillmayer/uax","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/npillmayer%2Fuax","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/npillmayer%2Fuax/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/npillmayer%2Fuax/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/npillmayer%2Fuax/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/npillmayer","download_url":"https://codeload.github.com/npillmayer/uax/tar.gz/refs/heads/main","sbom_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/npillmayer%2Fuax/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261504993,"owners_count":23168941,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["text-processing","text-segmentation","unicode"],"created_at":"2024-10-13T11:48:33.613Z","updated_at":"2025-09-25T11:41:51.898Z","avatar_url":"https://github.com/npillmayer.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],
"readme":"\u003cimg alt=\"UAX Logo\" src=\"http://npillmayer.github.io/UAX/img/UAX-Logo.svg\" width=\"110\" style=\"max-width:110\"\u003e\n\n### Unicode Text Segmentation Algorithms\n\nText processing applications need to segment text into pieces. Segments may be\n\n* words\n* sentences\n* paragraphs\n\nand so on. For Western languages this is not too hard a problem, but it may become an involved endeavor if you consider Arabic or Asian languages. From a typographic viewpoint, some of these languages present serious challenges for correct segmenting. The Unicode Consortium publishes recommendations and algorithms for various aspects of text segmentation in its Unicode Standard Annexes (**UAX**).\n\n## Text Segmentation in Go\n\nA number of Unicode standards describe best practices for text segmentation. Unfortunately, implementations in Go are scarce. Marcel van Lohuizen from the Go Core Team seems to be working on text segmenting, but with low priority. In the long run, it will be best to wait for the standard library to include functions for text segmentation. However, for now I will implement my own.\n\n## Status\n\nThis is very much a work in progress and is not intended for production use. Please be patient.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnpillmayer%2Fuax","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnpillmayer%2Fuax","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnpillmayer%2Fuax/lists"}