{"id":13589099,"url":"https://github.com/seanghay/khmernormalizer","last_synced_at":"2025-07-13T04:32:14.787Z","repository":{"id":182244110,"uuid":"668116823","full_name":"seanghay/khmernormalizer","owner":"seanghay","description":"A missing toolkit for Khmer Natural Language Processing.","archived":false,"fork":false,"pushed_at":"2023-07-26T10:13:00.000Z","size":25,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-04-24T08:16:32.042Z","etag":null,"topics":["khmer","nlp","normalization","normalizer","verbalization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/seanghay.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-07-19T04:20:45.000Z","updated_at":"2023-12-14T15:49:34.000Z","dependencies_parsed_at":"2024-01-14T04:07:42.546Z","dependency_job_id":null,"html_url":"https://github.com/seanghay/khmernormalizer","commit_stats":null,"previous_names":["seanghay/khmernormalizer","seanghay/khmer-normalizer"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seanghay%2Fkhmernormalizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seanghay%2Fkhmernormalizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seanghay%2Fkhmernormalizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seanghay%2Fkhmernormalizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/seanghay","download_url":"https://codeload.github.com/seanghay/khmernormalizer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225855955,"owners_count":17534967,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["khmer","nlp","normalization","normalizer","verbalization"],"created_at":"2024-08-01T16:00:22.945Z","updated_at":"2024-11-22T07:12:59.250Z","avatar_url":"https://github.com/seanghay.png","language":"Python","funding_links":[],"categories":["Awesome Khmer Language"],"sub_categories":["2. Toolkit"],"readme":"## Khmer Normalizer \n\nA missing toolkit for **Khmer Natural Language Processing**.\n\n- Character Reordering\n- Duplicate Whitespaces\n- Remove zero width space\n- Remove emojis\n- Fix Common misspellings\n- Fix Unicode issues\n- Fix Khmer trailing vowels\n- URL Replacements\n- Unicode Normalization (NFKC)\n- Quotes symbols normalization\n- Remove repeated punctuations\n\n### Installation\n\n```shell\npip install khmernormalizer\n```\n\n### Usage\n\n```python\nfrom khmernormalizer import normalize\n\ninput_str = \"\"\"\nតាម៖៖​សេចក្តី​រាយ​ការណ៍​​ឲ្យ​ដឹង​ថា!!!!!\nhttps://google.com/a?x=1\nកាល 😂 ពីវេលាម៉ោង    ៗ      ប្រមាណ១១យប់ថ្ងៃទី៤ 😂😂😂😂😂 ??\nកាាាាត់\nមិិិិិន \nមួយរយះះះះះះះ\nរយះពេល\n\"\"\".strip()\n\nnormalize(input_str, \n          emoji_replacement=\"\", \n          remove_zwsp=True, \n          url_replacement=\"\")\n```\n\nResult:\n```\nតាម៖សេចក្តីរាយការណ៍ឱ្យដឹងថា!\n\nកាល ពីវេលាម៉ោងៗ ប្រមាណ១១យប់ថ្ងៃទី៤?\nកាត់\nមិន \nមួយរយៈ\nរយៈពេល\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseanghay%2Fkhmernormalizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fseanghay%2Fkhmernormalizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseanghay%2Fkhmernormalizer/lists"}