{"id":18411274,"url":"https://github.com/ysenarath/sinling","last_synced_at":"2025-04-07T11:31:43.568Z","repository":{"id":54587311,"uuid":"178089155","full_name":"ysenarath/sinling","owner":"ysenarath","description":"A collection of NLP tools for Sinhalese (සිංහල).","archived":false,"fork":false,"pushed_at":"2021-06-28T20:34:18.000Z","size":46552,"stargazers_count":55,"open_issues_count":1,"forks_count":18,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-03-22T16:52:51.830Z","etag":null,"topics":["joiner","language-processing","morphological-analyser","natural-language-processing","nlp","part-of-speech","pos-tagging","sinhala","sinhala-nlp","sinhala-stemmer","sinhala-tokenizer","splitter","tokenizer","tool","toolkit"],"latest_commit_sha":null,"homepage":"https://sinling.ysenarath.com","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ysenarath.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-03-27T23:21:01.000Z","updated_at":"2025-03-03T03:53:19.000Z","dependencies_parsed_at":"2022-08-13T20:40:11.921Z","dependency_job_id":null,"html_url":"https://github.com/ysenarath/sinling","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ysenarath%2Fsinling","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ysenarath%2Fsinling/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ysenarath%2Fsinling/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ysenarath%2Fsinling/m
anifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ysenarath","download_url":"https://codeload.github.com/ysenarath/sinling/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247399884,"owners_count":20932880,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["joiner","language-processing","morphological-analyser","natural-language-processing","nlp","part-of-speech","pos-tagging","sinhala","sinhala-nlp","sinhala-stemmer","sinhala-tokenizer","splitter","tokenizer","tool","toolkit"],"created_at":"2024-11-06T03:35:51.681Z","updated_at":"2025-04-07T11:31:38.550Z","avatar_url":"https://github.com/ysenarath.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# A language processing tool for Sinhalese (සිංහල). \n\n`Update 2020.11.01: Fixed pypi package. Use 'pip install sinling' to install sinling directly from repository.`\n\n`Update 2020.08.16: Add pypi package @ https://pypi.org/project/sinling/.`\n\n`Update 2020.08.16: Integrated Part of speech tagger and stemmer tool.`\n\n`Update 2019.07.21: This tool no longer requires java to run sinhala tokenizer. 
\nAll java code is ported to Python implementation for convenience.`\n\n[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ysenarath/sinling.git/master?filepath=notebooks%2Fexamples.ipynb)\n[![PyPI version](https://badge.fury.io/py/sinling.svg)](https://badge.fury.io/py/sinling)\n\n## Installation\n\nRun the following command in your virtualenv to install this package.\n\n`pip install sinling`\n\n## How to use\n### Sinhala Tokenizer\n```python\nfrom sinling import SinhalaTokenizer\n\ntokenizer = SinhalaTokenizer()\n\nsentence = '...'  # your sentence\n\ntokenizer.tokenize(sentence)\n```\n\n### Sinhala Stemmer (Experimental)\n```python\nfrom sinling import SinhalaStemmer\n\nstemmer = SinhalaStemmer()\n\nword = '...'  # your word\n\nstemmer.stem(word)\n```\n\nPlease cite [sinhala-stemmer](https://github.com/rksk/sinhala-news-analysis/tree/master/sinhala-stemmer) if you are using this implementation.\n\n### Part-of-Speech Tagger\n\n```python\nfrom sinling import SinhalaTokenizer, POSTagger\n\ntokenizer = SinhalaTokenizer()\n\ndocument = '...'  
# may contain multiple sentences\n\ntokenized_sentences = [tokenizer.tokenize(f'{ss}.') for ss in tokenizer.split_sentences(document)]\n\ntagger = POSTagger()\n\npos_tags = tagger.predict(tokenized_sentences)\n```\n\n### Word Joiner (Morphological Joiner)\n```python\nfrom sinling import preprocess, word_joiner\n\nw1 = preprocess('මුනි')\nw2 = preprocess('උතුමා')\nresults = word_joiner.join(w1, w2)\n# Returns a list of possible results after applying join rules ['මුනිතුමා', ...]\n```\n\n### Word Splitter (Morphological Splitter) / corpus based - *experimental*\n```python\nfrom sinling import word_splitter\n\nword = '...'\nresults = word_splitter.split(word)\n# Returns a dict containing debug information, base word and affix\n```\n\nVisit [here](https://github.com/ysenarath/sinling/blob/master/notebooks/splitter.ipynb) to see some sample splits.\n\n## Contributions\n- Contact `wayasas.13@cse.mrt.ac.lk` if you would like to contribute to this project.\n\n## License\nApache License\nVersion 2.0, January 2004\nhttp://www.apache.org/licenses/\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fysenarath%2Fsinling","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fysenarath%2Fsinling","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fysenarath%2Fsinling/lists"}