{"id":39081674,"url":"https://github.com/lovit/korean_lemmatizer","last_synced_at":"2026-01-17T18:30:47.517Z","repository":{"id":57469475,"uuid":"166682414","full_name":"lovit/korean_lemmatizer","owner":"lovit","description":"한국어 용언 분석기 (원형 복원, 용언 형태소 분석)","archived":false,"fork":false,"pushed_at":"2019-09-30T19:06:55.000Z","size":21134,"stargazers_count":42,"open_issues_count":6,"forks_count":11,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-09-29T16:04:26.106Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lovit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-01-20T16:21:55.000Z","updated_at":"2025-06-18T15:45:53.000Z","dependencies_parsed_at":"2022-09-19T10:11:32.240Z","dependency_job_id":null,"html_url":"https://github.com/lovit/korean_lemmatizer","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lovit/korean_lemmatizer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lovit%2Fkorean_lemmatizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lovit%2Fkorean_lemmatizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lovit%2Fkorean_lemmatizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lovit%2Fkorean_lemmatizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lovit","download_url":"https://codeload.github.com/lovit/korean_lemmatizer/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/lovit%2Fkorean_lemmatizer/sbom","scorecard":{"id":600229,"data":{"date":"2025-08-11","repo":{"name":"github.com/lovit/korean_lemmatizer","commit":"160a7dfa928d4bb6863944ee69f4ddd6919dbd67"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":2.6,"checks":[{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are 
merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"License","score":0,"reason":"license file not detected","details":["Warn: project does not have a license file"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if 
the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection 
settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}}]},"last_synced_at":"2025-08-21T00:13:27.394Z","repository_id":57469475,"created_at":"2025-08-21T00:13:27.394Z","updated_at":"2025-08-21T00:13:27.394Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28515728,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-17T18:28:00.501Z","status":"ssl_error","status_checked_at":"2026-01-17T18:28:00.150Z","response_time":85,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-01-17T18:30:47.343Z","updated_at":"2026-01-17T18:30:47.485Z","avatar_url":"https://github.com/lovit.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 한국어 용언 분석기 (Korean Lemmatizer)\n\nThis package analyzes the conjugated forms (surface forms) of Korean verbs and adjectives. The Korean Lemmatizer provides the following functions.\n\n1. Split an input word into its stem and its ending (eomi)\n1. 
Restore an input word to its lemma (base form)\n\nThe implementation principles of this package are described in a [github.io blog post][io].\n\n[io]: https://lovit.github.io/nlp/2019/01/22/trained_kor_lemmatizer/\n\n## Usage\n\n### analyze, lemmatize, conjugate\n\nThe `analyze` function returns the morphemes of the given predicate word.\n\n```python\nfrom soylemma import Lemmatizer\n\nlemmatizer = Lemmatizer()\nlemmatizer.analyze('차가우니까')\n```\n\nThe return value is a list of tuples because a word can have more than one morpheme combination.\n\n```\n[(('차갑', 'Adjective'), ('우니까', 'Eomi'))]\n```\n\nThe `lemmatize` function returns the lemma of the given predicate word.\n\n```python\nlemmatizer.lemmatize('차가우니까')\n```\n\n```\n[('차갑다', 'Adjective')]\n```\n\nIf the input word is not a predicate (for example, a noun), it returns an empty list.\n\n```python\nlemmatizer.lemmatize('한국어') # []\n```\n\nThe `conjugate` function returns surface forms. Pass a stem and an eomi as arguments; it returns all possible surface forms for that pair.\n\n```python\nlemmatizer.conjugate(stem='차갑', eomi='우니까')\nlemmatizer.conjugate('예쁘', '었던')\n```\n\n```\n['차가우니까', '차갑우니까']\n['예뻤던', '예쁘었던']\n```\n\n### update dictionaries and rules\n\nFor demonstration, we use the `demo` dictionary.\n\n`어여뻤어` cannot be analyzed because the adjective `어여쁘` is not registered in the dictionary.\n\n```python\nfrom soylemma import Lemmatizer\n\nlemmatizer = Lemmatizer(dictionary_name='demo')\nprint(lemmatizer.analyze('어여뻤어')) # []\n```\n\nSo we add the word with its tag using the `add_words` function and run the analysis again. 
The word `어여뻤어` can now be analyzed.\n\n```python\nlemmatizer.add_words('어여쁘', 'Adjective')\nlemmatizer.analyze('어여뻤어')\n```\n\n```\n[(('어여쁘', 'Adjective'), ('었어', 'Eomi'))]\n```\n\nHowever, the word `파랬다` still cannot be analyzed because no lemmatization rule exists for the surface form `랬`.\n\n```python\nlemmatizer.analyze('파랬다') # []\n```\n\nThis time, we add lemmatization rules using the `add_lemma_rules` function.\n\n```python\nsupplements = {\n    '랬': {('랗', '았')}\n}\n\nlemmatizer.add_lemma_rules(supplements)\n```\n\nAfter that, the word `파랬다` can be analyzed, and the conjugation of `파랗 + 았다` is also available.\n\n```python\nlemmatizer.analyze('파랬다')\nlemmatizer.conjugate('파랗', '았다')\n```\n\n```\n[(('파랗', 'Adjective'), ('았다', 'Eomi'))]\n['파랬다', '파랗았다']\n```\n\n### debug on\n\nTo see which subwords were considered as (stem, eomi) candidates, pass `debug=True`.\n\n```python\nlemmatizer.analyze('파랬다', debug=True)\n```\n\n```\n[DEBUG] word: 파랬다 = 파랗 + 았다, conjugation: 랬 = 랗 + 았\n[(('파랗', 'Adjective'), ('았다', 'Eomi'))]\n```\n\n### lemmatization rule extractor\n\nYou can extract a lemmatization rule using the `extract_rule` function.\n\n```python\nfrom soylemma import extract_rule\n\neojeol = '로드무비였다'\nlw = '로드무비이'\nlt = 'Adjective'\nrw = '었다'\nrt = 'Eomi'\n\nextract_rule(eojeol, lw, lt, rw, rt)\n```\n\n```\n('였다', ('이', '었다'))\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flovit%2Fkorean_lemmatizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flovit%2Fkorean_lemmatizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flovit%2Fkorean_lemmatizer/lists"}