{"id":28987433,"url":"https://github.com/codebasic/pyko","last_synced_at":"2025-07-08T17:35:37.535Z","repository":{"id":62581237,"uuid":"92717305","full_name":"codebasic/pyko","owner":"codebasic","description":"Korean Text Processing using Python","archived":false,"fork":false,"pushed_at":"2020-09-25T06:28:46.000Z","size":6808,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-06-24T21:04:41.245Z","etag":null,"topics":["korean-nlp","korean-text-processing","korean-tokenizer","machine-learning","natural-language-processing","nlp","python","python3"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/codebasic.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-05-29T07:37:55.000Z","updated_at":"2020-09-25T06:05:53.000Z","dependencies_parsed_at":"2022-11-03T21:20:01.131Z","dependency_job_id":null,"html_url":"https://github.com/codebasic/pyko","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/codebasic/pyko","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codebasic%2Fpyko","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codebasic%2Fpyko/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codebasic%2Fpyko/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codebasic%2Fpyko/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/codebasic","download_url":"https://codeload.github.com/codebasic/pyko/t
ar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codebasic%2Fpyko/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264315027,"owners_count":23589699,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["korean-nlp","korean-text-processing","korean-tokenizer","machine-learning","natural-language-processing","nlp","python","python3"],"created_at":"2025-06-24T21:01:42.786Z","updated_at":"2025-07-08T17:35:37.519Z","avatar_url":"https://github.com/codebasic.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# pyko\n\npyko (파이코) is a Python library for Korean text processing. It takes the distinctive characteristics of Korean into account\nin natural language processing.\n\n## Installation\n\nThe package is registered on PyPI and can be installed as follows.\n\n    pip install pyko\n\n## Sejong Corpus\n\nThe [Sejong Corpus](https://www.korean.go.kr/nkview/nklife/2016_2/26_0204.pdf) can be used in much the same way as an NLTK CorpusReader. 
The Sejong Corpus can be obtained from the [National Institute of Korean Language's 언어정보나눔터 (language information sharing site)](https://ithub.korean.go.kr/).\n\nUsage example:\n\n```python\nfrom pyko.reader import SejongCorpusReader\n\n# root and fileids are placeholders (corpus directory and files to read); define them before running.\n세종말뭉치 = SejongCorpusReader(root, fileids)\n파일목록 = 세종말뭉치.fileids()\n\n형태분석목록 = 세종말뭉치.words(tagged=True)\nprint(형태분석목록)\n\"\"\"\n[('뭐', (('뭐', 'NP'),)), ('타고', (('타', 'VV'), ('고', 'EC'))), ('가?', (('가', 'VV'), ('ㅏ', 'EF'), ('?', 'SF'))), ('지하철.', (('지하철', 'NNG'), ('.', 'SF'))), ('기차?', (('기차', 'NNG'), ('?', 'SF'))), ('아침에', (('아침', 'NNG'), ('에', 'JKB'))), ...]\n\"\"\"\n\n형태분석문장목록 = 세종말뭉치.sents(tagged=True)\nprint(형태분석문장목록[0])\n\"\"\"\n[('뭐', (('뭐', 'NP'),)),\n ('타고', (('타', 'VV'), ('고', 'EC'))),\n ('가?', (('가', 'VV'), ('ㅏ', 'EF'), ('?', 'SF')))]\n\"\"\"\n```\n\n## Morpheme Segmentation and Part-of-Speech Tagging\n\n### v0.4.0+\nThe morphological analyzer internally uses [kakao/khaiii](https://github.com/kakao/khaiii), Kakao's deep-learning-based morphological analyzer. It assumes that this package is already installed on the system.\n\nFor convenience, use the Docker image, which has the full environment preconfigured.\n\npyko Docker image: [codebasic/pyko](https://hub.docker.com/repository/docker/codebasic/pyko)\n\nDocker image usage example:\n\n```\n$ docker run -it codebasic/pyko\n```\n\nUsage example:\n\n```python\nfrom pyko import tokenizer as 형태소_분석기\n\n예문 = '한국어를 잘 처리하는지 궁금합니다.'\n\n형태소목록 = 형태소_분석기.tokenize(예문)\nprint(형태소목록)\n\"\"\"\n['한국어', '를', '잘', '처리', '하', '는지', '궁금', '하', 'ㅂ니다', '.']\n\"\"\"\n\n형태분석결과 = 형태소_분석기.tokenize(예문, tagged=True)\nprint(형태분석결과)\n\"\"\"\n[('한국어', 'NNP'),\n ('를', 'JKO'),\n ('잘', 'MAG'),\n ('처리', 'NNG'),\n ('하', 'XSV'),\n ('는지', 'EC'),\n ('궁금', 'XR'),\n ('하', 'XSA'),\n ('ㅂ니다', 'EF'),\n ('.', 'SF')]\n\"\"\"\n```\n\n## NLTK Integration\n\nFor corpus management, pyko can be used together with an NLTK CorpusReader.\n\nUsage example:\n\n```python\nfrom pyko import tokenizer as 형태소_분석기\nfrom nltk.corpus import PlaintextCorpusReader\n\n# root and fileids are placeholders (corpus directory and files to read); define them before running.\nreader = PlaintextCorpusReader(root, fileids, word_tokenizer=형태소_분석기)\n형태분석결과 = reader.words()\nprint(형태분석결과)\n\"\"\"\n['세종', '(', '世宗', ',', '1397', '년', '5', '월', '7', '일', '(', '음력', '4', '월', 
...]\n\"\"\"\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodebasic%2Fpyko","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodebasic%2Fpyko","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodebasic%2Fpyko/lists"}