{"id":13528216,"url":"https://github.com/open-korean-text/open-korean-text","last_synced_at":"2026-01-11T18:24:16.369Z","repository":{"id":40591425,"uuid":"79959741","full_name":"open-korean-text/open-korean-text","owner":"open-korean-text","description":"Open Korean Text Processor - An Open-source Korean Text Processor","archived":false,"fork":false,"pushed_at":"2024-03-12T17:28:46.000Z","size":34294,"stargazers_count":646,"open_issues_count":11,"forks_count":97,"subscribers_count":49,"default_branch":"master","last_synced_at":"2025-10-26T21:57:34.829Z","etag":null,"topics":["korean","korean-text-processing","korean-tokenizer","natural-language-processing","text-processing","tokenizer"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/open-korean-text.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-01-24T22:11:41.000Z","updated_at":"2025-10-23T05:42:25.000Z","dependencies_parsed_at":"2024-01-13T11:57:53.844Z","dependency_job_id":"819c0c00-a73b-4922-a214-a3fbada75f7b","html_url":"https://github.com/open-korean-text/open-korean-text","commit_stats":null,"previous_names":["openkoreantext/open-korean-text"],"tags_count":58,"template":false,"template_full_name":null,"purl":"pkg:github/open-korean-text/open-korean-text","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-korean-text%2Fopen-korean-text","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-korean-text%2Fopen-korean-text/tags","releases_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-korean-text%2Fopen-korean-text/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-korean-text%2Fopen-korean-text/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/open-korean-text","download_url":"https://codeload.github.com/open-korean-text/open-korean-text/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/open-korean-text%2Fopen-korean-text/sbom","scorecard":{"id":708374,"data":{"date":"2025-08-11","repo":{"name":"github.com/open-korean-text/open-korean-text","commit":"74cc4ae7d3dab232747cd5ddb723e4b73c476e4f"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.6,"checks":[{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 
0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Code-Review","score":1,"reason":"Found 5/28 approved changesets -- score normalized to 1","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: Apache License 2.0: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 9 are checked 
with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-22T07:25:19.507Z","repository_id":40591425,"created_at":"2025-08-22T07:25:19.507Z","updated_at":"2025-08-22T07:25:19.507Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28317702,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-11T14:58:17.114Z","status":"ssl_error","status_checked_at":"2026-01-11T14:55:53.580Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["korean","korean-text-processing","korean-tokenizer","natural-language-processing","text-processing","tokenizer"],"created_at":"2024-08-01T06:02:19.690Z","updated_at":"2026-01-11T18:24:16.353Z","avatar_url":"https://github.com/open-korean-text.png","language":"Scala","funding_links":[],"categories":["Scala","Open Korean Text Processor","Programming Languages"],"sub_categories":["Scala"],"readme":"## open-korean-text [![Coverage Status](https://coveralls.io/repos/github/open-korean-text/open-korean-text/badge.svg?branch=master)](https://coveralls.io/github/open-korean-text/open-korean-text?branch=master) [![Build Status](https://travis-ci.org/open-korean-text/open-korean-text.svg?branch=master)](https://travis-ci.org/open-korean-text/open-korean-text) 
[![License](http://img.shields.io/:license-Apache%202-red.svg)](http://www.apache.org/licenses/LICENSE-2.0.txt)\n\n\nOpen-source Korean Text Processor / 오픈소스 한국어 처리기 (Official Fork of twitter-korean-text)\n\nScala/Java library to process Korean text with a Java wrapper. open-korean-text currently provides Korean normalization and tokenization. Please join our community at [Google Forum](https://groups.google.com/forum/#!forum/open-korean-text). The intent of this text processor is not limited to short tweet texts.\n\n\n스칼라로 쓰여진 한국어 처리기입니다. 현재 텍스트 정규화와 형태소 분석, 스테밍을 지원하고 있습니다. 짧은 트윗은 물론이고 긴 글도 처리할 수 있습니다. 개발에 참여하시고 싶은 분은 [Google Forum](https://groups.google.com/forum/#!forum/open-korean-text)에 가입해 주세요. 사용법을 알고자 하시는 초보부터 코드에 참여하고 싶으신 분들까지 모두 환영합니다.\n\n[Detailed installation and modification guide](docs/contribution-guide.md)\n\nThe goal of open-korean-text is to extract index terms from big data and similar sources through simple Korean processing. It does not aim to be a complete morphological analyzer.\n\nopen-korean-text supports four features: normalization, tokenization, stemming, and phrase extraction.\n\n\n**정규화 normalization (입니닼ㅋㅋ -\u003e 입니다 ㅋㅋ, 샤릉해 -\u003e 사랑해)**\n\n* 한국어를 처리하는 예시입니닼ㅋㅋㅋㅋㅋ -\u003e 한국어를 처리하는 예시입니다 ㅋㅋ\n\n**토큰화 tokenization**\n\n* 한국어를 처리하는 예시입니다 ㅋㅋ -\u003e 한국어Noun, 를Josa, 처리Noun, 하는Verb, 예시Noun, 입니다Adjective(이다), ㅋㅋKoreanParticle\n\n**어근화 stemming (입니다 -\u003e 이다)**\n\n* 한국어를 처리하는 예시입니다 ㅋㅋ -\u003e 한국어Noun, 를Josa, 처리Noun, 하다Verb, 예시Noun, 이다Adjective, ㅋㅋKoreanParticle\n\n\n**어구 추출 phrase extraction**\n\n* 한국어를 처리하는 예시입니다 ㅋㅋ -\u003e 한국어, 처리, 예시, 처리하는 예시\n\nIntroductory Presentation: [Google Slides](https://docs.google.com/presentation/d/10CZj8ry03oCk_Jqw879HFELzOLjJZ0EOi4KJbtRSIeU/)\n\n## Web API Service\n\n[open-korean-text-api](https://github.com/open-korean-text/open-korean-text-api)  \nThis API service is hosted on Heroku (Domain: https://open-korean-text.herokuapp.com/)\nand currently provides normalization, tokenization, stemming, and phrase extraction\nservices.\n\nEach service and its usage are listed below.  
\n`normalize`, `tokenize`, `stem`, and `extractPhrases` are the **Action** for each service, and the **Query parameter** is `text`.\n\nService | Usage\n---- | ----\nNormalization | https://open-korean-text-api.herokuapp.com/normalize?text=오픈코리안텍스트\nTokenization | https://open-korean-text-api.herokuapp.com/tokenize?text=오픈코리안텍스트\nStemming | https://open-korean-text-api.herokuapp.com/stem?text=오픈코리안텍스트\nPhrase extraction | https://open-korean-text-api.herokuapp.com/extractPhrases?text=오픈코리안텍스트\n\n## Semantic Versioning\n\n1.0.2 (Major.Minor.Patch)\n\n* Major: API change\n* Minor: Processor behavior change\n* Patch: Bug fixes without a behavior change\n\n## API\n* [Scala Doc](https://open-korean-text.github.io/open-korean-text/scaladocs/org/openkoreantext/processor/index.html)\n\n* [Maven Doc](https://open-korean-text.github.io/open-korean-text/index.html)\n\n\u003c!-- ## Try it here --\u003e\n\n\u003c!-- Gunja Agrawal kindly created a test API webpage for this project: [http://gunjaagrawal.com/langhack/](http://gunjaagrawal.com/langhack/) --\u003e\n\n\u003c!-- Gunja Agrawal님이 만들어주신 테스트 웹 페이지 입니다. 
--\u003e\n\u003c!-- [http://gunjaagrawal.com/langhack/](http://gunjaagrawal.com/langhack/) --\u003e\n\n\u003c!-- Opensourced here: [twitter-korean-tokenizer-api](https://github.com/gunjaag/twitter-korean-tokenizer-api) --\u003e\n\n## Maven\nTo include this in your Maven-based JVM project, add the following lines to your pom.xml:\n/ Maven을 이용할 경우 pom.xml에 다음의 내용을 추가하시면 됩니다:\n\n```xml\n  \u003cdependency\u003e\n    \u003cgroupId\u003eorg.openkoreantext\u003c/groupId\u003e\n    \u003cartifactId\u003eopen-korean-text\u003c/artifactId\u003e\n    \u003cversion\u003e2.1.0\u003c/version\u003e\n  \u003c/dependency\u003e\n```\n\nMaven Repository: http://mvnrepository.com/artifact/org.openkoreantext/open-korean-text\n\n\u003c!-- The maven site is available here http://twitter.github.io/open-korean-text/ and scaladocs are here http://twitter.github.io/open-korean-text/scaladocs/ --\u003e\n\n## Support for other languages.\n\n| Type | Language | Contributor |\n| --- | --- | --- |\n| Wrapper | [.net/C#](https://github.com/open-korean-text/open-korean-text-wrapper-csharp) | [modamoda](https://github.com/modamoda) |\n| Wrapper | [Node JS](https://github.com/open-korean-text/open-korean-text-wrapper-node-1) | [Ch0p](https://github.com/Ch0p) |\n| Wrapper | [Node JS](https://github.com/open-korean-text/open-korean-text-wrapper-node-2) | [Youngrok Kim](https://github.com/rokoroku) |\n| Wrapper | [Python](https://github.com/open-korean-text/open-korean-text-wrapper-python) | [Jaepil Jeong](https://github.com/jaepil) |\n| Wrapper | [Clojure](https://github.com/open-korean-text/open-korean-text-4clj) | [Seonho Kim](https://github.com/ksseono) |\n| Wrapper | [Ruby for Java Version](https://github.com/open-korean-text/open-korean-text-wrapper-ruby-1) | [jun85664396](https://github.com/jun85664396) |\n| Wrapper | [Ruby for Scala Version](https://github.com/open-korean-text/open-korean-text-wrapper-ruby-2) | [Jaehyun Shin](https://github.com/keepcosmos) |\n| Porting | 
[Python](https://github.com/open-korean-text/open-korean-text-python) | [Baeg-il Kim](https://github.com/cedar101) |\n| Package | [Python Korean NLP](https://github.com/konlpy/konlpy) | [KoNLPy](https://github.com/konlpy/konlpy) |\n| Package | [Elastic Search](https://github.com/open-korean-text/open-korean-text-elastic-search) | [socurites](https://github.com/socurites) |\n| Package | [Elastic Search](https://github.com/open-korean-text/elasticsearch-analysis-openkoreantext) | [Jaehyun Shin](https://github.com/keepcosmos) |\n| Package | [JavaScript](https://github.com/71/oktjs) (browser-compatible) | [Grégoire Geis](https://github.com/71) |\n\n\n## Get the source / 소스를 원하시는 경우\n\nClone the git repo and build using maven.\n/ Git 전체를 클론하고 Maven을 이용하여 빌드합니다.\n\n```bash\ngit clone https://github.com/open-korean-text/open-korean-text.git\ncd open-korean-text\nmvn compile\n```\n\nOpen 'pom.xml' from your favorite IDE.\n\n## Basic Usage / 사용 방법\n\nYou can find these [examples](examples) in examples folder.\n/ [examples](examples) 폴더에 사용 방법 예제 파일이 있습니다.\n\n* [Scala Example](examples/src/main/scala/ScalaOpenKoreanTextExample.scala)\n\n* [Java Example](examples/src/main/java/JavaOpenKoreanTextProcessorExample.java)\n\n\n## Running Tests\n\n`mvn test` will run our unit tests\n/ 모든 유닛 테스트를 실행하려면 `mvn test`를 이용해 주세요.\n\n\n\u003c!-- ## Tools --\u003e\n\n\u003c!-- We provide tools for quality assurance and test resources. They can be found under [src/main/scala/org/openkoreantext/processor/qa](src/main/scala/org/openkoreantext/processor/qa) and [src/main/scala/org/openkoreantext/processor/tools](src/main/scala/org/openkoreantext/processor/tools). --\u003e\n\n\n## Contribution\n\nRefer to the [general contribution guide](CONTRIBUTING.md). 
A project-specific contribution guide will be added later.\n\n[Detailed installation and modification guide](docs/contribution-guide.md)\n\n\n## Performance / 처리 속도\n\nTested on Intel i7 2.3 GHz\n\nInitial loading time (초기 로딩 시간): 2 to 4 sec\n\nAverage parsing time per chunk (평균 어절 처리 시간): 0.12 ms\n\n\n**Tweets (Avg length ~50 chars)**\n\nTweets|100K|200K|300K|400K|500K|600K|700K|800K|900K|1M\n---|---|---|---|---|---|---|---|---|---|---\nTime in Seconds|57.59|112.09|165.05|218.11|270.54|328.52|381.09|439.71|492.94|542.12\n\nAverage per tweet: 0.54212 ms\n\n**Benchmark test by [KoNLPy](http://konlpy.org/)**\n\n![Benchmark test](http://konlpy.org/ko/v0.4.2/_images/time.png)\n\nFrom [http://konlpy.org/ko/v0.4.3/morph/#pos-tagging-with-konlpy](http://konlpy.org/ko/v0.4.3/morph/#pos-tagging-with-konlpy)\n\n\n## Author\n\n* Will Hohyon Ryu (유호현): https://github.com/nlpenguin | https://twitter.com/NLPenguin\n\n## Admin Staff\n\n* Mingyu Kim (김민규): https://github.com/MechanicKim\n\n## License\n\nCopyright 2014 Twitter, Inc.\n\nLicensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopen-korean-text%2Fopen-korean-text","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopen-korean-text%2Fopen-korean-text","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopen-korean-text%2Fopen-korean-text/lists"}