# Suika

[![Build Status](https://github.com/yoshoku/suika/workflows/build/badge.svg)](https://github.com/yoshoku/suika/actions?query=workflow%3Abuild)
[![Gem Version](https://badge.fury.io/rb/suika.svg)](https://badge.fury.io/rb/suika)
[![BSD 3-Clause License](https://img.shields.io/badge/License-BSD%203--Clause-orange.svg)](https://github.com/yoshoku/suika/blob/main/LICENSE.txt)
[![Documentation](https://img.shields.io/badge/api-reference-blue.svg)](https://rubydoc.info/gems/suika)

Suika 🍉 is a Japanese morphological analyzer written in pure Ruby.

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'suika'
```

And then execute:

    $ bundle install

Or install it yourself as:

    $ gem install suika

## Usage

```ruby
require 'suika'

tagger = Suika::Tagger.new
tagger.parse('すもももももももものうち').each { |token| puts token }

# すもも  名詞,一般,*,*,*,*,すもも,スモモ,スモモ
# も      助詞,係助詞,*,*,*,*,も,モ,モ
# もも    名詞,一般,*,*,*,*,もも,モモ,モモ
# も      助詞,係助詞,*,*,*,*,も,モ,モ
# もも    名詞,一般,*,*,*,*,もも,モモ,モモ
# の      助詞,連体化,*,*,*,*,の,ノ,ノ
# うち    名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
```

Each token shows the surface form followed by its mecab-ipadic features: part of speech, three part-of-speech subcategories, conjugation type, conjugation form, base form, reading, and pronunciation.

Since the Tagger class loads the binary dictionary at initialization, create the instance once and reuse it instead of constructing a new tagger for every sentence:

```ruby
require 'suika'

# Create the tagger once; initialization loads the binary dictionary.
tagger = Suika::Tagger.new

sentences = ['すもももももももものうち', '今日は良い天気です。']
sentences.each do |sentence|
  result = tagger.parse(sentence)
  # Process the parsed tokens in result here.
end
```

## Test

Suika parsed every sentence in the [Livedoor news corpus](https://www.rondhuit.com/download.html#ldcc) without errors:

```ruby
require 'suika'

tagger = Suika::Tagger.new

# Assumes the corpus archive has been extracted into the current directory.
Dir.glob('ldcc-20140209/text/*/*.txt').each do |filename|
  File.foreach(filename) do |sentence|
    sentence.strip!
    # Print the tokens of every non-empty line.
    puts tagger.parse(sentence) unless sentence.empty?
  end
end
```

![suika_test](https://user-images.githubusercontent.com/5562409/90264778-8f593f80-de8c-11ea-81f1-20831e3c8b12.gif)

## Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/yoshoku/suika.
This project is intended to be a safe, welcoming space for collaboration,
and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.

## License

The gem is available as open source under the terms of the [BSD-3-Clause License](https://opensource.org/licenses/BSD-3-Clause).
In addition, the gem includes binary data generated from mecab-ipadic.
See [LICENSE.txt](https://github.com/yoshoku/suika/blob/main/LICENSE.txt)
and [NOTICE.txt](https://github.com/yoshoku/suika/blob/main/NOTICE.txt) for the full license details.

## Respect

- [Taku Kudo](https://github.com/taku910) is the author of [MeCab](https://taku910.github.io/mecab/), the most famous morphological analyzer in Japan.
MeCab is one of the great pieces of software in natural language processing.
Suika was created with reference to [the book on morphological analysis](https://www.kindaikagaku.co.jp/information/kd0577.htm) written by Dr. Kudo.
- [Tomoko Uchida](https://github.com/mocobeta) is the author of [Janome](https://github.com/mocobeta/janome), a Japanese morphological analysis engine written in pure Python.
Suika is heavily influenced by Janome's idea of bundling a built-in dictionary and language model with the library.
Janome, a morphological analyzer written in a scripting language, gave me the courage to develop Suika.
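## Working with tokens

The tokens returned by `Tagger#parse` are plain strings. The following is a minimal sketch, assuming each token is a surface form, a tab, and then nine comma-separated mecab-ipadic features; that layout is consistent with the Usage example output, but the tab separator is an assumption rather than documented behavior. `IpadicToken`, `parse_token`, and `count_pos` are hypothetical helpers for illustration and are not part of Suika's API.

```ruby
# Minimal sketch: turn a Suika token string into named mecab-ipadic fields.
# ASSUMPTION: each token looks like "surface\tf1,f2,...,f9".
# IpadicToken, parse_token, and count_pos are hypothetical helpers, not Suika API.

IpadicToken = Struct.new(
  :surface,          # surface form as it appears in the text
  :pos,              # part of speech (品詞)
  :pos_detail1,      # subcategory 1 (品詞細分類1)
  :pos_detail2,      # subcategory 2 (品詞細分類2)
  :pos_detail3,      # subcategory 3 (品詞細分類3)
  :conjugation_type, # conjugation type (活用型)
  :conjugation_form, # conjugation form (活用形)
  :base_form,        # base form (原形)
  :reading,          # reading (読み)
  :pronunciation     # pronunciation (発音)
)

FEATURE_COUNT = 9

# Split one token string into an IpadicToken.
def parse_token(token)
  # Split on the first tab only: the left side is the surface form,
  # the right side is the comma-separated feature string.
  surface, features = token.split("\t", 2)
  raise ArgumentError, "no tab separator in #{token.inspect}" if features.nil?

  fields = features.split(',')
  # Normalize to exactly nine features; missing trailing fields become nil.
  IpadicToken.new(surface, *Array.new(FEATURE_COUNT) { |i| fields[i] })
end

# Count how often each top-level part of speech occurs (Array#tally needs Ruby 2.7+).
def count_pos(tokens)
  tokens.map { |token| parse_token(token).pos }.tally
end

# With the gem installed, the helpers could be combined with Suika like this:
#   require 'suika'
#   tagger = Suika::Tagger.new
#   p count_pos(tagger.parse('すもももももももものうち'))
```

Normalizing to nine fields lets tokens with shorter feature lists (for example unknown words, if Suika emits fewer features for them) still map onto the same struct, with the absent fields set to `nil`.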