{"id":15693037,"url":"https://github.com/code-hex/text-shirasu","last_synced_at":"2025-06-29T07:37:44.703Z","repository":{"id":56834665,"uuid":"67207435","full_name":"Code-Hex/Text-Shirasu","owner":"Code-Hex","description":"Wrapped Text::MeCab in Perl","archived":false,"fork":false,"pushed_at":"2017-06-08T11:20:10.000Z","size":71,"stargazers_count":5,"open_issues_count":1,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-05-12T19:21:27.917Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Perl","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Code-Hex.png","metadata":{"files":{"readme":"README.md","changelog":"Changes","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-09-02T08:56:58.000Z","updated_at":"2019-06-11T13:33:37.000Z","dependencies_parsed_at":"2022-09-09T21:10:24.128Z","dependency_job_id":null,"html_url":"https://github.com/Code-Hex/Text-Shirasu","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/Code-Hex/Text-Shirasu","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Code-Hex%2FText-Shirasu","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Code-Hex%2FText-Shirasu/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Code-Hex%2FText-Shirasu/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Code-Hex%2FText-Shirasu/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Code-Hex","download_url":"https://codeload.github.com/Code-Hex/Text-Shirasu/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/Code-Hex%2FText-Shirasu/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262558527,"owners_count":23328549,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-03T18:41:01.115Z","updated_at":"2025-06-29T07:37:44.673Z","avatar_url":"https://github.com/Code-Hex.png","language":"Perl","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://travis-ci.org/Code-Hex/Text-Shirasu.svg?branch=master)](https://travis-ci.org/Code-Hex/Text-Shirasu) [![MetaCPAN Release](https://badge.fury.io/pl/Text-Shirasu.svg)](https://metacpan.org/release/Text-Shirasu)\n# NAME\n\nText::Shirasu - Text::MeCab wrapped for natural language processing \n\n# SYNOPSIS\n\n    use utf8;\n    use feature ':5.10';\n    use Text::Shirasu;\n    my $ts = Text::Shirasu-\u003enew(cabocha =\u003e 1); # you can use Text::CaboCha\n    my $normalize = $ts-\u003enormalize(\"昨日の晩御飯は「鮭のふりかけ」と「味噌汁」だけでした。\");\n    $ts-\u003eparse($normalize);\n\n    for my $node (@{ $ts-\u003enodes }) {\n        say $node-\u003esurface;\n    }\n\n    say $ts-\u003ejoin_surface;\n\n    my $filter = $ts-\u003efilter(type =\u003e [qw/名詞 助動詞/], 記号 =\u003e [qw/括弧開 括弧閉/]);\n    say $filter-\u003ejoin_surface;\n\n    for my $tree (@{ $ts-\u003etrees }) {\n        say $tree-\u003esurface;\n    }\n\n# DESCRIPTION\n\nText::Shirasu is a wrapper around [Text::MeCab](https://metacpan.org/pod/Text::MeCab).  \nThis module makes it easy to normalize text and filter by part of speech.  
\nIt can also use [Text::CaboCha](https://metacpan.org/pod/Text::CaboCha) when the cabocha option is set to true.\n\n# METHODS\n\n## new\n\n    Text::Shirasu-\u003enew(\n        # If you want to use cabocha\n        cabocha =\u003e 1,\n        # Text::MeCab arguments\n        rcfile             =\u003e $rcfile,             # Also aliased as mecabrc for Text::CaboCha\n        dicdir             =\u003e $dicdir,             # Also aliased as mecab_dicdir for Text::CaboCha\n        userdic            =\u003e $userdic,            # Also aliased as mecab_userdic for Text::CaboCha\n        lattice_level      =\u003e $lattice_level,\n        all_morphs         =\u003e $all_morphs,\n        output_format_type =\u003e $output_format_type,\n        partial            =\u003e $partial,\n        node_format        =\u003e $node_format,\n        unk_format         =\u003e $unk_format,\n        bos_format         =\u003e $bos_format,\n        eos_format         =\u003e $eos_format,\n        input_buffer_size  =\u003e $input_buffer_size,\n        allocate_sentence  =\u003e $allocate_sentence,\n        nbest              =\u003e $nbest,\n        theta              =\u003e $theta,\n        \n        # Text::CaboCha arguments\n        ne            =\u003e $ne,\n        parser_model  =\u003e $parser_model_file,\n        chunker_model =\u003e $chunker_model_file,\n        ne_model      =\u003e $ne_tagger_model_file,\n    );\n\n## parse\n\nThis method wraps the parse method of Text::MeCab.\nThe analysis result is stored in the Text::Shirasu instance as an array reference of Text::Shirasu::Node instances.\nIn cabocha mode, this method also stores an array reference of Text::Shirasu::Tree instances in the Text::Shirasu instance.\nIt returns the Text::Shirasu instance.\n\n    $ts-\u003eparse(\"このおにぎりは「母」が握ってくれたものです。\");\n\n## normalize\n\nNormalizes text using [Lingua::JA::NormalizeText](https://metacpan.org/pod/Lingua::JA::NormalizeText).  
\n\n    $ts-\u003enormalize(\"あ━ ”（＊）” を〰〰 ’＋１’\");\n    $ts-\u003enormalize(\"テキスト〰〰\", qw/nfkc alnum_z2h/, \\\u0026your_create_routine);\n\nIt accepts a string as the first argument, and Lingua::JA::NormalizeText options and subroutine references as the second and subsequent arguments.\nIf no options or subroutines are specified, the following Lingua::JA::NormalizeText options and subroutines are used by default.  \n\nPlease read the documentation of [Lingua::JA::NormalizeText](https://metacpan.org/pod/Lingua::JA::NormalizeText) for details on how each Lingua::JA::NormalizeText option works.\n\nLingua::JA::NormalizeText options\n\n`nfkc nfkd nfc nfd alnum_z2h space_z2h katakana_h2z decode_entities unify_nl unify_whitespaces unify_long_spaces trim old2new_kana old2new_kanji tab2space all_dakuon_normalize square2katakana circled2kana circled2kanji decompose_parenthesized_kanji`\n\nSubroutines\n\n`normalize_hyphen normalize_symbols`\n\n## filter\n\nCall this method after the parse method has been executed.   
\nFilters the surfaces based on the features stored in the Text::Shirasu instance.\nPassing a part-of-speech name as a key with an array reference of subtypes as its value narrows the filter further.\n\n    # filtering nodes only\n    $ts-\u003efilter(type =\u003e [qw/名詞/]);\n    $ts-\u003efilter(type =\u003e [qw/名詞 記号/], 記号 =\u003e [qw/括弧開 括弧閉/]);\n\n    # filtering trees only\n    $ts-\u003efilter(tree =\u003e 1, node =\u003e 0, type =\u003e [qw/名詞/]);\n    $ts-\u003efilter(tree =\u003e 1, node =\u003e 0, type =\u003e [qw/名詞 記号/], 記号 =\u003e [qw/括弧開 括弧閉/]);\n\n    # filtering nodes and trees\n    $ts-\u003efilter(tree =\u003e 1, type =\u003e [qw/名詞/]);\n    $ts-\u003efilter(tree =\u003e 1, type =\u003e [qw/名詞 記号/], 記号 =\u003e [qw/括弧開 括弧閉/]);\n\n## join\\_surface\n\nReturns a string that joins the surfaces stored in the instance.\n\n    $ts-\u003ejoin_surface\n\n## nodes\n\nReturns the array reference of Text::Shirasu::Node instances.\n\n    $ts-\u003enodes\n\n## trees\n\nReturns the array reference of Text::Shirasu::Tree instances.\n\n    $ts-\u003etrees\n\n## mecab\n\nReturns the Text::MeCab instance.\n\n    $ts-\u003emecab\n\n## cabocha\n\nReturns the Text::CaboCha instance.\n\n    $ts-\u003ecabocha\n\n# SUBROUTINES\n\nThese subroutines perform the following substitutions.  \n\n## normalize\\_hyphen\n\n    s/[˗֊‐‑‒–⁃⁻₋−]/-/g;\n    s/[﹣－ｰ—―─━ー]/ー/g;\n    s/[~∼∾〜〰～]//g;\n    s/ー+/ー/g;\n\n## normalize\\_symbols\n\n    tr/。、・「」/｡､･｢｣/;\n\n# LICENSE\n\nCopyright (C) Kei Kamikawa(Code-Hex).\n\nThis library is free software; you can redistribute it and/or modify\nit under the same terms as Perl itself.\n\n# AUTHOR\n\nKei Kamikawa \u003cx00.x7f@gmail.com\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode-hex%2Ftext-shirasu","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcode-hex%2Ftext-shirasu","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcode-hex%2Ftext-shirasu/lists"}