{"id":37065575,"url":"https://github.com/monpa-team/monpa","last_synced_at":"2026-01-14T07:40:37.636Z","repository":{"id":46097716,"uuid":"198371775","full_name":"monpa-team/monpa","owner":"monpa-team","description":"MONPA 罔拍是一個提供正體中文斷詞、詞性標註以及命名實體辨識的多任務模型","archived":false,"fork":false,"pushed_at":"2025-02-20T02:31:56.000Z","size":8653,"stargazers_count":247,"open_issues_count":0,"forks_count":25,"subscribers_count":22,"default_branch":"master","last_synced_at":"2025-08-26T00:56:07.648Z","etag":null,"topics":["albert","bert","chinese-word-segmentation","named-entity-recognition","ner","nlp","pos","pos-tagging","word-segmentation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/monpa-team.png","metadata":{"files":{"readme":"README.md","changelog":"changelog.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-07-23T07:01:02.000Z","updated_at":"2025-08-14T12:31:38.000Z","dependencies_parsed_at":"2022-09-26T17:21:30.773Z","dependency_job_id":null,"html_url":"https://github.com/monpa-team/monpa","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/monpa-team/monpa","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/monpa-team%2Fmonpa","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/monpa-team%2Fmonpa/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/monpa-team%2Fmonpa/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/monpa-team%2Fmonpa/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/monpa-team","download_url":"https://c
odeload.github.com/monpa-team/monpa/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/monpa-team%2Fmonpa/sbom","scorecard":{"id":658917,"data":{"date":"2025-08-11","repo":{"name":"github.com/monpa-team/monpa","commit":"5aa8836b5bdd43c3d743a89ca125e6e8ef4c4a48"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3,"checks":[{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"SAST","score":0,"reason":"no SAST tool detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least 
privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"License","score":9,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Warn: project license file does not contain an FSF or OSI license."],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release 
artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}}]},"last_synced_at":"2025-08-21T15:30:21.924Z","repository_id":46097716,"created_at":"2025-08-21T15:30:21.925Z","updated_at":"2025-08-21T15:30:21.925Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28413470,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T05:26:33.345Z","status":"ssl_error","status_checked_at":"2026-01-14T05:21:57.251Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["albert","bert","chinese-word-segmentation","named-entity-recognition","ner","nlp","pos","pos-tagging","word-segmentation"],"created_at":"2026-01-14T07:40:37.107Z","updated_at":"2026-01-14T07:40:37.617Z","avatar_url":"https://github.com/monpa-team.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 罔拍 MONPA: Multi-Objective NER POS Annotator\n\nMONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech (POS) tagging, and named entity recognition (NER). This project packages monpa as a Python package that can be installed with pip install (latest version v0.3.1). A web demo is available at \u003chttps://nlptmu-monpaweb.hf.space/\u003e.\n\nThe latest monpa model was trained with the PyTorch 1.0 framework, so install torch 1.x or later before using this version of the monpa package.\n\n## Announcements\n```diff\n- This release, v0.3.3: #16 upgrades the torch API calls to resolve warning messages.\n- v0.3.2: addresses the suggestions in issues 10 and 11; adds helper utilities including short_sentence for sentence splitting and the multiprocessing functions cut_mp and cut_pseg.\n- v0.3.1: adds the GPU-backed batch segmentation functions cut_batch and pseg_batch.\n- v0.3.0: smaller, faster, still accurate. No separate model download is needed after pip install.\n- The publicly released MONPA is for academic use only; do not use it commercially. Our team also offers custom models for specialized domains. Feel free to contact us.\n```\n\nMONPA v0.2+ uses the BERT (bidirectional Transformer) [[1]](#1) model to obtain more robust word embeddings and combines it with a CRF to perform word segmentation, POS tagging, and NER jointly. It differs substantially from MONPA v0.1, and the training corpus also differs from the one described in the paper.\n\nMONPA v0.3+ is retrained on ALBERT [[2]](#2), which greatly reduces the model file size and speeds up execution.\n\n\u003ca id=\"1\"\u003e[1]\u003c/a\u003e  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.\nJacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, NAACL-HLT 2019.\n\n\u003ca id=\"2\"\u003e[2]\u003c/a\u003e  ALBERT: A Lite BERT for Self-supervised Learning of Language Representations.\nZhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut, ICLR 2020.\n\n**Developing a Traditional Chinese word segmentation package is foundational work, and follow-up research needs broad support. [Your donations](http://nlp.tmu.edu.tw/Donate/index.html) are welcome.**\n\nSegmentation speed comparison across monpa versions\n\n\u003cimg src=\"./monpa_2vs3.png\" style=\"zoom:24%;\" /\u003e\n\nMeasured on Google Colab (monpa.cut runs on CPU in all cases; monpa.cut_batch uses GPU).\n\n\n**Notes**\n\n1. Feed the original text to monpa for segmentation first, then filter out stop words and punctuation as needed.\n2. Any part of the input beyond 200 characters per call is truncated and lost, so split the text into sentences of suitable length before segmenting with monpa. See the wiki page [How to split long text into short sentences before segmenting with monpa](https://github.com/monpa-team/monpa/wiki/Example-1：將長句處理成短句再運用-monpa-完成分詞) to build your own splitter, or use the short_sentence helper available in v0.3.2 and later.\n3. 
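Python \u003e= 3.6 is supported; Python 2.x is not.\n\n## Installing the monpa package\n\nmonpa can be installed with pip, and the steps are the same on every operating system.\n\n```bash\npip install monpa\n```\n\nInstallation automatically checks for torch \u003e= 1.0, requests, and other required packages, and pip installs any that are missing. Windows requires manual installation; visit [pytorch.org](https://www.pytorch.org) for the install command that best matches your system.\n\n*If monpa v0.2.x is already installed, upgrade directly with ```pip install --upgrade monpa```, or remove the old version with ```pip uninstall monpa``` and then install the new one.*\n\n## Basic monpa usage examples\n\nImport the monpa Python package.\n\n```python\nimport monpa\n```\n\n### cut function\n\nIf you only need the word segmentation result, use the ```cut``` function, which returns a list. A simple example:\n\n```python\nsentence = \"蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。\"\nresult_cut = monpa.cut(sentence)\n\nfor item in result_cut:\n    print(item)\n```\n\nOutput\n\n```python\n蔡英文\n總統\n今天\n受\n邀\n參加\n台北市政府\n所\n舉辦\n的\n陽明山\n馬拉松\n比賽\n。\n```\n\n### pseg function\n\nIf you need word segmentation together with POS tags, use the ```pseg``` function, which returns a list of tuples. A simple example:\n\n```python\nsentence = \"蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。\"\nresult_pseg = monpa.pseg(sentence)\n\nfor item in result_pseg:\n    print(item)\n```\n\nOutput\n\n```python\n('蔡英文', 'PER')\n('總統', 'Na')\n('今天', 'Nd')\n('受', 'P')\n('邀', 'VF')\n('參加', 'VC')\n('台北市政府', 'ORG')\n('所', 'D')\n('舉辦', 'VC')\n('的', 'DE')\n('陽明山', 'LOC')\n('馬拉松', 'Na')\n('比賽', 'Na')\n('。', 'PERIODCATEGORY')\n```\n\n### load_userdict function\n\nTo use a custom dictionary, prepare a dictionary text file in the format below and load it with this function. A simple example:\n\nSuppose you create a userdict.txt file. Each line has three parts, which must be separated by a space: the term, its frequency (a number), and its POS tag (use ```NER``` if unsure). Entries with a higher frequency take priority; among entries with the same frequency, the one listed earlier takes priority.\n\n**Note: do not leave a trailing blank line or any trailing whitespace.**\n\n```reStructuredText\n台北市政府 100 NER\n受邀 100 V\n```\n\nTo use custom terms, call ```load_userdict``` before segmenting so that the custom dictionary is loaded into the monpa module.\n\nReplace ```./userdict.txt``` in this example with the actual path and file name of your dictionary file.\n\n```python\nmonpa.load_userdict(\"./userdict.txt\")\n```\n\nContinuing the earlier example with the ```pseg``` function, the result now follows the custom dictionary: \"受邀\" is returned as one term instead of two separate characters, and \"台北市政府\" is tagged according to the custom entry.\n\n```python\nsentence = \"蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。\"\nresult_pseg_userdict = monpa.pseg(sentence)\n\nfor item in result_pseg_userdict:\n    print(item)\n```\n\nOutput\n\n```python\n('蔡英文', 'PER')\n('總統', 'Na')\n('今天', 'Nd')\n('受邀', 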
'V')\n('參加', 'VC')\n('台北市政府', 'NER')\n('所', 'D')\n('舉辦', 'VC')\n('的', 'DE')\n('陽明山', 'LOC')\n('馬拉松', 'Na')\n('比賽', 'Na')\n('。', 'PERIODCATEGORY')\n```\n### cut_batch function\n\nBefore running batch segmentation, enable GPU usage.\n\n```python\nmonpa.use_gpu(True)\n```\n\nStarting with monpa v0.3.1, the ```cut_batch``` function uses GPU compute. The input must be a list, the size of each batch should be chosen with GPU memory in mind, and the return value is also a list. The first call takes longer to start up, so for small workloads the ```cut``` function is sufficient. A simple example:\n\n```python\nsentence_list = [\"蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。\", \"蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。\"]\nresult_cut_batch = monpa.cut_batch(sentence_list)\n\nfor item in result_cut_batch:\n    print(item)\n```\n\nOutput\n\n```python\n['蔡英文', '總統', '今天', '受', '邀', '參加', '台北市政府', '所', '舉辦', '的', '陽明山', '馬拉松', '比賽', '。']\n['蔡英文', '總統', '今天', '受', '邀', '參加', '台北市政府', '所', '舉辦', '的', '陽明山', '馬拉松', '比賽', '。']\n```\n\n### pseg_batch function\n\nBefore running batch segmentation, enable GPU usage.\n\n```python\nmonpa.use_gpu(True)\n```\n\nStarting with monpa v0.3.1, the ```pseg_batch``` function uses GPU compute. The input must be a list, the size of each batch should be chosen with GPU memory in mind, and the return value is a list in which each item is a list of tuples. The first call takes longer to start up, so for small workloads the ```pseg``` function is sufficient. A simple example:\n\n```python\nsentence_list = [\"蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。\", \"蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。\"]\nresult_pseg_batch = monpa.pseg_batch(sentence_list)\n\nfor item in result_pseg_batch:\n    print(item)\n```\n\nOutput\n\n```python\n[('蔡英文', 'PER'), ('總統', 'Na'), ('今天', 'Nd'), ('受', 'P'), ('邀', 'VF'), ('參加', 'VC'), ('台北市政府', 'ORG'), ('所', 'D'), ('舉辦', 'VC'), ('的', 'DE'), ('陽明山', 'LOC'), ('馬拉松', 'Na'), ('比賽', 'Na'), ('。', 'PERIODCATEGORY')]\n[('蔡英文', 'PER'), ('總統', 'Na'), ('今天', 'Nd'), ('受', 'P'), ('邀', 'VF'), ('參加', 'VC'), ('台北市政府', 'ORG'), ('所', 'D'), ('舉辦', 'VC'), ('的', 'DE'), ('陽明山', 'LOC'), ('馬拉松', 'Na'), ('比賽', 'Na'), ('。', 'PERIODCATEGORY')]\n```\n\n## Helper utilities (available from v0.3.2)\n### utils.short_sentence function\n\nBefore using the helper utilities, import the utils module bundled with monpa.\n\n```python\nfrom monpa import utils\n```\n\nBecause monpa only segments text within 200 characters, split long text into shorter sentences first so that nothing is lost to truncation. Starting with monpa v0.3.2, the ```short_sentence``` function splits on \"。\", \"！\", \"？\", and \"，\", tried in that order. The input must be a string and the return value is a list. The function first looks for the last \"。\" within 200 characters as the break point; if there is none, it falls back to \"！\", and so on. If none of the four default punctuation marks appears within 200 characters, it splits at the 200-character mark. A simple example:\n\n```python\nlong_sentence = '''\n蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。\n'''\nsentence_list = utils.short_sentence(long_sentence)\nfor item in sentence_list:\n    print(item)\n```\n\nOutput\n\nThe 292-character ```long_sentence``` is split by ```utils.short_sentence``` into two short sentences at \"。\".\n```python\n蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。\n蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。\n```\n\n### utils.cut_mp function\n\nmonpa v0.3.1 introduced the GPU-backed ```cut_batch``` function, but not every machine has a GPU, so v0.3.2 adds a multiprocessing helper to reduce segmentation time for large numbers of sentences. The input is a list or a list of lists; specify the number of workers to run concurrently according to the machine's CPU cores. The return value is a list or a list of lists. The first call takes longer to start up, so for small workloads the ```cut``` function is sufficient. A simple example:\n\n```python\nsentence_list = ['蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。', '蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。']\n\nresult_cut_mp = utils.cut_mp(sentence_list, 4) # this example starts 4 workers\nprint(result_cut_mp)\n```\n\nOutput\n\n```python\n[['蔡英文', '總統', '今天', '受', '邀', '參加', '台北市政府', '所', '舉辦', '的', '陽明山', '馬拉松', '比賽', '。', '蔡英文', '總統', '今天', '受', '邀', '參加', '台北市', '政府', '所', '舉辦', '的', '陽明山', '馬拉松', '比賽', '。', '蔡英文', '總統', '今天', '受', '邀', '參加', '台北市', '政府', '所', '舉辦', '的', '陽明山', '馬拉松', '比賽', 
'。', '蔡英文', '總統', '今天', '受', '邀', '參加', '台北市', '政府', '所', '舉辦', '的', '陽明山', '馬拉松', '比賽', '。', '蔡英文', '總統', '今天', '受邀', '參加', '台北市', '政府', '所', '舉辦', '的', '陽明山', '馬拉松', '比賽', '。', '蔡英文', '總統', '今天', '受', '邀', '參加', '台北市', '政府', '所', '舉辦', '的', '陽明山', '馬拉松', '比賽', '。'], ['蔡英文', '總統', '今天', '受', '邀', '參加', '台北市政府', '所', '舉辦', '的', '陽明山', '馬拉松', '比賽', '。', '蔡英文', '總統', '今天', '受', '邀', '參加', '台北市政府', '所', '舉辦', '的', '陽明山', '馬拉松', '比賽', '。', '蔡英文', '總統', '今天', '受', '邀', '參加', '台北市', '政府', '所', '舉辦', '的', '陽明山', '馬拉松', '比賽', '。', '蔡英文', '總統', '今天', '受', '邀', '參加', '台北市', '政府', '所', '舉辦', '的', '陽明山', '馬拉松', '比賽', '。']]\n```\n\n### utils.pseg_mp function\n\nmonpa v0.3.1 introduced the GPU-backed ```cut_batch``` function, but not every machine has a GPU, so v0.3.2 adds a multiprocessing helper to reduce segmentation time for large numbers of sentences. The input is a list or a list of lists; specify the number of workers to run concurrently according to the machine's CPU cores. The return value is a list or a list of lists. The first call takes longer to start up, so for small workloads the ```pseg``` function is sufficient. A simple example:\n\n```python\nsentence_list = ['蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。', '蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。蔡英文總統今天受邀參加台北市政府所舉辦的陽明山馬拉松比賽。']\n\nresult_pseg_mp = utils.pseg_mp(sentence_list, 4) # this example starts 4 workers\nprint(result_pseg_mp)\n```\n\nOutput\n\n```python\n[[('蔡英文', 'PER'), ('總統', 'Na'), ('今天', 'Nd'), ('受', 'P'), ('邀', 'VF'), ('參加', 'VC'), ('台北市政府', 'ORG'), ('所', 'D'), ('舉辦', 'VC'), ('的', 'DE'), ('陽明山', 'LOC'), ('馬拉松', 'Na'), ('比賽', 'Na'), ('。', 'PERIODCATEGORY'), ('蔡英文', 'PER'), ('總統', 'Na'), ('今天', 'Nd'), ('受', 'VJ'), ('邀', 'VF'), ('參加', 'VC'), ('台北市', 'LOC'), ('政府', 'Na'), ('所', 'D'), ('舉辦', 'VC'), ('的', 'DE'), ('陽明山', 'LOC'), ('馬拉松', 'Na'), ('比賽', 'Na'), ('。', 'PERIODCATEGORY'), ('蔡英文', 'PER'), ('總統', 'Na'), ('今天', 'Nd'), ('受', 'VJ'), ('邀', 'VF'), ('參加', 'VC'), ('台北市', 'LOC'), ('政府', 'Na'), ('所', 'D'), ('舉辦', 'VC'), ('的', 'DE'), ('陽明山', 'LOC'), ('馬拉松', 'Na'), ('比賽', 'Na'), 
('。', 'PERIODCATEGORY'), ('蔡英文', 'PER'), ('總統', 'Na'), ('今天', 'Nd'), ('受', 'VJ'), ('邀', 'VF'), ('參加', 'VC'), ('台北市', 'LOC'), ('政府', 'Nc'), ('所', 'D'), ('舉辦', 'VC'), ('的', 'DE'), ('陽明山', 'LOC'), ('馬拉松', 'Na'), ('比賽', 'Na'), ('。', 'PERIODCATEGORY'), ('蔡英文', 'PER'), ('總統', 'Na'), ('今天', 'Nd'), ('受邀', 'VJ'), ('參加', 'VC'), ('台北市', 'LOC'), ('政府', 'Nc'), ('所', 'D'), ('舉辦', 'VC'), ('的', 'DE'), ('陽明山', 'LOC'), ('馬拉松', 'Na'), ('比賽', 'Na'), ('。', 'PERIODCATEGORY'), ('蔡英文', 'PER'), ('總統', 'Na'), ('今天', 'Nd'), ('受', 'VJ'), ('邀', 'VF'), ('參加', 'VC'), ('台北市', 'LOC'), ('政府', 'Na'), ('所', 'D'), ('舉辦', 'VC'), ('的', 'DE'), ('陽明山', 'LOC'), ('馬拉松', 'Na'), ('比賽', 'Na'), ('。', 'PERIODCATEGORY')], [('蔡英文', 'PER'), ('總統', 'Na'), ('今天', 'Nd'), ('受', 'P'), ('邀', 'VF'), ('參加', 'VC'), ('台北市政府', 'ORG'), ('所', 'D'), ('舉辦', 'VC'), ('的', 'DE'), ('陽明山', 'LOC'), ('馬拉松', 'Na'), ('比賽', 'Na'), ('。', 'PERIODCATEGORY'), ('蔡英文', 'PER'), ('總統', 'Na'), ('今天', 'Nd'), ('受', 'P'), ('邀', 'VF'), ('參加', 'VC'), ('台北市政府', 'ORG'), ('所', 'D'), ('舉辦', 'VC'), ('的', 'DE'), ('陽明山', 'LOC'), ('馬拉松', 'Na'), ('比賽', 'Na'), ('。', 'PERIODCATEGORY'), ('蔡英文', 'PER'), ('總統', 'Na'), ('今天', 'Nd'), ('受', 'VJ'), ('邀', 'VF'), ('參加', 'VC'), ('台北市', 'LOC'), ('政府', 'Na'), ('所', 'D'), ('舉辦', 'VC'), ('的', 'DE'), ('陽明山', 'LOC'), ('馬拉松', 'Na'), ('比賽', 'Na'), ('。', 'PERIODCATEGORY'), ('蔡英文', 'PER'), ('總統', 'Na'), ('今天', 'Nd'), ('受', 'VJ'), ('邀', 'VF'), ('參加', 'VC'), ('台北市', 'LOC'), ('政府', 'Nc'), ('所', 'D'), ('舉辦', 'VC'), ('的', 'DE'), ('陽明山', 'LOC'), ('馬拉松', 'Na'), ('比賽', 'Na'), ('。', 'PERIODCATEGORY')]]\n```\n\n## Donations\n\nWe need your support to keep developing foundational natural language processing tools. Please consider donating to the [Taipei Medical University Natural Language Processing Lab's 『人工智慧卓越創新計畫』 (AI Excellence and Innovation Project)](http://nlp.tmu.edu.tw/Donate/index.html).\n\n## Other\n\nThis project is inspired by our paper [MONPA: Multi-objective Named-entity and Part-of-speech Annotator for Chinese using Recurrent Neural Network](https://www.aclweb.org/anthology/papers/I/I17/I17-2014/), in which more details about the model can be found. 
\n\nFor your reference, although we list the paper here, it does NOT mean we use the exact same corpora when training the released model. The current MONPA is a new development by adopting the (AL)BERT model and a new paper will be published later. In the meantime, we list the original paper about the core ideas of MONPA for citation purposes.\n\n##### Abstract\n\nPart-of-speech (POS) tagging and named entity recognition (NER) are crucial steps in natural language processing. In addition, the difficulty of word segmentation places additional burden on those who intend to deal with languages such as Chinese, and pipelined systems often suffer from error propagation. This work proposes an end-to-end model using character-based recurrent neural network (RNN) to jointly accomplish segmentation, POS tagging and NER of a Chinese sentence. Experiments on previous word segmentation and NER datasets show that a single model with the proposed architecture is comparable to those trained specifically for each task, and outperforms freely-available softwares. Moreover, we provide a web-based interface for the public to easily access this resource.\n\n#### Citation:\n\n##### APA:\n\nHsieh, Y. L., Chang, Y. C., Huang, Y. J., Yeh, S. H., Chen, C. H., \u0026 Hsu, W. L. (2017, November). MONPA: Multi-objective Named-entity and Part-of-speech Annotator for Chinese using Recurrent Neural Network. In *Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)* (pp. 
80-85).\n\n##### BibTex\n\n```text\n@inproceedings{hsieh-etal-2017-monpa,\n    title = \"{MONPA}: Multi-objective Named-entity and Part-of-speech Annotator for {C}hinese using Recurrent Neural Network\",\n    author = \"Hsieh, Yu-Lun  and\n      Chang, Yung-Chun  and\n      Huang, Yi-Jie  and\n      Yeh, Shu-Hao  and\n      Chen, Chun-Hung  and\n      Hsu, Wen-Lian\",\n    booktitle = \"Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)\",\n    month = nov,\n    year = \"2017\",\n    address = \"Taipei, Taiwan\",\n    publisher = \"Asian Federation of Natural Language Processing\",\n    url = \"https://www.aclweb.org/anthology/I17-2014\",\n    pages = \"80--85\",\n    abstract = \"Part-of-speech (POS) tagging and named entity recognition (NER) are crucial steps in natural language processing. In addition, the difficulty of word segmentation places additional burden on those who intend to deal with languages such as Chinese, and pipelined systems often suffer from error propagation. This work proposes an end-to-end model using character-based recurrent neural network (RNN) to jointly accomplish segmentation, POS tagging and NER of a Chinese sentence. Experiments on previous word segmentation and NER datasets show that a single model with the proposed architecture is comparable to those trained specifically for each task, and outperforms freely-available softwares. 
Moreover, we provide a web-based interface for the public to easily access this resource.\",\n}\n```\n\n##### Contact\nPlease feel free to contact the monpa team by email.\nmonpa.cut@gmail.com\n\n## Acknowledgments\n\nIn the early stage of model development, part of the corpus was annotated with the CKIP program developed by the Chinese Knowledge and Information Processing (CKIP) group at Academia Sinica, and the annotations were then corrected through other procedures. We thank the CKIP group at Academia Sinica for their assistance. With the group's consent, MONPA used the CKIP word segmentation component to help produce the initial training data.\n\nMa, Wei-Yun and Keh-Jiann Chen, 2003, \"Introduction to CKIP Chinese Word Segmentation System for the First International Chinese Word Segmentation Bakeoff\", Proceedings of ACL, Second SIGHAN Workshop on Chinese Language Processing, pp. 168-171.\n\n## License\n\n[![CC BY-NC-SA 4.0](https://camo.githubusercontent.com/6887feb0136db5156c4f4146e3dd2681d06d9c75/68747470733a2f2f692e6372656174697665636f6d6d6f6e732e6f72672f6c2f62792d6e632d73612f342e302f38387833312e706e67)](http://creativecommons.org/licenses/by-nc-sa/4.0/)\n\nCopyright (c) 2020 The MONPA team under the [CC-BY-NC-SA 4.0 License](http://creativecommons.org/licenses/by-nc-sa/4.0/). All rights reserved.\n\nFor academic use only; do not use for profit. If you need to use MONPA commercially, please contact us to arrange next steps (monpa.cut@gmail.com).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmonpa-team%2Fmonpa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmonpa-team%2Fmonpa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmonpa-team%2Fmonpa/lists"}