{"id":15033111,"url":"https://github.com/crownpku/rasa_nlu_chi","last_synced_at":"2025-05-15T06:07:25.264Z","repository":{"id":39174041,"uuid":"94956993","full_name":"crownpku/Rasa_NLU_Chi","owner":"crownpku","description":"Turn Chinese natural language into structured data 中文自然语言理解","archived":false,"fork":false,"pushed_at":"2024-07-30T21:17:04.000Z","size":2961,"stargazers_count":1524,"open_issues_count":84,"forks_count":421,"subscribers_count":71,"default_branch":"master","last_synced_at":"2025-05-15T06:06:36.775Z","etag":null,"topics":["chatbot","chinese","natural-language"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/crownpku.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.rst","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"crownpku","patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2017-06-21T03:00:36.000Z","updated_at":"2025-05-08T12:34:27.000Z","dependencies_parsed_at":"2022-07-20T05:00:26.232Z","dependency_job_id":"18200c30-7d44-4851-b33b-e05783d459aa","html_url":"https://github.com/crownpku/Rasa_NLU_Chi","commit_stats":{"total_commits":1792,"total_committers":97,"mean_commits":18.47422680412371,"dds":0.60546875,"last_synced_commit":"f995c06e5aee5b6f68ea877c1a271667357a1c68"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crownpku%2FRasa_NLU_Chi",
"tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crownpku%2FRasa_NLU_Chi/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crownpku%2FRasa_NLU_Chi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crownpku%2FRasa_NLU_Chi/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/crownpku","download_url":"https://codeload.github.com/crownpku/Rasa_NLU_Chi/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254283350,"owners_count":22045141,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatbot","chinese","natural-language"],"created_at":"2024-09-24T20:20:08.059Z","updated_at":"2025-05-15T06:07:25.216Z","avatar_url":"https://github.com/crownpku.png","language":"Python","funding_links":["https://github.com/sponsors/crownpku"],"categories":[],"sub_categories":[],"readme":"\n\n# Rasa NLU for Chinese, a fork of RasaHQ/rasa_nlu.\n\n## Please refer to the latest instructions in the [official Rasa NLU documentation](https://nlu.rasa.com/)\n\n## [Blog post (in Chinese)](http://www.crownpku.com/2017/07/27/%E7%94%A8Rasa_NLU%E6%9E%84%E5%BB%BA%E8%87%AA%E5%B7%B1%E7%9A%84%E4%B8%AD%E6%96%87NLU%E7%B3%BB%E7%BB%9F.html)\n\n![](http://www.crownpku.com/images/201707/5.jpg)\n![](http://www.crownpku.com/images/201707/4.jpg)\n\n\n\n### Files you should have:\n\n* data/total_word_feature_extractor_zh.dat\n\nA MITIE word feature extractor trained on a Chinese corpus with the MITIE wordrep tool (training takes 2-3 days).\n\nTo train your own, build the [MITIE Wordrep 
Tool](https://github.com/mit-nlp/MITIE/tree/master/tools/wordrep). Note that the Chinese corpus must be tokenized (word-segmented) before it is fed to the tool. A closed-domain corpus that closely matches your use case works best.\n\nA model trained on a Chinese Wikipedia dump and Baidu Baike can be downloaded from the [blog post (in Chinese)](http://www.crownpku.com/2017/07/27/%E7%94%A8Rasa_NLU%E6%9E%84%E5%BB%BA%E8%87%AA%E5%B7%B1%E7%9A%84%E4%B8%AD%E6%96%87NLU%E7%B3%BB%E7%BB%9F.html).\n\n\n* data/examples/rasa/demo-rasa_zh.json\n\nThe training data (passed to the trainer via `--data`). Add as many examples as possible.\n\n### Usage:\n\n1. Clone this project and run:\n```\npython setup.py install\n```\n\n2. Modify the configuration.\n\n   Two pipelines are currently available for Chinese:\n\n   Use MITIE+Jieba (sample_configs/config_jieba_mitie.yml):\n```yaml\nlanguage: \"zh\"\n\npipeline:\n- name: \"nlp_mitie\"\n  model: \"data/total_word_feature_extractor_zh.dat\"\n- name: \"tokenizer_jieba\"\n- name: \"ner_mitie\"\n- name: \"ner_synonyms\"\n- name: \"intent_entity_featurizer_regex\"\n- name: \"intent_classifier_mitie\"\n```\n\n   RECOMMENDED: Use MITIE+Jieba+sklearn (sample_configs/config_jieba_mitie_sklearn.yml):\n```yaml\nlanguage: \"zh\"\n\npipeline:\n- name: \"nlp_mitie\"\n  model: \"data/total_word_feature_extractor_zh.dat\"\n- name: \"tokenizer_jieba\"\n- name: \"ner_mitie\"\n- name: \"ner_synonyms\"\n- name: \"intent_entity_featurizer_regex\"\n- name: \"intent_featurizer_mitie\"\n- name: \"intent_classifier_sklearn\"\n```\n\n3. (Optional) Use a Jieba user-defined dictionary or switch the Jieba default dictionary:\n\n   You can provide either a **file path** or a **directory path** as the \"user_dicts\" value. 
(sample_configs/config_jieba_mitie_sklearn_plus_dict_path.yml)\n\n```yaml\nlanguage: \"zh\"\n\npipeline:\n- name: \"nlp_mitie\"\n  model: \"data/total_word_feature_extractor_zh.dat\"\n- name: \"tokenizer_jieba\"\n  # switch Jieba's default dictionary\n  default_dict: \"./default_dict.big\"\n  # user dictionaries: a directory path...\n  user_dicts: \"./jieba_userdict\"\n  # ...or a single file path\n#  user_dicts: \"./jieba_userdict/jieba_userdict.txt\"\n- name: \"ner_mitie\"\n- name: \"ner_synonyms\"\n- name: \"intent_entity_featurizer_regex\"\n- name: \"intent_featurizer_mitie\"\n- name: \"intent_classifier_sklearn\"\n```\n\n4. Train the model by running the command below.\n\n   If you specify a project name in the configuration file, the model is saved under `models/your_project_name`, where `models` is the directory passed via `--path`.\n\n   Otherwise, it is saved under `models/default`.\n\n```\npython -m rasa_nlu.train -c sample_configs/config_jieba_mitie_sklearn.yml --data data/examples/rasa/demo-rasa_zh.json --path models\n```\n\n\n5. Run the rasa_nlu server:\n\n```\npython -m rasa_nlu.server -c sample_configs/config_jieba_mitie_sklearn.yml --path models\n```\n\n\n6. 
Open a new terminal and query the server with curl. For example (the query means \"I have a fever; what medicine should I take?\"):\n\n```\n$ curl -XPOST localhost:5000/parse -d '{\"q\":\"我发烧了该吃什么药？\", \"project\": \"rasa_nlu_test\", \"model\": \"model_20170921-170911\"}' | python -mjson.tool\n  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n100   652    0   552  100   100    157     28  0:00:03  0:00:03 --:--:--   157\n{\n    \"entities\": [\n        {\n            \"end\": 3,\n            \"entity\": \"disease\",\n            \"extractor\": \"ner_mitie\",\n            \"start\": 1,\n            \"value\": \"发烧\"\n        }\n    ],\n    \"intent\": {\n        \"confidence\": 0.5397186422631861,\n        \"name\": \"medical\"\n    },\n    \"intent_ranking\": [\n        {\n            \"confidence\": 0.5397186422631861,\n            \"name\": \"medical\"\n        },\n        {\n            \"confidence\": 0.16206323981749196,\n            \"name\": \"restaurant_search\"\n        },\n        {\n            \"confidence\": 0.1212448457737397,\n            \"name\": \"affirm\"\n        },\n        {\n            \"confidence\": 0.10333600028547868,\n            \"name\": \"goodbye\"\n        },\n        {\n            \"confidence\": 0.07363727186010374,\n            \"name\": \"greet\"\n        }\n    ],\n    \"text\": \"我发烧了该吃什么药？\"\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcrownpku%2Frasa_nlu_chi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcrownpku%2Frasa_nlu_chi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcrownpku%2Frasa_nlu_chi/lists"}