{"id":13595411,"url":"https://github.com/hankcs/pyhanlp","last_synced_at":"2025-05-12T15:17:56.885Z","repository":{"id":40414532,"uuid":"125806243","full_name":"hankcs/pyhanlp","owner":"hankcs","description":"中文分词","archived":false,"fork":false,"pushed_at":"2025-01-16T02:45:02.000Z","size":287,"stargazers_count":3170,"open_issues_count":13,"forks_count":808,"subscribers_count":84,"default_branch":"master","last_synced_at":"2025-04-23T17:12:02.961Z","etag":null,"topics":["chinese-word-segmentation","dependency-parser","hanlp","named-entity-recognition","natural-language-processing","part-of-speech-tagger"],"latest_commit_sha":null,"homepage":"https://hanlp.hankcs.com/","language":"Python","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hankcs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-03-19T05:34:36.000Z","updated_at":"2025-04-18T08:38:28.000Z","dependencies_parsed_at":"2024-01-16T22:19:56.632Z","dependency_job_id":"c442da92-0d62-4f73-af11-56f45e6b2042","html_url":"https://github.com/hankcs/pyhanlp","commit_stats":{"total_commits":195,"total_committers":12,"mean_commits":16.25,"dds":0.1282051282051282,"last_synced_commit":"2620340198a0c78fae1965ba218e18a9bf75942b"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hankcs%2Fpyhanlp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hankcs%2Fpyhanlp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hankcs%2Fpyhan
lp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hankcs%2Fpyhanlp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hankcs","download_url":"https://codeload.github.com/hankcs/pyhanlp/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250477810,"owners_count":21437049,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chinese-word-segmentation","dependency-parser","hanlp","named-entity-recognition","natural-language-processing","part-of-speech-tagger"],"created_at":"2024-08-01T16:01:49.536Z","updated_at":"2025-04-23T17:12:08.329Z","avatar_url":"https://github.com/hankcs.png","language":"Python","funding_links":[],"categories":["其他_NLP自然语言处理","Uncategorized","Python"],"sub_categories":["其他_文本生成、文本对话","Uncategorized"],"readme":"# pyhanlp: Python interfaces for HanLP1.x\n\n[![单元测试](https://github.com/hankcs/pyhanlp/actions/workflows/unit-tests.yml/badge.svg?branch=master)](https://github.com/hankcs/pyhanlp/actions/workflows/unit-tests.yml?query=branch%3Amaster) ![pypi](https://img.shields.io/pypi/v/pyhanlp) [![Downloads](https://pepy.tech/badge/pyhanlp)](https://pepy.tech/project/pyhanlp) [![GitHub license](https://img.shields.io/github/license/hankcs/pyhanlp)](https://github.com/hankcs/pyhanlp/blob/master/LICENSE) [![Run Jupyter](https://img.shields.io/badge/Run-Jupyter-orange?style=flat\u0026logo=Jupyter)](https://mybinder.org/v2/gh/hankcs/pyhanlp.git/master?filepath=tests%2Fbook%2Findex.ipynb) 
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/hankcs/pyhanlp.git/master?filepath=tests%2Fbook%2Findex.ipynb)\n\nA Python interface to [HanLP1.x](https://github.com/hankcs/HanLP/tree/1.x) that downloads and upgrades [HanLP1.x](https://github.com/hankcs/HanLP/tree/1.x) automatically and is compatible with Python\u003e=3.6. Its algorithms have been tested in both industry and academia. The companion book [《自然语言处理入门》](http://nlp.hankcs.com/book.php) has been published; browse the [code that accompanies the book](https://github.com/hankcs/pyhanlp/tree/master/tests/book) or click [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/hankcs/pyhanlp.git/master?filepath=tests%2Fbook%2Findex.ipynb) to run it online. The deep-learning-based [HanLP2.x](https://github.com/hankcs/HanLP/tree/doc-zh) was released in early 2020; it offers next-generation, state-of-the-art multilingual NLP and is developed in parallel with 1.x as a complement to it.\n\n## Installation\n\n**If you just want to try it**, click [![Run Jupyter](https://img.shields.io/badge/Run-Jupyter-orange?style=flat\u0026logo=Jupyter)](https://mybinder.org/v2/gh/hankcs/pyhanlp.git/master?filepath=tests%2Fbook%2Findex.ipynb); **beginners** can use the [one-click installer](https://nlp.hankcs.com/download.php?file=exe) directly; **engineers** should first install [conda](https://docs.conda.io/en/latest/miniconda.html) and then run:\n\n```bash\nconda install -c conda-forge openjdk python=3.8 -y\npip install pyhanlp\n```\n\nRun the `hanlp` command to verify the installation. If the automatic installation fails, for example because of network problems, see [manual configuration](https://github.com/hankcs/pyhanlp/wiki/%E6%89%8B%E5%8A%A8%E9%85%8D%E7%BD%AE) or the [Windows guide](https://github.com/hankcs/pyhanlp/wiki/Windows).\n\n- Every pyhanlp release passes the [unit tests](https://github.com/hankcs/pyhanlp/actions/workflows/unit-tests.yml?query=branch%3Amaster) on Linux, macOS, and Windows with Python 3.6 through 3.13, so installation is not expected to be a problem on these platforms.\n\n\n## Command line\n\n### Chinese word segmentation\n\nRun `hanlp segment` to enter interactive segmentation mode. Type a sentence and press Enter, and [HanLP1.x](https://github.com/hankcs/HanLP/tree/1.x) prints the segmentation result:\n\n```bash\n$ hanlp segment\n商品和服务\n商品/n 和/cc 服务/vn\n当下雨天地面积水分外严重\n当/p 下雨天/n 地面/n 积水/n 分外/d 严重/a\n龚学平等领导说,邓颖超生前杜绝超生\n龚学平/nr 等/udeng 领导/n 说/v ,/w 邓颖超/nr 生前/t 杜绝/v 超生/vi\n```\n\nInput and output can also be redirected, for example to or from a file:\n\n```bash\n$ hanlp segment \u003c\u003c\u003c '欢迎新老师生前来就餐'\n欢迎/v 新/a 老/a 师生/n 前来/vi 就餐/vi\n```\n\n### Dependency parsing\n\nThe command is `hanlp parse`; it also supports interactive mode and redirection:\n\n```bash\n$ hanlp parse \u003c\u003c\u003c 
'徐先生还具体帮助他确定了把画雄鹰、松鼠和麻雀作为主攻目标。'\n1\t徐先生\t徐先生\tnh\tnr\t_\t4\t主谓关系\t_\t_\n2\t还\t还\td\td\t_\t4\t状中结构\t_\t_\n3\t具体\t具体\ta\ta\t_\t4\t状中结构\t_\t_\n4\t帮助\t帮助\tv\tv\t_\t0\t核心关系\t_\t_\n5\t他\t他\tr\trr\t_\t4\t兼语\t_\t_\n6\t确定\t确定\tv\tv\t_\t4\t动宾关系\t_\t_\n7\t了\t了\tu\tule\t_\t6\t右附加关系\t_\t_\n8\t把\t把\tp\tpba\t_\t15\t状中结构\t_\t_\n9\t画\t画\tv\tv\t_\t8\t介宾关系\t_\t_\n10\t雄鹰\t雄鹰\tn\tn\t_\t9\t动宾关系\t_\t_\n11\t、\t、\twp\tw\t_\t12\t标点符号\t_\t_\n12\t松鼠\t松鼠\tn\tn\t_\t10\t并列关系\t_\t_\n13\t和\t和\tc\tcc\t_\t14\t左附加关系\t_\t_\n14\t麻雀\t麻雀\tn\tn\t_\t10\t并列关系\t_\t_\n15\t作为\t作为\tp\tp\t_\t6\t动宾关系\t_\t_\n16\t主攻\t主攻\tv\tvn\t_\t17\t定中关系\t_\t_\n17\t目标\t目标\tn\tn\t_\t15\t动宾关系\t_\t_\n18\t。\t。\twp\tw\t_\t4\t标点符号\t_\t_\n```\n\n### Server\n\nRun `hanlp serve` to start the built-in HTTP server; the default local address is http://localhost:8765 . You can also visit the demo page on the official site: http://hanlp.hankcs.com/ .\n\n### Upgrade\n\nRun `hanlp update` to upgrade [HanLP1.x](https://github.com/hankcs/HanLP/tree/1.x) to the latest version. The command fetches the [latest release of the main HanLP project](https://github.com/hankcs/HanLP/releases) and downloads and installs it automatically.\n\nRun `hanlp --help` to see the latest help text.\n\n## API\n\nCommon interfaces are available through the utility class [`HanLP`](https://github.com/hankcs/HanLP/blob/1.x/src/main/java/com/hankcs/hanlp/HanLP.java#L55):\n\n```python\nfrom pyhanlp import *\n\nprint(HanLP.segment('你好，欢迎在Python中调用HanLP的API'))\nfor term in HanLP.segment('下雨天地面积水'):\n    print('{}\\t{}'.format(term.word, term.nature))  # get each word and its part-of-speech tag\n\n# Sentences that are ambiguous to segment\ntestCases = [\n    \"商品和服务\",\n    \"结婚的和尚未结婚的确实在干扰分词啊\",\n    \"买水果然后来世博园最后去世博会\",\n    \"中国的首都是北京\",\n    \"欢迎新老师生前来就餐\",\n    \"工信处女干事每月经过下属科室都要亲口交代24口交换机等技术性器件的安装工作\",\n    \"随着页游兴起到现在的页游繁盛，依赖于存档进行逻辑判断的设计减少了，但这块也不能完全忽略掉。\"]\nfor sentence in testCases:\n    print(HanLP.segment(sentence))\n\n# Keyword extraction\ndocument = \"水利部水资源司司长陈明忠9月29日在国务院新闻办举行的新闻发布会上透露，\" \\\n           \"根据刚刚完成了水资源管理制度的考核，有部分省接近了红线的指标，\" \\\n           \"有部分省超过红线的指标。对一些超过红线的地方，陈明忠表示，对一些取用水项目进行区域的限批，\" \\\n           \"严格地进行水资源论证和取水许可的批准。\"\nprint(HanLP.extractKeyword(document, 2))\n# Automatic summarization\nprint(HanLP.extractSummary(document, 3))\n# Dependency parsing\nprint(HanLP.parseDependency(\"徐先生还具体帮助他确定了把画雄鹰、松鼠和麻雀作为主攻目标。\"))\n```\n\n### 
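More features\n\nMore features include, but are not limited to:\n\n- Custom dictionaries\n- High-speed dictionary-based segmentation\n- Index segmentation\n- CRF segmentation\n- Perceptron lexical analysis\n- Traditional Chinese (Taiwan and Hong Kong variants)\n- Keyword extraction and automatic summarization\n- Text classification and sentiment analysis\n\nRead the [main HanLP project documentation](https://github.com/hankcs/HanLP/blob/1.x/README.md) and the [demos directory](https://github.com/hankcs/pyhanlp/tree/master/tests/demos) to learn more. To call lower-level APIs, follow the Java naming and use `JClass` to load a class by its full class path. Take the perceptron lexical analyzer as an example: its fully qualified name is [`com.hankcs.hanlp.model.perceptron.PerceptronLexicalAnalyzer`](https://github.com/hankcs/HanLP/blob/1.x/src/main/java/com/hankcs/hanlp/model/perceptron/PerceptronLexicalAnalyzer.java), so first obtain the class with `JClass` and then call it:\n\n```python\nfrom pyhanlp import *\n\n# Load the Java class by its fully qualified name, then instantiate it\nPerceptronLexicalAnalyzer = JClass('com.hankcs.hanlp.model.perceptron.PerceptronLexicalAnalyzer')\nanalyzer = PerceptronLexicalAnalyzer()\nprint(analyzer.analyze(\"上海华安工业（集团）公司董事长谭旭光和秘书胡花蕊来到美国纽约现代艺术博物馆参观\"))\n```\n\nOutput:\n\n```\n[上海/ns 华安/nz 工业/n （/w 集团/n ）/w 公司/n]/nt 董事长/n 谭旭光/nr 和/c 秘书/n 胡花蕊/nr 来到/v [美国/ns 纽约/ns 现代/t 艺术/n 博物馆/n]/ns 参观/v\n```\n\nIf you need thread safety, use `SafeJClass`; if you need lazy loading, use `LazyLoadingJClass`. If you use a class often, you are welcome to add it to `pyhanlp/__init__.py` and submit a pull request. Thanks!\n\n## Sharing data with other projects\n\n[HanLP1.x](https://github.com/hankcs/HanLP/tree/1.x) is highly customizable: every model and dictionary can be replaced freely. If you want pyhanlp to share one set of data with another project, copy that project's configuration file `hanlp.properties` into the pyhanlp installation directory. You can find the installation directory on your machine with `hanlp --version`.\n\nYou can also load a different configuration file temporarily with `--config`:\n\n```bash\nhanlp segment --config path/to/another/hanlp.properties\n```\n\n## Testing\n\n```bash\ngit clone https://github.com/hankcs/pyhanlp.git\ncd pyhanlp\npip install -e .\npython tests/test_hanlp.py\n```\n\n## Feedback\n\nPlease report any bugs in the [HanLP issue tracker](https://github.com/hankcs/HanLP/issues). For questions, please post on the [forum](https://bbs.hankcs.com/). Thanks.\n\n## 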
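[《自然语言处理入门》](http://nlp.hankcs.com/book.php)\n\nNatural language processing is a broad and deep discipline, and getting the full performance out of a tool requires understanding the theory behind it. Newcomers may want to consider this introductory book:\n\n![img](http://file.hankcs.com/img/nlp-book-squre.jpg)\n\nThis is an introductory NLP book built around HanLP. It gives equal weight to basic theory and production code, with implementations in both Python and Java. Starting from basic concepts, it works through the algorithms and engineering behind several popular problems: Chinese word segmentation, part-of-speech tagging, named entity recognition, information extraction, text clustering, text classification, and syntactic parsing. By explaining multiple algorithms, the book compares their strengths, weaknesses, and suitable use cases, and it demonstrates mature, production-grade code in detail to help you apply natural language processing in a production environment.\n\n[《自然语言处理入门》](http://nlp.hankcs.com/book.php) comes with forewords and recommendations from Xia Zhihong (夏志宏), founding chair of the Department of Mathematics at Southern University of Science and Technology; Zhou Ming (周明), vice president of Microsoft Research Asia; Li Hang (李航), director of the ByteDance AI Lab; Liu Qun (刘群), chief scientist for speech and language at Huawei Noah's Ark Lab; Wang Bin (王斌), director of the Xiaomi AI Lab and its chief NLP scientist; Zong Chengqing (宗成庆), researcher at the Institute of Automation, Chinese Academy of Sciences; Liu Zhiyuan (刘知远), associate professor at Tsinghua University; Zhang Huaping (张华平), associate professor at Beijing Institute of Technology; and 52nlp. Thanks to all of these senior colleagues and teachers. I hope this project and this book can be a “butterfly effect” for your engineering work and your studies, and help you transform on your way through NLP.\n\n## License\n\nApache License 2.0\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhankcs%2Fpyhanlp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhankcs%2Fpyhanlp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhankcs%2Fpyhanlp/lists"}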