{"id":13694817,"url":"https://github.com/hankcs/TextRank","last_synced_at":"2025-05-03T04:30:58.429Z","repository":{"id":16964568,"uuid":"19727041","full_name":"hankcs/TextRank","owner":"hankcs","description":"A Java implementation of TextRank keyword extraction","archived":false,"fork":false,"pushed_at":"2015-05-03T06:59:33.000Z","size":4942,"stargazers_count":201,"open_issues_count":0,"forks_count":94,"subscribers_count":19,"default_branch":"master","last_synced_at":"2025-04-02T03:34:57.901Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hankcs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2014-05-13T05:31:59.000Z","updated_at":"2024-09-20T18:53:46.000Z","dependencies_parsed_at":"2022-08-04T15:30:20.460Z","dependency_job_id":null,"html_url":"https://github.com/hankcs/TextRank","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hankcs%2FTextRank","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hankcs%2FTextRank/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hankcs%2FTextRank/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hankcs%2FTextRank/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hankcs","download_url":"https://codeload.github.com/hankcs/TextRank/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252144506,"owners_count":21701426,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T17:01:43.734Z","updated_at":"2025-05-03T04:30:56.728Z","avatar_url":"https://github.com/hankcs.png","language":"Java","funding_links":[],"categories":["Java","Artificial Intelligence"],"sub_categories":["Natural Language Processing"],"readme":"TextRank\n========\n\nA Java implementation of the TextRank algorithm for keyword extraction and automatic summarization\n\n## Note\n\n**TextRank has been integrated into [HanLP](https://github.com/hankcs/HanLP); this project is no longer maintained.**\n\nTextRankKeyword: keyword extraction\n--\n\n - Usage\n \n```java\npublic static void main(String[] args)\n{\n    // Sample input: the implementation is designed for Chinese text\n    String content = \"程序员(英文Programmer)是从事程序开发、维护的专业人员。一般将程序员分为程序设计人员和程序编码人员，但两者的界限并不非常清楚，特别是在中国。软件从业人员分为初级程序员、高级程序员、系统分析员和项目经理四大类。\";\n    // First argument: an optional title (empty here); second argument: the body text\n    System.out.println(new TextRankKeyword().getKeyword(\"\", content));\n}\n```\n - Algorithm details\n TextRank is a graph-based ranking algorithm inspired by Google's PageRank. It was originally designed to weight the sentences of a text, with automatic summarization as the goal; keyword extraction applies the same iterative voting to words, linking words that co-occur within a sliding window.\n See [*A Java Implementation of TextRank Keyword Extraction*][1] (in Chinese) for details.\n - About word segmentation\n  Word segmentation is not the focus of TextRank. Any warnings printed while running the project come from the segmentation library and do not affect functionality.\n\nTextRankSummary: automatic summarization\n--\n - Usage\n \n```java\npublic static void main(String[] args)\n{\n    // Sample input: a multi-line Chinese document\n    String document = \"算法可大致分为基本算法、数据结构的算法、数论算法、计算几何的算法、图的算法、动态规划以及数值分析、加密算法、排序算法、检索算法、随机化算法、并行算法、厄米变形模型、随机森林算法。\\n\" +\n                \"算法可以宽泛的分为三类，\\n\" +\n                \"一，有限的确定性算法，这类算法在有限的一段时间内终止。他们可能要花很长时间来执行指定的任务，但仍将在一定的时间内终止。这类算法得出的结果常取决于输入值。\\n\" +\n                \"二，有限的非确定算法，这类算法在有限的时间内终止。然而，对于一个（或一些）给定的数值，算法的结果并不是唯一的或确定的。\\n\" +\n                \"三，无限的算法，是那些由于没有定义终止定义条件，或定义的条件无法由输入的数据满足而不终止运行的算法。通常，无限算法的产生是由于未能确定的定义终止条件。\";\n    // Print the 3 highest-ranked sentences\n    System.out.println(TextRankSummary.getTopSentenceList(document, 3));\n}\n```\n - Algorithm details\n Sentences vote for one another, and the weight of each vote is set by how related the two sentences are (their BM25 relevance); repeating the voting iteratively yields each sentence's final weight.\n See [*A Java Implementation of TextRank Automatic Summarization*][2] (in Chinese) for details.\n\nTODO\n--\n- Natural language processing still has a long way to go. This project is only a concise implementation of TextRank; please evaluate its quality and performance yourself.\n- I have written [a series of introductory notes][3]; friends in the NLP field are welcome to [get in touch and help guide my learning][3].\n- In fact, I am developing a full-featured Chinese language processing package. It currently provides word segmentation, part-of-speech tagging, named entity recognition, keyword extraction, phrase extraction, automatic summarization, automatic recommendation, and more. It may be open-sourced in the future, gradually adding features such as dependency parsing and syntax trees. Stay tuned.\n\n[1]: http://www.hankcs.com/nlp/textrank%E7%AE%97%E6%B3%95%E6%8F%90%E5%8F%96%E5%85%B3%E9%94%AE%E8%AF%8D%E7%9A%84java%E5%AE%9E%E7%8E%B0.html\n[2]: http://www.hankcs.com/nlp/textrank-algorithm-java-implementation-of-automatic-abstract.html\n[3]: http://www.hankcs.com/category/nlp/","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhankcs%2FTextRank","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhankcs%2FTextRank","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhankcs%2FTextRank/lists"}