{"id":27231236,"url":"https://github.com/cryptum169/rake_for_chinese","last_synced_at":"2025-04-10T13:44:10.272Z","repository":{"id":52942956,"uuid":"134552159","full_name":"Cryptum169/Rake_For_Chinese","owner":"Cryptum169","description":"Implementation of RAKE in Chinese","archived":false,"fork":false,"pushed_at":"2021-04-12T22:27:09.000Z","size":112,"stargazers_count":22,"open_issues_count":0,"forks_count":10,"subscribers_count":2,"default_branch":"master","last_synced_at":"2023-04-28T13:36:51.973Z","etag":null,"topics":["chinese","keyword-extraction","nlp","rake"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Cryptum169.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-05-23T10:18:08.000Z","updated_at":"2023-01-12T01:48:17.000Z","dependencies_parsed_at":"2022-08-26T20:56:11.146Z","dependency_job_id":null,"html_url":"https://github.com/Cryptum169/Rake_For_Chinese","commit_stats":null,"previous_names":[],"tags_count":null,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Cryptum169%2FRake_For_Chinese","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Cryptum169%2FRake_For_Chinese/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Cryptum169%2FRake_For_Chinese/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Cryptum169%2FRake_For_Chinese/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Cryptum169","download_url":"https://codeload.github.com/Cryptum169/Rake_For_Chinese/tar.gz/r
efs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248226448,"owners_count":21068204,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chinese","keyword-extraction","nlp","rake"],"created_at":"2025-04-10T13:44:09.557Z","updated_at":"2025-04-10T13:44:10.248Z","avatar_url":"https://github.com/Cryptum169.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RAKE for Chinese\nThis is a Chinese-language application of Rapid Automatic Keyword Extraction (RAKE), as described in Rose, S., Engel, D., Cramer, N., \u0026 Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. In M. W. Berry \u0026 J. Kogan (Eds.), Text Mining: Theory and Applications: John Wiley \u0026 Sons. Released under the MIT License.\n\n# Installation\nThe code was written in Python 3.6.5. It uses the Chinese word segmentation tool Jieba (https://github.com/fxsjy/jieba), which can be installed with `pip install jieba`.\n\n# Main Features\nExtracts keywords from Chinese paragraphs using the RAKE algorithm.\n\n# Rake_For_Chinese\nA Python implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm as described in: Rose, S., Engel, D., Cramer, N., \u0026 Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. In M. W. Berry \u0026 J. Kogan (Eds.), Text Mining: Theory and Applications: John Wiley \u0026 Sons.\nThe codebase is released under the MIT License.\n\n# Requirement\nThis package requires the Chinese text segmentation package Jieba. Install the dependencies listed in `requirements.txt`.\n\n## Functional Overview\nRAKE is a keyword extraction algorithm that is corpus-independent and language-independent by design. 
In a nutshell, it scores each word based on how often it occurs on its own and how often it occurs within phrases, then combines the scores of all words in a phrase to obtain the phrase's score, with some additional criteria to handle boundary conditions.\n\nThis method, however, cannot be applied directly to Chinese. First, Chinese has no obvious word delimiters. Second, phrase structure varies more in Chinese than in English and other languages with similar syntax.\n\nTo generate the \"words\" and \"phrases\" that RAKE needs, this implementation relies on Jieba, a Chinese text segmentation package. Jieba cuts the raw text into word segments. The resulting list is then filtered by PoS tag, a stopword and conjunction word list, and a punctuation list to parse phrases.\n\n## Example Code\nSee `example.py`\n\n## Sample Output\nTake the news article at this link as an example: http://www.pingwest.com/sony-expo-2018-at-chengdu/\n\nSample output follows, with the output of this implementation on the **last line**:\n\n```\nTextRank4ZH-关键词：\n索尼, 索粉, 粉丝, 上, 业务, 破产, sony, 人, 会, 产品\njieba-关键词：\n平井, 一夫, 粉丝, 魅力, 产品, 中国, 业务, 财年, 偶像, 成都\nKeyExtract-关键词：\n索尼, 索粉, 平井, 一夫, 魅力, 一个, 财年, 粉丝, 赏, 亿日元\nTextRank4ZH-关键词：\n平井一夫, 索尼魅力, 索尼中国\njieba-关键短语：(N/A)\nKeyExtract-关键短语：（N/A）\nRake4ZH-关键短语：\n名字——索尼魅力赏, 游戏作品软件销量, 高质量消费电子品产, 索尼游戏业务正式回, 放心——,  游戏销量超,  全球销量超,  款精品 , 款游戏作品, 伙伴合作补完\nRake 中文关键短语/词\n平井一夫, 营业利润, 游戏, 正式, 明星, 总裁, 改革, 高桥洋, 中国, 利润\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcryptum169%2Frake_for_chinese","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcryptum169%2Frake_for_chinese","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcryptum169%2Frake_for_chinese/lists"}