{"id":13462589,"url":"https://github.com/wainshine/Chinese-Names-Corpus","last_synced_at":"2025-03-25T02:30:50.309Z","repository":{"id":38185480,"uuid":"75951828","full_name":"wainshine/Chinese-Names-Corpus","owner":"wainshine","description":"Chinese names corpus. Name generator. Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, English names. Useful for Chinese word segmentation and person-name entity recognition.","archived":false,"fork":false,"pushed_at":"2024-03-27T04:58:40.000Z","size":36625,"stargazers_count":4054,"open_issues_count":8,"forks_count":1003,"subscribers_count":105,"default_branch":"master","last_synced_at":"2025-01-31T15:24:53.236Z","etag":null,"topics":["corpus","dataset","dict","names","ner"],"latest_commit_sha":null,"homepage":"https://open.namemoe.com/","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wainshine.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-12-08T15:47:52.000Z","updated_at":"2025-01-31T04:58:10.000Z","dependencies_parsed_at":"2024-07-31T12:17:20.849Z","dependency_job_id":null,"html_url":"https://github.com/wainshine/Chinese-Names-Corpus","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wainshine%2FChinese-Names-Corpus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wainshine%2FChinese-Names-Corpus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wainshine%2FChinese-Names-Corpus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wainshine%2FChinese-Names-Corpus/manifests",
"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wainshine","download_url":"https://codeload.github.com/wainshine/Chinese-Names-Corpus/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245385276,"owners_count":20606642,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["corpus","dataset","dict","names","ner"],"created_at":"2024-07-31T12:00:52.963Z","updated_at":"2025-03-25T02:30:50.300Z","avatar_url":"https://github.com/wainshine.png","language":null,"funding_links":[],"categories":["Uncategorized","Others","Others (1002)","Table of Contents","Corpus 中文语料"],"sub_categories":["Uncategorized","Multi-Modal Representation \u0026 Retrieval 多模态表征与检索"],
"readme":"# Chinese Names Corpus (Chinese-Names-Corpus)\n\n\u003cstrong\u003eAbout NameMoe (萌名)\u003c/strong\u003e\n\nNameMoe is a new naming product built on big data and natural language processing (NLP).\n\nA word segmentation tool was run over a massive text collection to segment it and count word frequencies. After data cleaning, this produced a person-name dictionary on the order of ten million entries. The names were then annotated with gender, age, pinyin, sentiment, a name index, and other labels, ultimately forming a Chinese name graph of more than 56 million names.\n\nThis sub-project can be used for Chinese word segmentation, person-name recognition, and similar tasks.\n\n---\n\nPS1: Beyond personal interest, I maintain this project mainly as a task-driven way to keep learning and practicing cutting-edge techniques in NLP, knowledge graphs (KG), AI, and related fields.\n\nPS2: I am job hunting, and internal referrals are welcome. I am a senior product manager focused on mobile healthcare, SaaS back-end systems, and AI.\n\nPS3: Please do not submit politically related issues. Thank you.\n\nPS4: If you re-host this project on a platform in China, please make it free to download (0 points) and keep the GitHub link.\n\n---\n\n\u003cstrong\u003eCommon Chinese Names (Chinese_Names_Corpus)\u003c/strong\u003e\n\nSize: 1.2 million.\n\nSource: extracted from a name corpus of hundreds of millions of entries.\n\nCleaning: cleaned, but a small number of bad cases remain.\n\n\u003cstrong\u003eNew: name generator added.\u003c/strong\u003e\n\n---\n\n\u003cstrong\u003eAncient Chinese Names (Ancient_Names_Corpus)\u003c/strong\u003e\n\nSize: 250,000.\n\nSource: compiled from multiple name dictionaries.\n\nCleaning: cleaned.\n\n---\n\n\u003cstrong\u003eChinese Surnames (Chinese_Family_Name)\u003c/strong\u003e\n\nSize: 1,000.\n\nSource: extracted from a name corpus of hundreds of millions of entries.\n\nCleaning: cleaned.\n\n---\n\n\u003cstrong\u003eChinese Forms of Address (Chinese_Relationship)\u003c/strong\u003e\n\nSize: 5,000 root terms; 180,000 forms of address.\n\nSource: compiled from multiple name dictionaries.\n\nCleaning: cleaned, but many bad cases remain.\n\n---\n\n# English Names Corpus (English-Names-Corpus)\n\n\u003cstrong\u003eTransliterated Names (English_Cn_Name_Corpus)\u003c/strong\u003e\n\nSize: 480,000.\n\nSource: compiled from multiple name dictionaries.\n\nCleaning: cleaned, but a small number of bad cases remain, mostly place names.\n\nName recognition for this corpus was kindly contributed by fellow user [ltccss](https://github.com/ltccss).\n\n---\n\n# Japanese Names Corpus (Japanese_Names_Corpus)\n\n\u003cstrong\u003eJapanese Names (Japanese_Names_Corpus)\u003c/strong\u003e\n\nSize: 180,000.\n\nSource: extracted from Wikipedia.\n\nCleaning: cleaned, but a small number of bad cases remain.\n\nFor details of the cleaning process, see [Sharing the Japanese name data cleaning process](https://github.com/wainshine/Chinese-Names-Corpus/issues/4).\n\n---\n\n# Chinese Dictionary Corpus (Chinese_Dict_Corpus)\n\n\u003cstrong\u003eIdiom Dictionary (ChengYu_Corpus)\u003c/strong\u003e\n\nSize: 50,000.\n\nSource: compiled from multiple chengyu (idiom) dictionaries.\n\nCleaning: cleaned.\n\n---\n\n## Stargazers over time\n\n[![Stargazers over time](https://starchart.cc/wainshine/Chinese-Names-Corpus.svg)](https://starchart.cc/wainshine/Chinese-Names-Corpus)\n\n---\n\n\u003cstrong\u003eUpdate history:\u003c/strong\u003e\n\nEarlier commits: dates not recorded.\n\nRemoved 1,000+ non-names. -2017.08.08\n\nRemoved 5,000+ non-names. -2017.11.25\n\nAdded 180,000 Japanese names. -2017.12.17\n\nRemoved 1,500+ non-names (mainly Japanese place names). -2017.12.30\n\nRemoved about 30,000 non-names or low-frequency names. -2018.11.04\n\nRemoved 2,600+ non-names or low-frequency names. -2019.04.15\n\nRemoved about 10,000 non-names or low-frequency names. -2019.07.27\n\nMoved files into folders. -2019.10.21\n\nAdded a name generator. -2020.01.29\n\nRemoved about 60,000 non-names or low-frequency names. -2020.12.13\n\nUpdated the name generator. -2021.11.22\n\nRemoved about 700 non-names or low-frequency names. -2022.11.30\n\n---\n\nCompiled by @萌名NameMoe\n\n2024.03.27\n",
"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwainshine%2FChinese-Names-Corpus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwainshine%2FChinese-Names-Corpus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwainshine%2FChinese-Names-Corpus/lists"}