{"id":13672484,"url":"https://github.com/observerss/ngender","last_synced_at":"2025-05-16T05:07:43.361Z","repository":{"id":32422472,"uuid":"35999827","full_name":"observerss/ngender","owner":"observerss","description":"Guess gender from a name","archived":false,"fork":false,"pushed_at":"2020-02-27T01:47:02.000Z","size":200,"stargazers_count":611,"open_issues_count":8,"forks_count":164,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-05-10T08:04:36.857Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/observerss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-05-21T08:09:34.000Z","updated_at":"2025-04-11T09:53:55.000Z","dependencies_parsed_at":"2022-06-30T12:53:10.617Z","dependency_job_id":null,"html_url":"https://github.com/observerss/ngender","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/observerss%2Fngender","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/observerss%2Fngender/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/observerss%2Fngender/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/observerss%2Fngender/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/observerss","download_url":"https://codeload.github.com/observerss/ngender/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254471059,"owners_count":220765
85,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T09:01:36.979Z","updated_at":"2025-05-16T05:07:38.350Z","avatar_url":"https://github.com/observerss.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# NGender\n\nGuess the gender behind a Chinese name\n\n- Fewer than 20 lines of pure Python (core logic)\n- No dependencies\n- Compatible with Python 3, Python 2, and PyPy\n- 82% accuracy\n- Can be used to guess gender\n- Can also be used to gauge how masculine or feminine a name is\n\n## Usage\n\n\u003e pip install ngender\n\nOr (OS X)\n\n\u003e brew install https://raw.githubusercontent.com/observerss/homebrew/61b3623967dc9507958dfb517e7f746baa96dcf1/Library/Formula/ngender.rb\n\nThen, on the command line:\n\n```bash\n$ ng 赵本山 宋丹丹\nname: 赵本山 =\u003e gender: male, probability: 0.9836229687547046\nname: 宋丹丹 =\u003e gender: female, probability: 0.9759486128949907\n```\n\nIt can also be used from Python code:\n\n```py\n\u003e\u003e\u003e import ngender\n\u003e\u003e\u003e ngender.guess('赵本山')\n('male', 0.9836229687547046)\n\n\u003e\u003e\u003e ngender.guess('宋丹丹')\n('female', 0.9759486128949907)\n\n\u003e\u003e\u003e %timeit ngender.guess('宋丹丹')\n100000 loops, best of 3: 4.01 µs per loop\n```\n\n## How it works\n\n### Math\n\nBayes' theorem: ```P(Y|X) = P(X|Y) * P(Y) / P(X)```\n\nWhen the components of X are conditionally independent given Y, ```P(X|Y) = P(X1|Y) * P(X2|Y) * ...```\n\nApplied to guessing gender from a name:\n\n```\nP(gender=male|name=本山)\n= P(name=本山|gender=male) * P(gender=male) / P(name=本山)\n= P(name has 本|gender=male) * P(name has 山|gender=male) * P(gender=male) / P(name=本山)\n```\n\n### Computation\n\n0. Where does the file `charfreq.csv` come from?\n \n\tThere was once something called the hotel check-in records (开房记录.avi, tongue in cheek), with names and genders in about 20 million entries; the file was produced by tallying them.\n\n0. How is `P(name has 本|gender=male)` computed?\n \n\tNumber of times “本” appears in male names / total number of characters in male names\n\t\n0. How is `P(gender=male)` computed?\n \n\tNumber of male names / total number of names\n\n0. How is `P(name=本山)` computed?\n \n\tIt does not need to be computed: it is the same for both genders, so it cancels out when the probabilities are computed.\n\t\n\n\n## Pitfalls\n\n```py\n\u003e\u003e\u003e ngender.guess('李胜男')\n('male', 0.851334658742)\n```\n\nAlthough both characters of the given name lean strongly male, together they form a female name.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fobserverss%2Fngender","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fobserverss%2Fngender","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fobserverss%2Fngender/lists"}