{"id":21889004,"url":"https://github.com/ellisbrown/name2gender","last_synced_at":"2025-07-22T04:31:53.926Z","repository":{"id":72576029,"uuid":"108458181","full_name":"ellisbrown/name2gender","owner":"ellisbrown","description":"Extrapolate gender from first names using Naïve-Bayes and PyTorch Char-RNN","archived":false,"fork":false,"pushed_at":"2017-12-27T04:51:09.000Z","size":19313,"stargazers_count":25,"open_issues_count":0,"forks_count":13,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-15T10:58:06.183Z","etag":null,"topics":["char-rnn","deep-learning","gender-classification","machine-learning","naive-bayes-classifier","rnn"],"latest_commit_sha":null,"homepage":"https://medium.com/@ellisbrown/name2gender-introduction-626d89378fb0","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ellisbrown.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-10-26T19:46:54.000Z","updated_at":"2024-12-28T19:51:29.000Z","dependencies_parsed_at":"2023-05-24T14:15:29.287Z","dependency_job_id":null,"html_url":"https://github.com/ellisbrown/name2gender","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ellisbrown/name2gender","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ellisbrown%2Fname2gender","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ellisbrown%2Fname2gender/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ellisbrown%2Fname2gender/release
s","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ellisbrown%2Fname2gender/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ellisbrown","download_url":"https://codeload.github.com/ellisbrown/name2gender/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ellisbrown%2Fname2gender/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266427816,"owners_count":23926883,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-22T02:00:09.085Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["char-rnn","deep-learning","gender-classification","machine-learning","naive-bayes-classifier","rnn"],"created_at":"2024-11-28T11:18:33.310Z","updated_at":"2025-07-22T04:31:53.745Z","avatar_url":"https://github.com/ellisbrown.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Name2Gender\n\nUsing character sequences in first names to predict gender. 
This is a quick exploration of an interesting problem; see my Medium post, where I elaborate on why it is interesting: https://medium.com/@ellisbrown/name2gender-introduction-626d89378fb0.\n\nI have implemented a Naïve-Bayes approach and a Char-RNN approach, which are contained in their respective subdirectories.\n\n### Table of Contents\n- \u003ca href='https://goo.gl/1dxe5A'\u003eMedium post\u003c/a\u003e\n- \u003ca href='#naïve-bayes-naive_bayes'\u003eNaïve Bayes\u003c/a\u003e\n- \u003ca href='#char-rnn-rnn'\u003eChar-RNN\u003c/a\u003e\n- \u003ca href='#dataset-data'\u003eDataset\u003c/a\u003e\n- \u003ca href='#acknowledgement'\u003eAcknowledgement\u003c/a\u003e\n\n\n## Naïve-Bayes [/naive_bayes](https://github.com/ellisbrown/name2gender/tree/master/naive_bayes)\nIn this approach, I defined features of first names (last two letters, count of vowels, etc.) and used them to learn gender. I explain this in more detail [here in my blog post](https://medium.com/@ellisbrown/name2gender-introduction-626d89378fb0#9dfc) and in the [/naive_bayes subdirectory](https://github.com/ellisbrown/name2gender/blob/master/naive_bayes).\n\n## Char-RNN [/rnn](https://github.com/ellisbrown/name2gender/tree/master/rnn)\nIn this second approach, I feed the characters in a name one by one through a character-level recurrent neural network built in PyTorch, in the hopes of learning the latent space of all character sequences that denote gender without having to define them a priori. I explain this in more detail [here in my blog post](https://medium.com/@ellisbrown/name2gender-introduction-626d89378fb0#019f) and in the [/rnn subdirectory](https://github.com/ellisbrown/name2gender/blob/master/rnn).\n\n## Dataset [/data](https://github.com/ellisbrown/name2gender/tree/master/data)\nI have aggregated multiple smaller datasets representing various cultures into a large dataset (~135k instances) of gender-labeled first names. 
See [data/**dataset.ipynb**](https://github.com/ellisbrown/name2gender/blob/master/data/dataset.ipynb) for further information on how I pulled it together. Note: I did not spend a ton of time going through and pruning this dataset, so it is probably not amazing or particularly clean (I would greatly appreciate any PRs if anyone cares or has the time!).\n\n\n### Acknowledgement\nBelow are a bunch of links I found useful:\n* http://blog.ayoungprogrammer.com/2016/04/determining-gender-of-name-with-80.html/\n* http://www.nltk.org/book/ch06.html\n* https://medium.com/towards-data-science/deep-learning-gender-from-name-lstm-recurrent-neural-networks-448d64553044\n* https://github.com/spro/practical-pytorch/blob/master/char-rnn-classification/char-rnn-classification.ipynb\n* http://pytorch.org/tutorials/intermediate/char_rnn_generation_tutorial.html\n* http://karpathy.github.io/2015/05/21/rnn-effectiveness/\n* https://colah.github.io/posts/2015-08-Understanding-LSTMs/\n* https://cs231n.github.io/neural-networks-3/#baby\n* https://deeplearning4j.org/lstm.html\n* https://github.com/karpathy/char-rnn\n* https://machinelearnings.co/text-classification-using-neural-networks-f5cd7b8765c6\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fellisbrown%2Fname2gender","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fellisbrown%2Fname2gender","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fellisbrown%2Fname2gender/lists"}