{"id":27771674,"url":"https://github.com/jasonkessler/agefromname","last_synced_at":"2025-04-29T22:59:11.177Z","repository":{"id":57408511,"uuid":"80049125","full_name":"JasonKessler/agefromname","owner":"JasonKessler","description":"Predict age and gender from a first name","archived":false,"fork":false,"pushed_at":"2018-09-25T05:34:05.000Z","size":7839,"stargazers_count":60,"open_issues_count":0,"forks_count":11,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-04-29T22:59:05.649Z","etag":null,"topics":["age","census-data","data-science","demographics","demography","first-names","gender","names","python","python-3","python-api","social-security-data","statistics"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JasonKessler.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-01-25T19:25:24.000Z","updated_at":"2024-08-12T21:41:39.000Z","dependencies_parsed_at":"2022-09-26T22:30:23.572Z","dependency_job_id":null,"html_url":"https://github.com/JasonKessler/agefromname","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JasonKessler%2Fagefromname","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JasonKessler%2Fagefromname/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JasonKessler%2Fagefromname/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JasonKessler%2Fagefromname/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/owners/JasonKessler","download_url":"https://codeload.github.com/JasonKessler/agefromname/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251596666,"owners_count":21615017,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["age","census-data","data-science","demographics","demography","first-names","gender","names","python","python-3","python-api","social-security-data","statistics"],"created_at":"2025-04-29T22:59:10.535Z","updated_at":"2025-04-29T22:59:11.170Z","avatar_url":"https://github.com/JasonKessler.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AgeFromName 0.0.8\nA tool for predicting someone's age, gender, or generation given their name and assigned sex at birth, \nassuming they were born in the US.\n\nFeel free to use the Gitter community [gitter.im/agefromname](https://gitter.im/agefromname/Lobby) for help or to discuss the project.   \n\n## Installation\n\n`$ pip install agefromname`\n\n## Overview\n\nThis more or less apes the approach of FiveThirtyEight's [\"How to Tell Someone's Age\nWhen All You Know Is Her Name\"](https://fivethirtyeight.com/features/how-to-tell-someones-age-when-all-you-know-is-her-name/) article.\n \nIt includes data scraped from the Social Security \nAdministration's [Life Tables for the United States Social Security Area 1900-2100](https://www.ssa.gov/oact/NOTES/as120/LifeTables_Body.html#wp1168591)\n and their [baby names data](http://www.ssa.gov/oact/babynames/names.zip). 
Code to re-scrape and refresh this data is included\n in `regenerate_data.py`.  The data goes back as far as\n 1881.\n\nTo use it, first initialize the finder:\n\n```pythonstub\n\u003e\u003e\u003e from agefromname import AgeFromName\n\u003e\u003e\u003e age_from_name = AgeFromName()\n```\n\nYou can find the probability of someone's gender based on their first name and, optionally,\n the current year, their minimum age, and/or their maximum age.  \n\n```pythonstub\n\u003e\u003e\u003e age_from_name.prob_male('taylor')\n0.24956599946849847\n\u003e\u003e\u003e age_from_name.prob_female('taylor')\n0.7504340005315016\n\u003e\u003e\u003e age_from_name.prob_male('taylor', minimum_age=50)\n0.9572157723373936\n\u003e\u003e\u003e age_from_name.prob_male('taylor', current_year=1930)\n1.0\n\u003e\u003e\u003e age_from_name.prob_male('taylor', current_year=2010, minimum_age=30)\n0.8497712563439375\n\u003e\u003e\u003e age_from_name.prob_male('taylor', current_year=2010, minimum_age=30, maximum_age=40)\n0.7645011554551521\n```\n\nYou can even plot, for each current year, the probability that someone named Kelsey would be male:\n \n```pythonstub\n\u003e\u003e\u003e import pandas as pd\n\u003e\u003e\u003e (pd.DataFrame([{'year': year, \n                'P(Male)': age_from_name.prob_male('kelsey', current_year=year)}\n               for year in range(1930, 2015)])\n .set_index('year')\n .plot())\n```\n![The decreasing probability Kelsey is a male](https://jasonkessler.github.io/kelseyplot.png)\n\nOne can perform this computation in bulk for all names.  
Here, we can see the 95% confidence intervals\n for how likely people over 18 in 1993 were to be female, given their names:\n```pythonstub\n\u003e\u003e\u003e age_from_name.get_all_name_female_prob(current_year=1993, minimum_age=18).iloc[:3]\n                  hi        lo  prob\nfirst_name\naage        0.648197  0.000000   0.0\naagot       1.000000  0.380786   1.0\naamir       0.398189  0.000000   0.0\n```\n\nYou can also get the most likely year of birth (the mode) for someone, given their first name and \ngender.  Note that their gender should be a single letter, 'm' or 'f' (case-insensitive), and that the\n  first name is case-insensitive as well.\n  \n```pythonstub\n\u003e\u003e\u003e age_from_name.argmax('jAsOn', 'm')\n1977\n\u003e\u003e\u003e age_from_name.argmax('Jason', 'M')\n1977\n```\n\nYou can also include an \"as-of\" year.  For example, in 1980, the argmax year for \"John\" was 1947, while in 2000 it was 1964.  If the as-of year is omitted, the current year is used.\n\n```pythonstub\n\u003e\u003e\u003e age_from_name.argmax('john', 'm', 2000)\n1964\n\u003e\u003e\u003e age_from_name.argmax('john', 'm', 1980)\n1947\n```\n\nFurthermore, you can exclude people who are younger than a particular age.  
\n```pythonstub\n\u003e\u003e\u003e age_from_name.argmax('bill', 'm', 1980, minimum_age=40)\n1934\n\u003e\u003e\u003e age_from_name.argmax('bill', 'm', minimum_age=40)\n1959\n```\n\nGetting estimated counts of living people with a given name and gender as of a particular year is easy; \nthe result is returned as a Pandas Series.\n```pythonstub\n\u003e\u003e\u003e age_from_name.get_estimated_counts('john', 'm', 1960)\nyear_of_birth\n1881     4613.792420\n1882     5028.397099\n1883     4679.560929\n...\n```\n\nWe can see the corresponding probability distribution using\n\n```pythonstub\n\u003e\u003e\u003e age_from_name.get_estimated_distribution('mary', 'f', 1910)\nyear_of_birth\n1881    0.016531\n1882    0.019468\n1883    0.019143\n...\n```\n\nFinally, we can see similar information for generations using the `GenerationFromName` class.\n```pythonstub\n\u003e\u003e\u003e from agefromname import GenerationFromName\n\u003e\u003e\u003e generation_from_name = GenerationFromName()\n\u003e\u003e\u003e generation_from_name.argmax('barack', 'm')\n'Generation Z'\n\u003e\u003e\u003e generation_from_name.argmax('ashley', 'f')\n'Millenials'\n\u003e\u003e\u003e generation_from_name.argmax('monica', 'f')\n'Generation X'\n\u003e\u003e\u003e generation_from_name.argmax('bill', 'm')\n'Baby Boomers'\n\u003e\u003e\u003e generation_from_name.argmax('wilma', 'f')\n'Silent'\n\u003e\u003e\u003e generation_from_name.get_estimated_distribution('jaden', 'm')\nBaby Boomers           0.000000\nGeneration X           0.001044\nGeneration Z           0.897662\nGreatest Generation    0.000000\nMillenials             0.101294\n_other                 0.000000\nName: estimate_percentage, dtype: float64\n\u003e\u003e\u003e generation_from_name.get_estimated_distribution('gertrude', 'f')\nBaby Boomers           0.259619\nGeneration X           0.031956\nGeneration Z           0.009742\nGreatest Generation    0.425293\nMillenials             0.011412\n_other                 0.261979\n\u003e\u003e\u003e 
generation_from_name.get_estimated_counts('ashley', 'f')\nBaby Boomers              702.481287\nGeneration X            29274.206090\nGeneration Z           141195.016621\nGreatest Generation        34.998913\nMillenials             652914.233604\n_other                      0.102625\nName: estimated_count, dtype: float64\n```\n\n## Caveat Usor\nThe Social Security Administration records the 1,000 most common male and female baby names, along with their birth counts, each year.  These may not be fully representative of the entire population, and the estimates may not work as well for people whose names aren't historically common among those born in the US, or for other groups. \n\nBe aware that there are people who have near-dogmatic objections to this sort of analysis, especially using a first name to impute a gender.  \n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjasonkessler%2Fagefromname","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjasonkessler%2Fagefromname","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjasonkessler%2Fagefromname/lists"}