Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/aramshiva/babies
👶 A parser for every name listed on a Social Security Card between 1880-2023
- Host: GitHub
- URL: https://github.com/aramshiva/babies
- Owner: aramshiva
- Created: 2024-06-12T05:26:21.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-01T18:22:51.000Z (3 months ago)
- Last Synced: 2024-08-02T20:53:33.096Z (3 months ago)
- Topics: babies, data, datagov, db, graphs, mysql, names, social-security, social-security-data, sql, statistics, stats
- Language: Python
- Homepage:
- Size: 8.38 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
> [!NOTE]
> This does **not** include any Social Security numbers. The only data stored is the name, frequency, sex, and year of birth.
> This **is** public data provided by the Social Security Administration.

# Babies
### A parser for every name listed on a Social Security card between 1880 and 2023.
*(Tabulated based on Social Security records as of March 3, 2024)*

Your first question is probably "why?" To that I ask, "why not?"
This data is pulled from the [US Social Security Administration's Baby Names from Social Security Card Applications - National Dataset](https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data).
This script will insert the data into a MySQL database with the following schema:
```
name VARCHAR(255),
sex CHAR(1),
amount INT,
year INT
```

### Some things to note:
- As of 2024 there are around 2,117,219 rows in the database.
- The data is stored in a folder called "names" in the same directory as this script.
- Names with fewer than 5 occurrences for a given sex and year are left out of the data by the SSA to protect privacy.
- The sex is a single character, either "M" or "F" for Male or Female.
- The year is the year the person was born, NOT registered.
- The raw data is a folder. For each year of birth YYYY after 1879, the SSA provides a comma-delimited file called yobYYYY.txt.
  Each record in the individual annual files has the format "name,sex,number", where name is 2 to 15
  characters, sex is M (male) or F (female), and "number" is the number of occurrences of the name.
  Each file is sorted first on sex and then on number of occurrences in descending order. When there is
  a tie on the number of occurrences, names are listed in alphabetical order. This sorting makes it easy to
  determine a name's rank. The first record for each sex has rank 1, the second record for each sex has
  rank 2, and so forth.

### Planned Features (when I get bored again):
- Add a column for the state where the name was registered, or possibly create a separate database for the [state data](https://www.ssa.gov/oact/babynames/limits.html).
- Create a web interface to search for names and display the data.
- Graphs! Who doesn't love graphs?
- An exported db file for those who don't want to set up a MySQL server :D

### Want to run yourself?
- Fill in the `.env` (use `.env.example` as a guide)
- Run `python3 main.py` (this can **take a couple of days**; I suggest running it on a [Raspberry Pi](https://www.raspberrypi.com/) under [tmux](https://github.com/tmux/tmux) so it keeps running after the terminal is closed)
- Boom! Your MySQL database is now populated, with a table of 4 columns: `name, sex, amount, year`
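
To get a feel for the file format and schema described above without setting up a MySQL server, here is a minimal sketch that reads the `yobYYYY.txt` files and loads them into a local SQLite file. It is not the repository's `main.py`: SQLite is swapped in for MySQL so the example runs with only the standard library, and the table name `names` and the function names `parse_year_file` and `load_names` are assumptions made for illustration.

```python
import csv
import sqlite3
from pathlib import Path


def parse_year_file(path):
    """Yield (name, sex, amount, year) tuples from one yobYYYY.txt file."""
    # The year comes from the file name, e.g. "yob1880.txt" -> 1880.
    year = int(path.stem[3:])
    with path.open(newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if not row:  # skip blank lines
                continue
            name, sex, number = row
            yield name, sex, int(number), year


def load_names(names_dir, db_path):
    """Load every yobYYYY.txt file in names_dir into a SQLite table.

    The table name "names" is hypothetical; the columns follow the
    four-column schema from the README.
    """
    connection = sqlite3.connect(db_path)
    with connection:  # one transaction, committed when the block succeeds
        connection.execute(
            "CREATE TABLE IF NOT EXISTS names ("
            "name VARCHAR(255), sex CHAR(1), amount INT, year INT)"
        )
        for path in sorted(Path(names_dir).glob("yob*.txt")):
            # One batched insert per yearly file instead of one per row.
            connection.executemany(
                "INSERT INTO names (name, sex, amount, year) "
                "VALUES (?, ?, ?, ?)",
                parse_year_file(path),
            )
    return connection
```

Calling `load_names("names", "babies.db")` from the directory that holds the `names` folder would create `babies.db` next to it and return an open connection for queries such as `SELECT COUNT(*) FROM names`. The inserts are grouped into a single transaction, which is a common way to keep bulk loads quick, though the sketch makes no claim about how it compares with the MySQL import.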