https://github.com/aramshiva/babies

👶 A parser for every name listed on a Social Security card between 1880 and 2023

babies data datagov db graphs mysql names social-security social-security-data sql statistics stats


> [!WARNING]
> As of March 16, 2025, this repo is no longer maintained; it has been merged into the [`names` repo](https://github.com/aramshiva/names) under the `sql` folder.

> [!NOTE]
> This does **not** include any Social Security numbers. The only data stored is the name, frequency, sex, and year of birth.
> This **is** public data published by the Social Security Administration.

# Babies
### A parser for every name listed on a Social Security card between 1880 and 2023.
*(Tabulated based on Social Security records as of March 3, 2024)*

Your first question is probably "Why?" To that I ask: why not?

This data is pulled from the [US Social Security Administration's Baby Names from Social Security Card Applications - National Dataset](https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data).
This script will insert the data into a MySQL database with the following schema:
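```sql
name VARCHAR(255),  -- given name (2 to 15 characters in the SSA files)
sex CHAR(1),        -- 'M' or 'F'
amount INT,         -- number of occurrences of the name for that sex and year
year INT            -- year of birth
```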
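To make the schema above concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL (SQLite accepts these column types through its type affinity rules). The table name `baby_names`, the sample rows, and the `total_for` helper are hypothetical, chosen only to show a typical query: counting a name across both sexes in one birth year.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL database the script fills.
conn = sqlite3.connect(":memory:")

# Same four columns as the schema above; the table name is hypothetical.
conn.execute(
    """
    CREATE TABLE baby_names (
        name VARCHAR(255),
        sex CHAR(1),
        amount INT,
        year INT
    )
    """
)

# Made-up sample rows for illustration only (not real SSA counts).
rows = [
    ("Alex", "M", 120, 1990),
    ("Alex", "F", 30, 1990),
    ("Sam", "M", 80, 1990),
]
conn.executemany("INSERT INTO baby_names VALUES (?, ?, ?, ?)", rows)


def total_for(name: str, year: int) -> int:
    """Sum the occurrences of a name across both sexes for one birth year."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM baby_names WHERE name = ? AND year = ?",
        (name, year),
    ).fetchone()
    return total


print(total_for("Alex", 1990))  # 150
```

Because each row is keyed by name, sex, and year together, a name given to both boys and girls appears as two rows per year, which is why the query sums `amount` rather than reading a single row.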

### Some things to keep in mind:
- As of 2024, the database contains about 2.1 million rows (2,117,219).
- The raw data is stored in a folder called `names` in the same directory as the script.
- To protect privacy, the SSA only lists a name for a given sex and year if it has at least 5 occurrences; rarer names are left out entirely.
- The sex is a single character: `M` (male) or `F` (female).
- The year is the year the person was born, NOT the year the name was registered.
- The raw data is a folder. For each year of birth YYYY after 1879, the SSA provides a comma-delimited file called `yobYYYY.txt`.
  Each record in the individual annual files has the format `name,sex,number`, where `name` is 2 to 15
  characters, `sex` is `M` (male) or `F` (female), and `number` is the number of occurrences of the name.
  Each file is sorted first on sex and then on number of occurrences in descending order. When there is
  a tie on the number of occurrences, names are listed in alphabetical order. This sorting makes it easy to
  determine a name's rank: the first record for each sex has rank 1, the second record for each sex has
  rank 2, and so forth.
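The ranking rule described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the repo: `parse_yob` and `ranks` are hypothetical helpers, and the sample counts are invented. Because it re-sorts the records itself, it also works on rows that are no longer in file order (for example, rows read back from the database).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str, int]  # (name, sex, count)


def parse_yob(text: str) -> List[Record]:
    """Parse lines of the form name,sex,number into (name, sex, count) tuples."""
    records: List[Record] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        name, sex, number = line.split(",")
        records.append((name, sex, int(number)))
    return records


def ranks(records: List[Record]) -> Dict[Tuple[str, str], int]:
    """Return {(name, sex): rank}, ranking each sex separately."""
    by_sex: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for name, sex, count in records:
        by_sex[sex].append((name, count))

    result: Dict[Tuple[str, str], int] = {}
    for sex, entries in by_sex.items():
        # Descending count, ties broken by name: the order the SSA describes.
        # Plain string comparison matches alphabetical order only for
        # consistently capitalized ASCII names (an assumption here).
        entries.sort(key=lambda entry: (-entry[1], entry[0]))
        for position, (name, _count) in enumerate(entries, start=1):
            result[(name, sex)] = position
    return result


# Made-up sample in the yobYYYY.txt format (counts are illustrative, not real).
sample = "Emma,F,20\nOlivia,F,20\nAva,F,15\nLiam,M,30\nNoah,M,25\n"
print(ranks(parse_yob(sample))[("Olivia", "F")])  # 2
```

Ties get consecutive ranks (Emma 1, Olivia 2) rather than a shared rank, which mirrors the "first record, second record" wording in the SSA description.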

### Want to run it yourself?
- Fill in the `.env` (use `.env.example` as a guide)
- Run `python3 main.py`
- Boom! Your MySQL database is now filled with the data, in a table with 4 columns: `name`, `sex`, `amount`, `year`.
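For a rough idea of the loading step, the sketch below walks a `names` folder and turns the SSA's `yobYYYY.txt` files into rows matching the four-column schema, taking the year from each file name. It is an assumption-based illustration: `iter_rows` is a hypothetical helper, and the real `main.py` may read the files differently.

```python
import csv
import re
from pathlib import Path
from typing import Iterator, Tuple

# Matches the SSA naming convention yobYYYY.txt and captures the year.
YOB_FILE = re.compile(r"^yob(\d{4})\.txt$")


def iter_rows(folder: Path) -> Iterator[Tuple[str, str, int, int]]:
    """Yield (name, sex, amount, year) rows from every yobYYYY.txt file in folder."""
    for path in sorted(folder.iterdir()):
        match = YOB_FILE.match(path.name)
        if not match:
            continue  # skip anything that is not a yearly file (e.g. a readme)
        year = int(match.group(1))
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.reader(handle):
                if not row:
                    continue  # ignore blank lines
                name, sex, number = row
                yield name, sex, int(number), year


if __name__ == "__main__":
    # Hypothetical usage: count the rows found in ./names
    print(sum(1 for _ in iter_rows(Path("names"))))
```

Each tuple is already in `(name, sex, amount, year)` order, so batches of rows could be passed to a DB-API cursor's `executemany`; with MySQL Connector/Python the placeholder style is `%s`. The connection details and table name would come from the script's own `.env` configuration, which is not reproduced here.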