https://github.com/VaishaliJain/ethnicIA

"The Importance of being Ernest, Ekundayo, or Eswari: An Interpretable Machine Learning Approach to Name-based Ethnicity Classification" Authors: Vaishali Jain, Ted Enamorado, and Cynthia Rudin

interpretability interpretable-machine-learning name-classification


Host: GitHub
URL: https://github.com/VaishaliJain/ethnicIA
Owner: VaishaliJain
Created: 2022-06-08T17:57:26.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2023-01-06T18:06:50.000Z (over 2 years ago)
Last Synced: 2024-08-13T07:11:34.150Z (10 months ago)
Topics: interpretability, interpretable-machine-learning, name-classification
Language: R
Homepage: https://hdsr.mitpress.mit.edu/pub/wgss79vu/release/2
Size: 20.3 MB
Stars: 5
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

jimsghstars - VaishaliJain/ethnicIA - "The Importance of being Ernest, Ekundayo, or Eswari: An Interpretable Machine Learning Approach to Name-based Ethnicity Classification" Authors: Vaishali Jain, Ted Enamorado, and Cynthia Rudin (R)

README

# ethnicIA

![](/Images/ethnicIA_logo.png?raw=true)

"The Importance of being Ernest, Ekundayo, or Eswari: An Interpretable Machine Learning Approach to Name-based Ethnicity Classification"
Authors: Vaishali Jain, Ted Enamorado, and Cynthia Rudin

Citation: Jain, V., Enamorado, T., & Rudin, C. (2022). The Importance of Being Ernest, Ekundayo, or Eswari: An Interpretable Machine Learning Approach to Name-Based Ethnicity Classification. Harvard Data Science Review, 4(3). https://doi.org/10.1162/99608f92.db1aba8b

# Data

You can download the NC and FL datasets from https://users.cs.duke.edu/~cynthia/ethnicIA/Data/. The GA data is not publicly available, so we have created three processed training and test datasets using only NC and FL, which can be used for testing algorithms.
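As a minimal sketch of how the data directory could be inspected before downloading, the Python snippet below lists the links found on the index page. It assumes the server returns a plain HTML directory index at that URL, which the README does not state; `LinkCollector`, `list_links`, and `fetch_listing` are hypothetical helper names, not part of this repository.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# URL taken from the README; the page layout is an assumption.
DATA_URL = "https://users.cs.duke.edu/~cynthia/ethnicIA/Data/"


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_links(html, base_url=DATA_URL):
    """Return absolute URLs for links that stay below base_url."""
    parser = LinkCollector()
    parser.feed(html)
    urls = [urljoin(base_url, href) for href in parser.hrefs]
    # Drop column-sort links, parent-directory links, and the index itself.
    return [
        u for u in urls
        if u.startswith(base_url) and u != base_url and "?" not in u
    ]


def fetch_listing(url=DATA_URL):
    """Download the index page and return the links it contains."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return list_links(html, url)
```

Calling `fetch_listing()` requires network access and returns whatever files and subdirectories the index exposes; each returned URL can then be downloaded with a tool of your choice.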

# Steps to replicate experiments, case study, and appendices

Step 1: Run Code/R/01_Create_Train_Features_Master_sparse.R and Code/R/01_Create_Train_Features_Master_UID.R to generate all training datasets.

Step 2: Run Code/R/02_Create_Test_Features_Master_sparse.R and Code/R/02_Create_Test_Features_Master_UID.R to generate all test datasets.

Step 3: Run the function ethnicIA_model_training() in Code/python/ethnicIA_paper_results.py to train all the required models.

Step 4: Run the functions for the experiment you want to replicate from Code/python/ethnicIA_paper_results.py.

Follow any instructions given inside the functions in the Python file (open Code/python/ethnicIA_paper_results.py for details).
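The first three steps could be chained from a single driver script, sketched below in Python. The script paths come from the steps above; everything else is an assumption: that `Rscript` is on the PATH, that the R scripts can be run from the repository root, and that `ethnicIA_model_training()` can be imported from inside Code/python and called with no arguments. `build_commands` and `run_pipeline` are hypothetical names. Step 4 is left out because it depends on which experiment is being replicated.

```python
import subprocess
import sys
from pathlib import Path

# Script paths are taken from Steps 1 and 2 of the README.
R_SCRIPTS = [
    "Code/R/01_Create_Train_Features_Master_sparse.R",
    "Code/R/01_Create_Train_Features_Master_UID.R",
    "Code/R/02_Create_Test_Features_Master_sparse.R",
    "Code/R/02_Create_Test_Features_Master_UID.R",
]
PYTHON_DIR = "Code/python"


def build_commands(repo_root):
    """Return (argv, working_directory) pairs for Steps 1-3, in order."""
    root = Path(repo_root)
    # Assumption: the R scripts resolve their paths relative to the repo root.
    commands = [(["Rscript", script], root) for script in R_SCRIPTS]
    # Assumption: the function takes no arguments and the module imports
    # cleanly when Python is started inside Code/python.
    train = "import ethnicIA_paper_results as m; m.ethnicIA_model_training()"
    commands.append(([sys.executable, "-c", train], root / PYTHON_DIR))
    return commands


def run_pipeline(repo_root, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv, cwd in build_commands(repo_root):
        print(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, cwd=cwd, check=True)
```

Running `run_pipeline(".")` from the repository root only prints the commands; pass `dry_run=False` once the printed commands match your setup. If the scripts expect a different working directory, adjust the second element of each pair accordingly.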

# Replication for Section 3: Sensitivity of parameters for Indistinguishability

Run Code/R/03_Create_Features_FLGA_multCuts.R and Code/R/03_Plot_multCuts.R to generate the contour plot shown in Figure 1.

![Namespace](/Images/Namespace.png?raw=true "Namespace")
