https://github.com/VaishaliJain/ethnicIA
"The Importance of being Ernest, Ekundayo, or Eswari: An Interpretable Machine Learning Approach to Name-based Ethnicity Classification" Authors: Vaishali Jain, Ted Enamorado, and Cynthia Rudin
- Host: GitHub
- URL: https://github.com/VaishaliJain/ethnicIA
- Owner: VaishaliJain
- Created: 2022-06-08T17:57:26.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-01-06T18:06:50.000Z (over 2 years ago)
- Last Synced: 2024-08-13T07:11:34.150Z (8 months ago)
- Topics: interpretability, interpretable-machine-learning, name-classification
- Language: R
- Homepage: https://hdsr.mitpress.mit.edu/pub/wgss79vu/release/2
- Size: 20.3 MB
- Stars: 5
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- jimsghstars - VaishaliJain/ethnicIA - "The Importance of being Ernest, Ekundayo, or Eswari: An Interpretable Machine Learning Approach to Name-based Ethnicity Classification" Authors: Vaishali Jain, Ted Enamorado, and Cynthia Rudin (R)
README
# ethnicIA

"The Importance of being Ernest, Ekundayo, or Eswari: An Interpretable Machine Learning Approach to Name-based Ethnicity Classification"
Authors: Vaishali Jain, Ted Enamorado, and Cynthia Rudin

Citation: Jain, V., Enamorado, T., & Rudin, C. (2022). The Importance of Being Ernest, Ekundayo, or Eswari: An Interpretable Machine Learning Approach to Name-Based Ethnicity Classification. Harvard Data Science Review, 4(3). https://doi.org/10.1162/99608f92.db1aba8b
# Data
You can download the NC and FL datasets here: https://users.cs.duke.edu/~cynthia/ethnicIA/Data/. The GA data is not publicly available, so we have created three processed training and test datasets using only NC and FL that can be used to test algorithms.
# Steps to replicate experiments, case study, and appendices
Step 1: Run Code/R/01_Create_Train_Features_Master_sparse.R and Code/R/01_Create_Train_Features_Master_UID.R to generate all training datasets
Step 2: Run Code/R/02_Create_Test_Features_Master_sparse.R and Code/R/02_Create_Test_Features_Master_UID.R to generate all test datasets
Step 3: Run the function ethnicIA_model_training() in Code/python/ethnicIA_paper_results.py to train all the required models
Step 4: Run the functions for the relevant experiment in Code/python/ethnicIA_paper_results.py to replicate the results. Follow any instructions provided in the functions in the Python file.
(Open Code/python/ethnicIA_paper_results.py for clarification on this step.)
# Replication for Section 3: Sensitivity of parameters for Indistinguishability
Run Code/R/03_Create_Features_FLGA_multCuts.R and Code/R/03_Plot_multCuts.R to generate the contour plot shown in Figure 1.
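
The R scripts named in this README (Steps 1 and 2, plus the two Section 3 scripts) can be run in order from a small driver. The following is a minimal sketch, not part of the repository. It assumes that `Rscript` is on your PATH, that the scripts are meant to be launched from the repository root, and that the input data are already in place. The names `R_SCRIPTS`, `build_commands`, and `run_all` are hypothetical. The Section 3 script names mention FLGA, so they may need the GA data, which is not publicly available.

```python
import subprocess

# Script paths exactly as listed in the README, in the order they are run.
R_SCRIPTS = [
    "Code/R/01_Create_Train_Features_Master_sparse.R",  # Step 1
    "Code/R/01_Create_Train_Features_Master_UID.R",     # Step 1
    "Code/R/02_Create_Test_Features_Master_sparse.R",   # Step 2
    "Code/R/02_Create_Test_Features_Master_UID.R",      # Step 2
    "Code/R/03_Create_Features_FLGA_multCuts.R",        # Section 3
    "Code/R/03_Plot_multCuts.R",                        # Section 3
]


def build_commands(rscript="Rscript"):
    """Return one [executable, script] command per R script, in order."""
    return [[rscript, script] for script in R_SCRIPTS]


def run_all(repo_root=".", rscript="Rscript"):
    """Run every script from repo_root (assumed working directory).

    check=True stops at the first script that exits with an error.
    """
    for cmd in build_commands(rscript):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, cwd=repo_root, check=True)
```

Calling `run_all("path/to/ethnicIA")` would run the six scripts in sequence. Steps 3 and 4 are then carried out in Python through the functions in Code/python/ethnicIA_paper_results.py, following the instructions in that file.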
