https://github.com/devssh/genderethnicitydetector
Tensorflow Keras based pre-trained model to predict gender and ethnicity of Indian Names using Python
andrew-ng ethnicity-classifier fuzzy gender-classification indian lstm lstm-model pandas pre-trained tensorflow tensorflow-models text-classification username-predictor
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/devssh/genderethnicitydetector
- Owner: devssh
- License: mit
- Created: 2018-09-24T12:16:56.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2018-10-06T09:19:04.000Z (about 7 years ago)
- Last Synced: 2025-04-03T15:21:45.228Z (6 months ago)
- Topics: andrew-ng, ethnicity-classifier, fuzzy, gender-classification, indian, lstm, lstm-model, pandas, pre-trained, tensorflow, tensorflow-models, text-classification, username-predictor
- Language: Jupyter Notebook
- Size: 89.9 MB
- Stars: 4
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
- License: LICENSE
README
# Gender and Ethnicity Predictor
[License: MIT](https://opensource.org/licenses/MIT)

Pretrained LSTM models that predict gender and ethnicity from usernames, inspired by Andrew Ng's DeepLearning.ai. The gender model was trained on 60k usernames synthesized from baby names; the ethnicity model was trained on 110k usernames. The models work best for Indian names, with > 88% accuracy, and can run predictions over large datasets using pandas + TensorFlow (Keras).
Save the time of training a model yourself by reusing this pre-trained one straight from GitHub to do fuzzy detection on usernames. For better performance, hire a data scientist to improve the model, or use transfer learning to retrain it. Pull requests are welcome!
# Sample Output
```
0 RajSood25
1 mukta57cute
2 burnvipulkumarfire
3 priyasubramanium
array([[male, female]]) probability
[[0.71 0.28]
[0.11 0.88]
[0.99 0. ]
[0. 0.99]]
```

```
0 RajkumarSood
1 mukta57
2 johnsmith
3 KeanuReeves
4 Elias
5 priyasubramanium
6 Devashish
array([[other, indian ethnicity]]) probability
[[0.09 0.9 ]
[0.33 0.66]
[0.81 0.18]
[0.88 0.11]
[0.96 0.03]
[0.05 0.94]
[0.07 0.92]]
```

Modify the sample names in the Python script with your own text and try!
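To read the arrays above: each row holds two probabilities in the column order given by the header line, so the predicted label is the column with the larger value. The sketch below shows this in plain Python using the gender values from the sample output. The helper name `label_predictions` is hypothetical and is not part of this repo.

```python
# Hypothetical helper, not part of this repo: pairs each name with the
# most probable label from a two-column probability array.
def label_predictions(names, probabilities, labels):
    results = []
    for name, row in zip(names, probabilities):
        # Index of the largest probability in this row.
        best = max(range(len(row)), key=lambda i: row[i])
        results.append((name, labels[best], row[best]))
    return results


# Values copied from the gender sample output above.
names = ["RajSood25", "mukta57cute", "burnvipulkumarfire", "priyasubramanium"]
gender_probs = [[0.71, 0.28], [0.11, 0.88], [0.99, 0.0], [0.0, 0.99]]

for name, label, prob in label_predictions(names, gender_probs, ["male", "female"]):
    print(f"{name}: {label} ({prob:.2f})")
```

The same helper should work for the ethnicity output by passing `["other", "indian"]` as the labels.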
Requirements
```
Python 3.6
Pandas
Numpy
Tensorflow 1.10
```

View the notebook in the [Browser](https://github.com/devssh/GenderEthnicityDetector/blob/master/PredictGenderAndEthnicity.ipynb) or run the script:

```
python predict_gender_and_ethnicity.py
```

You can retrain the model to work for the gender and ethnicity of any country using the string_vectorizer. Training on worldwide names requires GPUs.
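The repo's `string_vectorizer` is not reproduced here. As a rough illustration of the general idea, a character-level one-hot encoding of a username could look like the sketch below. The alphabet, the fixed length of 20, and the function name `one_hot_name` are all assumptions made for this example, not the repo's actual values.

```python
import string

# Assumed alphabet: lowercase letters and digits. The repo may use a different one.
ALPHABET = string.ascii_lowercase + string.digits
MAX_LEN = 20  # assumed fixed sequence length


def one_hot_name(name, alphabet=ALPHABET, max_len=MAX_LEN):
    """Return a max_len x len(alphabet) matrix of 0/1 values for `name`."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    matrix = [[0] * len(alphabet) for _ in range(max_len)]
    # Lowercase the name and truncate it to max_len characters.
    for pos, ch in enumerate(name.lower()[:max_len]):
        if ch in index:  # characters outside the alphabet stay all-zero
            matrix[pos][index[ch]] = 1
    return matrix


encoded = one_hot_name("mukta57")
print(len(encoded), len(encoded[0]))  # rows = MAX_LEN, columns = alphabet size
```

Padding every name to the same length lets names of different lengths be stacked into a single array of shape (names, timesteps, features), which is the input layout a Keras LSTM layer expects.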
Star this repo to add to your favorite repos or fork the code!
If you have better datasets that I can train on, do share!