https://github.com/joaoalvarenga/namegenderclassifier

Gender classification by name

brasil brazil classificador classifier gender genero keras mlp neural-network nome perceptron portugues portuguese

# Name Gender Classifier
![built with](https://img.shields.io/badge/Built%20with-Python%203-green.svg)

A gender classifier based on first names.
The main classifier is a single-layer perceptron.
Its input features are the name's last 3-gram (its final three characters) and its character frequencies.
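A minimal sketch of those two feature groups in plain Python. It assumes names are lowercased, only the ASCII letters a-z are counted (so accented characters are dropped), and the last 3-gram is one-hot encoded against a vocabulary built from the training names. The functions `last_trigram`, `char_frequencies`, `build_trigram_vocab` and `extract_features` are hypothetical names for illustration; the project's own encoding may differ.

```python
from collections import Counter
import string

# Assumption: only ASCII letters a-z are counted; accented characters are ignored.
ALPHABET = string.ascii_lowercase


def last_trigram(name: str) -> str:
    """Final three characters of the lowercased name (shorter names are returned whole)."""
    return name.lower()[-3:]


def char_frequencies(name: str) -> list[float]:
    """Relative frequency of each letter a-z in the name."""
    letters = [c for c in name.lower() if c in ALPHABET]
    counts = Counter(letters)
    total = len(letters) or 1  # avoid division by zero for names with no a-z letters
    return [counts[c] / total for c in ALPHABET]


def build_trigram_vocab(names: list[str]) -> dict[str, int]:
    """Hypothetical helper: give every distinct final trigram seen in training an index."""
    vocab: dict[str, int] = {}
    for n in names:
        vocab.setdefault(last_trigram(n), len(vocab))
    return vocab


def extract_features(name: str, trigram_vocab: dict[str, int]) -> list[float]:
    """One-hot encoding of the last 3-gram followed by the 26 letter frequencies.

    Trigrams not in the vocabulary get an all-zero one-hot part.
    """
    one_hot = [0.0] * len(trigram_vocab)
    idx = trigram_vocab.get(last_trigram(name))
    if idx is not None:
        one_hot[idx] = 1.0
    return one_hot + char_frequencies(name)


vocab = build_trigram_vocab(["Maria", "Joao", "Ana"])
features = extract_features("Mariana", vocab)
print(len(features))  # 3 trigram slots + 26 letter frequencies = 29
```

Concatenating a one-hot trigram with a 26-dimensional frequency vector gives the perceptron a fixed-length input no matter how long the name is.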

## Dataset
On the Brazilian names dataset, the current results are:
```
Accuracy: 0.759988
Precision: 0.753677
Recall: 0.756184
F1: 0.754929
```
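For reference, the four metrics can be computed as below. This is a sketch, not the project's `evaluate` method: it assumes a binary task in which one label (here the hypothetical `"F"`) is the positive class, and it returns values in the same order as `evaluate` in the training example (precision, recall, accuracy, F1). The README does not say which class is treated as positive or whether scores are averaged over both classes.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(y_true, y_pred, positive="F"):
    """Return (precision, recall, accuracy, f1) for the chosen positive class.

    The label "F" is a hypothetical placeholder; the dataset's real labels may differ.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, accuracy, f1_from(precision, recall)


# Toy example with made-up labels
print(binary_metrics(["F", "F", "M", "M"], ["F", "M", "M", "F"]))  # (0.5, 0.5, 0.5, 0.5)

# Sanity check: the harmonic mean of the reported (rounded) precision and recall
# agrees with the reported F1 up to rounding in the last digit
print("F1: %f" % f1_from(0.753677, 0.756184))
```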

# Quick start
## Requirements
This project requires Python 3.
After cloning the repository, install the dependencies with pip and then install the package:
```
$ pip install -r requirements.txt
$ python setup.py install
```
## Training example
```python
from genderclassifier import GenderClassifier
import pandas as pd

# Load the names dataset (CSV) as a NumPy array
dataset = pd.read_csv("data/nomes.csv").values

# Train the classifier and save the model to disk
classifier = GenderClassifier()
classifier.train(dataset)
classifier.save("models/example")

# Evaluate; this reuses the training data, so these are training-set scores
precision, recall, accuracy, f1 = classifier.evaluate(dataset)
print("Accuracy: %f" % accuracy)
print("Precision: %f" % precision)
print("Recall: %f" % recall)
print("F1: %f" % f1)
```
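The `train` call hides the model itself. The README describes it as a single-layer perceptron, while the repository topics also mention Keras and MLP, so the real code may be built differently. As an illustration only, here is the classic perceptron learning rule in plain Python. `train_perceptron` and `predict_one` are hypothetical names, labels are assumed to be encoded as 0 and 1, and each input would be a fixed-length feature vector such as the last-3-gram and character-frequency features described earlier.

```python
import random


def predict_one(w, b, x):
    """Threshold unit: 1 if dot(w, x) + b > 0, else 0."""
    activation = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if activation > 0 else 0


def train_perceptron(X, y, epochs=20, lr=1.0, seed=0):
    """Perceptron rule: adjust weights only on misclassified examples.

    X is a list of equal-length feature vectors, y a list of 0/1 labels.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a different order each epoch
        mistakes = 0
        for i in order:
            err = y[i] - predict_one(w, b, X[i])  # -1, 0 or +1
            if err != 0:
                mistakes += 1
                w = [wj + lr * err * xj for wj, xj in zip(w, X[i])]
                b += lr * err
        if mistakes == 0:  # every training example is classified correctly
            break
    return w, b


# Toy, linearly separable data: label 1 when the first feature is larger
X = [[2, 0], [0, 2], [3, 1], [1, 3]]
y = [1, 0, 1, 0]
w, b = train_perceptron(X, y)
print([predict_one(w, b, x) for x in X])  # [1, 0, 1, 0]
```

The rule is only guaranteed to converge when the two classes are linearly separable, which is why `epochs` caps the loop; on real name data some training errors will remain.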

## Predicting
```python
from genderclassifier import GenderClassifier

classifier = GenderClassifier()
classifier.load("models/example")

# Read names from standard input until the user types "q"
name = input()
while name != "q":
    pred = classifier.predict([name])
    print("%s - %s" % (name, pred))
    name = input()
```

# License
MIT License

# Contributing

:+1::tada: First off, thanks for taking the time to contribute! :tada::+1:

Steps to contribute:

- Make your awesome changes
- Submit a pull request
- You can also help by sharing better datasets ;)