{"id":13586772,"url":"https://github.com/primaryobjects/voice-gender","last_synced_at":"2025-04-06T13:11:38.769Z","repository":{"id":41447423,"uuid":"60781263","full_name":"primaryobjects/voice-gender","owner":"primaryobjects","description":"Gender recognition by voice and speech analysis","archived":false,"fork":false,"pushed_at":"2023-01-16T15:50:20.000Z","size":6486,"stargazers_count":341,"open_issues_count":0,"forks_count":102,"subscribers_count":37,"default_branch":"master","last_synced_at":"2024-10-29T22:49:32.518Z","etag":null,"topics":["acoustic-properties","ai","artificial-intelligence","data-science","gender","gender-recognition","logistic-regression","machine-learning","neural-network","signal","speech","vocal","voice"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/primaryobjects.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":["primaryobjects"]}},"created_at":"2016-06-09T14:30:44.000Z","updated_at":"2024-10-28T12:05:54.000Z","dependencies_parsed_at":"2023-02-10T04:31:37.141Z","dependency_job_id":null,"html_url":"https://github.com/primaryobjects/voice-gender","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/primaryobjects%2Fvoice-gender","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/primaryobjects%2Fvoice-gender/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/primaryobjects%2Fvoice-gender/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/primaryobjects%2Fvoice-gender/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/primaryobjects","download_url":"https://codeload.github.com/primaryobjects/voice-gender/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247485290,"owners_count":20946398,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["acoustic-properties","ai","artificial-intelligence","data-science","gender","gender-recognition","logistic-regression","machine-learning","neural-network","signal","speech","vocal","voice"],"created_at":"2024-08-01T15:05:48.137Z","updated_at":"2025-04-06T13:11:38.749Z","avatar_url":"https://github.com/primaryobjects.png","language":"R","readme":"Voice Gender\n------------\n\nGender Recognition by Voice and Speech Analysis\n\nRun the online [demo](https://voicegender.herokuapp.com).\n\nRead the full article [Identifying the Gender of a Voice using Machine Learning](http://www.primaryobjects.com/2016/06/22/identifying-the-gender-of-a-voice-using-machine-learning/).\n\nThis project trains a computer program to identify a voice as male or female, based upon acoustic properties of the voice and speech. The model is trained on a dataset consisting of 3,168 recorded voice samples, collected from male and female speakers. 
The voice samples are pre-processed by acoustic analysis in R and then processed with artificial intelligence/machine learning algorithms to learn gender-specific traits for classifying the voice as male or female.\n\nThe best model achieves an accuracy of 100% on the training set and 89% on the test set.\n\n**Update: By narrowing the frequency range analyzed to 0Hz-280Hz ([human vocal range](https://en.wikipedia.org/wiki/Voice_frequency#Fundamental_frequency)), the best accuracy is boosted to 100%/99%.**\n\n## The Dataset\n\nDownload the pre-processed [dataset](https://raw.githubusercontent.com/primaryobjects/voice-gender/master/voice.csv) as a CSV file.\n\nThe CSV file contains the following fields:\n\n\"meanfreq\",\"sd\",\"median\",\"Q25\",\"Q75\",\"IQR\",\"skew\",\"kurt\",\"sp.ent\",\"sfm\",\"mode\",\"centroid\",\"meanfun\",\"minfun\",\"maxfun\",\"meandom\",\"mindom\",\"maxdom\",\"dfrange\",\"modindx\",\"label\"\n\n\"label\" corresponds to the gender classification of the sample. The remaining fields are acoustic properties, detailed [below](#acoustic-properties-measured).\n\nIn R, you can load the dataset file [data.bin](https://raw.githubusercontent.com/primaryobjects/voice-gender/master/data.bin) directly as a data.frame with the command ```load('data.bin')```.\n\nIn addition to the pre-processed dataset, the raw voice samples used for training are included as .WAV files in a separate repository. 
The .WAV files are pre-processed in R to produce the above dataset.\n\n## Accuracy\n\nThe trained models have achieved the following accuracies (train/test):\n\n#### Baseline Algorithm (always male)\n50%/50%\n\n#### Baseline Algorithm (simple frequency threshold)\n61%/59%\n\n#### Logistic Regression\n72%/71%\n\n#### Classification and Regression Tree (CART)\n81%/78%\n\n#### Random Forest\n100%/87%\n\n#### Generalized Boosted Tree Regression\n91%/84%\n\n#### XGBoost\n100%/87%\n\n#### XGBoost (Updated with frequency range 0Hz-280Hz)\n100%/99%\n\n## Acoustic Properties Measured\n\nThe following acoustic properties of each voice are measured:\n\n- **duration**: length of signal\n- **meanfreq**: mean frequency (in kHz)\n- **sd**: standard deviation of frequency\n- **median**: median frequency (in kHz)\n- **Q25**: first quartile (in kHz)\n- **Q75**: third quartile (in kHz)\n- **IQR**: interquartile range (in kHz)\n- **skew**: skewness (see note in specprop description)\n- **kurt**: kurtosis (see note in specprop description)\n- **sp.ent**: spectral entropy\n- **sfm**: spectral flatness\n- **mode**: mode frequency\n- **centroid**: frequency centroid (see specprop)\n- **peakf**: peak frequency (frequency with highest energy)\n- **meanfun**: average of fundamental frequency measured across acoustic signal\n- **minfun**: minimum fundamental frequency measured across acoustic signal\n- **maxfun**: maximum fundamental frequency measured across acoustic signal\n- **meandom**: average of dominant frequency measured across acoustic signal\n- **mindom**: minimum of dominant frequency measured across acoustic signal\n- **maxdom**: maximum of dominant frequency measured across acoustic signal\n- **dfrange**: range of dominant frequency measured across acoustic signal\n- **modindx**: modulation index. 
Calculated as the accumulated absolute difference between adjacent measurements of fundamental frequencies divided by the frequency range\n\n## Classification and Regression Decision Tree\n\nThe following decision tree, produced by the CART model, highlights the acoustic properties most influential in classifying a voice sample as male or female.\n\n![Screenshot 1](https://raw.githubusercontent.com/primaryobjects/voice-gender/master/images/voice-plot-1.png)\n\nAfter narrowing the frequency range to 0Hz-280Hz with a sound threshold of 15%, accuracy improves to near perfect, yielding the following CART model. Mean fundamental frequency serves as a powerful indicator of voice gender, with a threshold of 140Hz separating male from female classifications.\n\n![Screenshot 2](https://raw.githubusercontent.com/primaryobjects/voice-gender/master/images/voice-plot-2.png)\n\n## References\n\n[The Harvard-Haskins Database of Regularly-Timed Speech](http://www.nsi.edu/~ani/download.html)\n\n[Telecommunications \u0026 Signal Processing Laboratory (TSP) Speech Database at McGill University](http://www-mmsp.ece.mcgill.ca/Documents../Downloads/TSPspeech/TSPspeech.pdf), [Home](http://www-mmsp.ece.mcgill.ca/Documents../Data/index.html)\n\n[VoxForge Speech Corpus](http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/8kHz_16bit/), [Home](http://www.voxforge.org)\n\n[Festvox CMU_ARCTIC Speech Database at Carnegie Mellon University](http://festvox.org/cmu_arctic/)\n\n## Copyright\n\nCopyright (c) 2022 Kory Becker http://primaryobjects.com/kory-becker\n\n## Author\n\nKory 
Becker\nhttp://www.primaryobjects.com\n","funding_links":["https://github.com/sponsors/primaryobjects"],"categories":["R"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprimaryobjects%2Fvoice-gender","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprimaryobjects%2Fvoice-gender","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprimaryobjects%2Fvoice-gender/lists"}