{"id":14977135,"url":"https://github.com/zionc27/speech-emotion-recognition","last_synced_at":"2025-05-09T01:13:54.357Z","repository":{"id":230780400,"uuid":"780154428","full_name":"ZionC27/Speech-Emotion-Recognition","owner":"ZionC27","description":"Speech Emotion Recognition (SER) using Deep neural networks CNN and RNN","archived":false,"fork":false,"pushed_at":"2025-04-22T20:55:34.000Z","size":32,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-09T01:13:48.203Z","etag":null,"topics":["clstm","cnn","ipython-notebook","keras","librosa","lstm","machine-learning","python","rnn","speech","speech-emotion-classification","speech-emotion-recognition","tensorflow"],"latest_commit_sha":null,"homepage":"https://huggingface.co/spaces/ZionC27/Speech-Emotion-Recognition","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ZionC27.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-03-31T21:04:51.000Z","updated_at":"2025-04-22T20:55:38.000Z","dependencies_parsed_at":"2025-04-22T21:40:53.314Z","dependency_job_id":null,"html_url":"https://github.com/ZionC27/Speech-Emotion-Recognition","commit_stats":null,"previous_names":["zionc27/speech-emotion-recognition"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZionC27%2FSpeech-Emotion-Recognition","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZionC27%2FSpeech-Emotion-R
ecognition/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZionC27%2FSpeech-Emotion-Recognition/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ZionC27%2FSpeech-Emotion-Recognition/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ZionC27","download_url":"https://codeload.github.com/ZionC27/Speech-Emotion-Recognition/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253171272,"owners_count":21865297,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clstm","cnn","ipython-notebook","keras","librosa","lstm","machine-learning","python","rnn","speech","speech-emotion-classification","speech-emotion-recognition","tensorflow"],"created_at":"2024-09-24T13:55:10.653Z","updated_at":"2025-05-09T01:13:54.337Z","avatar_url":"https://github.com/ZionC27.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Speech Emotion Recognition (SER) using deep learning\n\nThis repository contains code and resources for a Speech Emotion Recognition (SER) project, aiming to build robust models for recognizing emotions in speech signals.\nThe project builds upon recent studies in SER, emphasizing the significance of deep learning methods and addressing limitations in existing datasets.\n\n## You can try out the Speech Emotion Recognition model on my Hugging Face Space here: https://huggingface.co/spaces/ZionC27/Speech-Emotion-Recognition\n\nDataset Description and Analysis:\nA comprehensive dataset was 
constructed by combining secondary datasets, including the Crowd-sourced Emotional Multimodal Actors Dataset [(CREMA-D)](https://github.com/CheyneyComputerScience/CREMA-D), [JL corpus](https://www.kaggle.com/datasets/tli725/jl-corpus), Toronto Emotional Speech Set [(TESS)](https://utoronto.scholaris.ca/collections/036db644-9790-4ed0-90cc-be1dfb8a4b66), [EmoV-DB](https://github.com/numediart/EmoV-DB), [ASVP-ESD](https://www.kaggle.com/datasets/dejolilandry/asvpesdspeech-nonspeech-emotional-utterances) (Speech and Non-Speech Emotional Sound), \nPublicly Available Emotional Speech Dataset [(ESD)](https://www.kaggle.com/datasets/dejolilandry/asvpesdspeech-nonspeech-emotional-utterances), Ryerson Audio-Visual Database of Emotional Speech and Song [(RAVDESS)](https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio), and a primary dataset, the Diverse Emotion Speech Dataset - English (DESD-E), collected from friends and schoolmates. \nThis approach increases the diversity and richness of the dataset, contributing to the robustness of the emotion recognition models. 
\nThe decision to incorporate primary data stemmed from limitations observed in existing datasets, such as a focus on specific sentences and a narrow range of accents.\nYou can check out [this](https://github.com/jim-schwoebel/voice_datasets?tab=readme-ov-file) repo for links to more datasets.\n\n# Feature Extraction Methods\n\nZero-Crossing Rate (ZCR): ZCR measures the rate at which the audio signal changes sign, providing insights into speech characteristics such as speech rate and energy distribution.\n\nRoot Mean Square (RMS): RMS quantifies the overall energy present in the speech signal, offering valuable information about speech intensity and loudness variations.\n\nMel-Frequency Cepstral Coefficients (MFCCs): MFCCs capture the spectral envelope of the speech signal, emphasizing perceptually relevant features related to speech timbre, pitch, and spectral shape.\n\n# Model\n\nThe project utilizes deep learning techniques for emotion classification, incorporating Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (Bi-LSTM)\nand Gated Recurrent Units (GRUs). Several architectures are explored, including different LSTM variants and combinations of CNN and LSTM layers. \nAmong the models tested, the CLSTM (CNN + LSTM) architecture is the top performer, achieving an accuracy of 82.12% and a precision of 84.66%. This model combines CNN layers for spatial feature extraction with \nLSTM layers for temporal dependency modeling, allowing it to capture intricate patterns in the speech data. The model can be found here: https://huggingface.co/ZionC27/EMO_20_82\n\n# Future Work\n\nCollecting More Data for DESD-E: Efforts will be directed toward expanding the private dataset used in this project. 
Additional data will be collected to augment the existing dataset, \nensuring better coverage of diverse emotional expressions and linguistic variations. This expanded dataset will contribute to enhancing the robustness and effectiveness of the emotion recognition models.\n\nModel Fine-Tuning: Further refinement of the SER models will involve fine-tuning of hyperparameters and architecture adjustments. This iterative process aims to optimize model performance and improve accuracy in emotion classification tasks.\n\nExploring More Emotions: The inclusion of additional emotional categories beyond the existing ones will be explored. This expansion will enable the SER system to recognize a wider spectrum of emotions, enhancing its capability to capture the nuances of human emotion.\n\nIncorporating Different Languages: Efforts will be made to incorporate speech samples from different languages into the dataset. Training and evaluating the models on multilingual data will enable emotion \nrecognition in diverse linguistic contexts, expanding the applicability of the SER system.\n\n## Setup\n\nPython: version 3.9 or later should work\n\nRequired libraries:\n```\npip install pandas\npip install numpy\npip install matplotlib\npip install seaborn\npip install tensorflow\npip install librosa\npip install keras-tuner\n```\nAll audio files used for testing and training should be in WAV format.\n\n## Licence\n\nLicensed under [MIT](https://opensource.org/license/mit)\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzionc27%2Fspeech-emotion-recognition","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzionc27%2Fspeech-emotion-recognition","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzionc27%2Fspeech-emotion-recognition/lists"}