# Speech emotion recognition
Use case for a summer internship at Visium: speech emotion recognition based on the Berlin emotional speech database (EmoDB).
The purpose of this project is to build a model that correctly classifies the speaker's emotion from sound samples, so that it can later be run live.
## Libraries used
We used the following libraries for this project, with Python 3.6.5.

Computational:
- numpy (as np)
- sklearn (scikit-learn)
- torch
- librosa
- python_speech_features
- pandas
- pandas_profiling

Graphical:
- seaborn (as sns) (version 0.9.0)
- matplotlib (as plt)
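For reference, a minimal import block matching the aliases listed above:

```python
# Standard aliases used throughout the project.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import torch
import sklearn
import librosa
import python_speech_features
import pandas_profiling
```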
## Prerequisites
To install some of the libraries mentioned above, please use the following commands (Linux):

```bash
pip install librosa
pip install python_speech_features
pip install pandas-profiling
```

The folder structure has to be the following:
```
.
├── data                  # Data files, in .csv
│   └── wav
├── src                   # Source files
│   ├── neural_networks.py
│   ├── data_processing.py
│   └── utilities.py
├── results
│   ├── plots
│   └── models
└── README.md
```

## Implementation details
#### Report.ipynb
Imports functions from `neural_networks.py`, `utilities.py`, and `data_processing.py`.
Jupyter notebook describing our approach to this classification problem. Some parts (feature extraction) may be slow to run; due to the short deadline, not everything is optimized. This is why `.csv` files with all relevant transformations already computed are saved.
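As a minimal sketch of that caching idea (the file name and helper below are hypothetical, not the repository's actual code):

```python
import os
import pandas as pd

def load_or_compute_features(csv_path, compute_fn):
    """Reload cached features if present; otherwise compute and cache them."""
    if os.path.exists(csv_path):
        return pd.read_csv(csv_path)
    features = compute_fn()            # slow step: feature extraction
    features.to_csv(csv_path, index=False)
    return features
```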
#### data_processing.py
Handles the whole data pipeline, including feature extraction.
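The exact features are defined in the source; as an illustration only, MFCC summary features could be computed with librosa along these lines (the function name and parameter values are assumptions):

```python
import numpy as np
import librosa

def extract_mfcc_summary(wav_path, n_mfcc=13):
    """Load a .wav file and return a fixed-size MFCC summary vector."""
    signal, sr = librosa.load(wav_path, sr=None)   # keep native sample rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    # Summarize over time so every clip yields a vector of the same length.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
```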
#### results
Contains files saved for easy later use:
- `features_exploration.html`: a profiling report of the features
#### data
The folder containing the audio files is `wav`. The other folders are included for the sake of completeness and contain preprocessed data. We did not use them, since we wanted features computable from the audio data alone, not data preprocessed by experts. They could be used to improve the accuracy of our model.

## Model and accuracy
Our model is a simple fully connected neural network with four hidden layers and ReLU activations. Its overall prediction accuracy is approximately 69%, and per-class accuracy ranges from 40% to 90%.
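A minimal PyTorch sketch of such an architecture (the layer widths, class name, and input/output sizes are assumptions, not the repository's exact hyperparameters):

```python
import torch.nn as nn

class EmotionClassifier(nn.Module):
    """Fully connected network with four hidden layers and ReLU activations."""
    def __init__(self, n_features, n_classes, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),   # raw logits; pair with CrossEntropyLoss
        )

    def forward(self, x):
        return self.net(x)
```

Trained with a cross-entropy loss, such a network outputs one logit per emotion class.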
## References
**Emotion recognition from the human voice**. Parlak, Cevahir & Diri, Banu. (2013). 21st Signal Processing and Communications Applications Conference, SIU 2013. 1-4. 10.1109/SIU.2013.6531196

**Speech Emotion Recognition: Methods and Cases Study**. Kerkeni, Leila & Serrestou, Youssef & Mbarki, Mohamed & Raoof, Kosai & Mahjoub, Mohamed. (2018). 175-182. 10.5220/0006611601750182

**Emotion Recognition from Speech using Discriminative Features**. Chandrasekar, Purnima & Chapaneri, Santosh & Jayaswal, Deepak. (2014). International Journal of Computer Applications. 101. 31-36. 10.5120/17775-8913

[Berlin emotional speech database](http://emodb.bilderbar.info/index-1024.html)

**Librosa: Audio and music signal analysis in Python**. McFee, Brian, Colin Raffel, Dawen Liang, Daniel P. W. Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. (2015).

**The Elements of Statistical Learning**. Hastie, T.; Tibshirani, R. & Friedman, J. (2001). Springer New York Inc., New York, NY, USA.
## Author
* *Charles Dufour*