{"id":19437749,"url":"https://github.com/renovamen/speech-emotion-recognition","last_synced_at":"2025-05-16T16:09:21.698Z","repository":{"id":36081526,"uuid":"180302915","full_name":"Renovamen/Speech-Emotion-Recognition","owner":"Renovamen","description":"Speech emotion recognition implemented in Keras (LSTM, CNN, SVM, MLP) | 语音情感识别","archived":false,"fork":false,"pushed_at":"2023-03-25T01:20:48.000Z","size":155549,"stargazers_count":1120,"open_issues_count":33,"forks_count":217,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-04-12T14:59:15.981Z","etag":null,"topics":["cnn","lstm","mlp","opensmile","speech-emotion-recognition","svm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Renovamen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-04-09T06:47:32.000Z","updated_at":"2025-04-10T11:52:19.000Z","dependencies_parsed_at":"2023-01-16T13:01:20.605Z","dependency_job_id":"0f18ca2a-6621-4627-a25a-270652e07d56","html_url":"https://github.com/Renovamen/Speech-Emotion-Recognition","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Renovamen%2FSpeech-Emotion-Recognition","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Renovamen%2FSpeech-Emotion-Recognition/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Renovamen%2FSpeech-Emotion-Recognition/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Renovamen%2FSpeech-Emotion-Recognition/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Renovamen","download_url":"https://codeload.github.com/Renovamen/Speech-Emotion-Recognition/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254564127,"owners_count":22092122,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cnn","lstm","mlp","opensmile","speech-emotion-recognition","svm"],"created_at":"2024-11-10T15:15:46.183Z","updated_at":"2025-05-16T16:09:16.680Z","avatar_url":"https://github.com/Renovamen.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Speech Emotion Recognition\n\nSpeech emotion recognition with LSTM, CNN, SVM, and MLP, implemented in Keras.\n\nThe feature extraction method has been improved, raising recognition accuracy to about 80%. The original version is archived in the [First-Version branch](https://github.com/Renovamen/Speech-Emotion-Recognition/tree/First-Version).\n\n[English Document](README_EN.md) | Chinese Document\n\n\n\u0026nbsp;\n\n## Environments\n\n- Python 3.8\n- Keras \u0026 TensorFlow 2\n\n\n\u0026nbsp;\n\n## Structure\n\n```\n├── models/                // model implementations\n│   ├── common.py          // base class for all models\n│   ├── dnn                // neural network models\n│   │   ├── dnn.py         // base class for all neural network models\n│   │   ├── cnn.py         // CNN\n│   │   └── lstm.py        // LSTM\n│   └── ml.py              // SVM \u0026 MLP\n├── extract_feats/         // feature extraction\n│   ├── librosa.py         // feature extraction with librosa\n│   └── opensmile.py       // feature extraction with openSMILE\n├── utils/\n│   ├── files.py           // dataset organization (sorting by class, batch renaming)\n│   ├── opts.py            // command-line argument parsing with argparse\n│   └── plot.py            // plotting (radar charts, spectrograms, waveforms)\n├── configs/               // configuration parameters (.yaml)\n├── features/              // stores extracted features\n├── checkpoints/           // stores trained model weights\n├── train.py               // trains a model\n├── predict.py             // predicts the emotion of a given audio file with a trained model\n└── preprocess.py          // data preprocessing (extracts and saves features of the dataset audio)\n```\n\n\n\u0026nbsp;\n\n## Requirements\n\n### Python\n\n- [TensorFlow 2](https://github.com/tensorflow/tensorflow) / [Keras](https://github.com/keras-team/keras): LSTM \u0026 CNN (`tensorflow.keras`)\n- [scikit-learn](https://github.com/scikit-learn/scikit-learn): SVM \u0026 MLP models, train/test splitting\n- [joblib](https://github.com/joblib/joblib): saving and loading models trained with scikit-learn\n- [librosa](https://github.com/librosa/librosa): feature extraction, waveforms\n- [SciPy](https://github.com/scipy/scipy): spectrograms\n- [pandas](https://github.com/pandas-dev/pandas): loading features\n- [Matplotlib](https://github.com/matplotlib/matplotlib): plotting\n- [NumPy](https://github.com/numpy/numpy)\n\n### Tools\n\n- [Optional] [openSMILE](https://github.com/naxingyu/opensmile): feature extraction\n\n\n\u0026nbsp;\n\n## Datasets\n\n1. [RAVDESS](https://zenodo.org/record/1188976)\n\n   English; about 1,500 audio files from 24 speakers (12 male, 12 female) expressing 8 emotions. The third number in each filename indicates the emotion: 01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised.\n\n2. [SAVEE](http://kahlan.eps.surrey.ac.uk/savee/Download.html)\n\n   English; about 500 audio files from 4 male speakers expressing 7 emotions. The leading letter(s) of each filename indicate the emotion: a = anger, d = disgust, f = fear, h = happiness, n = neutral, sa = sadness, su = surprise.\n\n3. [EMO-DB](http://www.emodb.bilderbar.info/download/)\n\n   German; about 500 audio files from 10 speakers (5 male, 5 female) expressing 7 emotions. The second-to-last letter of each filename indicates the emotion: N = neutral, W = angry, A = fear, F = happy, T = sad, E = disgust, L = boredom.\n\n4. CASIA\n\n   Chinese; about 1,200 audio files from 4 speakers (2 male, 2 female) expressing 6 emotions: neutral, happy, sad, angry, fearful, surprised.\n\n\n\u0026nbsp;\n\n## Usage\n\n### Prepare\n\nInstall the dependencies:\n\n```bash\npip install -r requirements.txt\n```\n\n(Optional) Install [openSMILE](https://github.com/naxingyu/opensmile).\n\n\u0026nbsp;\n\n### Configuration\n\nSet the parameters in the YAML configuration files in the [`configs/`](https://github.com/Renovamen/Speech-Emotion-Recognition/tree/master/configs) folder.\n\nOnly the following openSMILE standard feature sets are currently supported:\n\n- `IS09_emotion`: [The INTERSPEECH 2009 Emotion Challenge](http://mediatum.ub.tum.de/doc/980035/292947.pdf), 384 features;\n- `IS10_paraling`: [The INTERSPEECH 2010 Paralinguistic Challenge](https://sail.usc.edu/publications/files/schuller2010_interspeech.pdf), 1582 features;\n- `IS11_speaker_state`: [The INTERSPEECH 2011 Speaker State Challenge](https://www.phonetik.uni-muenchen.de/forschung/publikationen/Schuller-IS2011.pdf), 4368 features;\n- `IS12_speaker_trait`: [The INTERSPEECH 2012 Speaker Trait Challenge](http://www5.informatik.uni-erlangen.de/Forschung/Publikationen/2012/Schuller12-TI2.pdf), 6125 features;\n- `IS13_ComParE`: [The INTERSPEECH 2013 ComParE Challenge](http://www.dcs.gla.ac.uk/~vincia/papers/compare.pdf), 6373 features;\n- `ComParE_2016`: [The INTERSPEECH 2016 Computational Paralinguistics Challenge](http://www.tangsoo.de/documents/Publications/Schuller16-TI2.pdf), 6373 features.\n\nTo use a different feature set, modify the `FEATURE_NUM` entry in [`extract_feats/opensmile.py`](extract_feats/opensmile.py).\n\n\u0026nbsp;\n\n### Preprocess\n\nFirst, extract the features of the audio files in the dataset and save them locally. Features extracted by openSMILE are saved in `.csv` files, and features extracted by librosa are saved in `.p` files.\n\n```bash\npython preprocess.py --config configs/example.yaml\n```\nHere, `configs/example.yaml` is the path to your configuration file.\n\n\u0026nbsp;\n\n### Train\n\nThe dataset path can be set in [`configs/`](configs). Audio files of the same emotion should be placed in the same folder (see [`utils/files.py`](utils/files.py) for help organizing the data), for example:\n\n```\n└── datasets\n    ├── angry\n    ├── happy\n    ├── sad\n    ...\n```\n\nThen run:\n\n```bash\npython train.py --config configs/example.yaml\n```\n\n\u0026nbsp;\n\n### Predict\n\nPredict the emotion of a given audio file with a trained model. Some pretrained models are available in [`checkpoints/`](checkpoints).\n\n```bash\npython predict.py --config configs/example.yaml\n```\n\n\n\u0026nbsp;\n\n### Functions\n\n#### Radar Chart\n\nPlot a radar chart of the predicted probabilities.\n\nSource: [Radar](https://github.com/Zhaofan-Su/SpeechEmotionRecognition/blob/master/leidatu.py)\n\n```python\nimport utils\n\n# data_prob (np.ndarray): array of predicted probabilities\n# class_labels (list): emotion labels\nutils.radar(data_prob, class_labels)\n```\n\n\u0026nbsp;\n\n#### Play Audio\n\nPlay an audio file.\n\n```python\nimport utils\n\nutils.play_audio(file_path)\n```\n\n\u0026nbsp;\n\n#### Plot Curve\n\nPlot the accuracy and loss curves of the training process.\n\n```python\nimport utils\n\n# train (list): loss or accuracy values on the training set\n# val (list): loss or accuracy values on the test set\n# title (str): figure title\n# y_label (str): y-axis label\nutils.curve(train, val, title, y_label)\n```\n\n\u0026nbsp;\n\n#### Waveform\n\nPlot the waveform of an audio file.\n\n```python\nimport utils\n\nutils.waveform(file_path)\n```\n\n\u0026nbsp;\n\n#### Spectrogram\n\nPlot the spectrogram of an audio file.\n\n```python\nimport utils\n\nutils.spectrogram(file_path)\n```\n\n\n\u0026nbsp;\n\n## Other Contributors\n\n- [@Zhaofan-Su](https://github.com/Zhaofan-Su)\n- [@Guo Hui](https://github.com/guohui15661353950)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frenovamen%2Fspeech-emotion-recognition","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frenovamen%2Fspeech-emotion-recognition","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frenovamen%2Fspeech-emotion-recognition/lists"}