{"id":21299842,"url":"https://github.com/ddlbojack/emotion2vec","last_synced_at":"2025-05-15T07:07:17.311Z","repository":{"id":211860360,"uuid":"723282884","full_name":"ddlBoJack/emotion2vec","owner":"ddlBoJack","description":"[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation","archived":false,"fork":false,"pushed_at":"2024-12-23T06:54:27.000Z","size":10263,"stargazers_count":795,"open_issues_count":17,"forks_count":57,"subscribers_count":16,"default_branch":"main","last_synced_at":"2025-04-14T13:04:26.692Z","etag":null,"topics":["iemocap","pytorch-implementation","speech-emotion-recognition","speech-representation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ddlBoJack.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-25T06:59:24.000Z","updated_at":"2025-04-14T09:12:42.000Z","dependencies_parsed_at":"2024-06-24T10:57:33.779Z","dependency_job_id":"9ee87728-c179-41a2-a0dc-6bfe604b7b1a","html_url":"https://github.com/ddlBoJack/emotion2vec","commit_stats":{"total_commits":41,"total_committers":4,"mean_commits":10.25,"dds":0.09756097560975607,"last_synced_commit":"85cce81dc08fadd45aaea79f25c25a20d469f063"},"previous_names":["ddlbojack/emotion2vec"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddlBoJack%2Femotion2vec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddlBoJack%2Femotion2vec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddlBoJack%2Femotion2vec/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ddlBoJack%2Femotion2vec/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ddlBoJack","download_url":"https://codeload.github.com/ddlBoJack/emotion2vec/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254292042,"owners_count":22046426,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["iemocap","pytorch-implementation","speech-emotion-recognition","speech-representation"],"created_at":"2024-11-21T15:06:28.168Z","updated_at":"2025-05-15T07:07:12.275Z","avatar_url":"https://github.com/ddlBoJack.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n    \u003ch1\u003e\n    EMOTION2VEC\n    \u003c/h1\u003e\n    \u003cp\u003e\n    Official PyTorch code for extracting features and training downstream models with \u003cbr\u003e\n    \u003cb\u003e\u003cem\u003eemotion2vec: Self-Supervised Pre-Training for Speech 
Emotion Representation</em></b>
    </p>
    <p>
    <img src="src/logo.png" alt="emotion2vec Logo" style="width: 200px; height: 200px;">
    </p>
    <p>
    </p>
    <a href="https://github.com/ddlBoJack/emotion2vec"><img src="https://img.shields.io/badge/Platform-linux-lightgrey" alt="platform"></a>
    <a href="https://github.com/ddlBoJack/emotion2vec"><img src="https://img.shields.io/badge/Python-3.8+-orange" alt="python"></a>
    <a href="https://github.com/ddlBoJack/emotion2vec"><img src="https://img.shields.io/badge/PyTorch-1.13+-brightgreen" alt="pytorch"></a>
    <a href="https://github.com/ddlBoJack/emotion2vec"><img src="https://img.shields.io/badge/License-MIT-red.svg" alt="license"></a>
</div>

# News
- [Oct. 2024] 🔧 We updated the usage of the FunASR interface with source selection: "ms" or "modelscope" for users in mainland China; "hf" or "huggingface" for users elsewhere. **We recommend the FunASR interface for the smoothest experience.**
- [Jun. 2024] 🔧 We fixed a bug in emotion2vec+. Please re-pull the latest code.
- [May 2024] 🔥 Speech emotion recognition foundation model **emotion2vec+**, with 9 emotion classes, has been released on [ModelScope](https://modelscope.cn/models/iic/emotion2vec_plus_large/summary) and [Hugging Face](https://huggingface.co/emotion2vec). Check out the emotion2vec+ series (seed, base, large) of high-performance SER models. **We recommend this release over the Jan. 2024 release.**
- [Jan. 2024] A 9-class emotion recognition model, iteratively fine-tuned from emotion2vec, has been released on [ModelScope](https://www.modelscope.cn/models/iic/emotion2vec_base_finetuned/summary) and in [FunASR](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining/emotion2vec).
- [Jan. 2024] **emotion2vec** has been integrated into [ModelScope](https://www.modelscope.cn/models/iic/emotion2vec_base/summary) and [FunASR](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining/emotion2vec).
- [Dec. 2023] We released the [paper](https://arxiv.org/abs/2312.15185) and created a [WeChat group](./src/Wechat.jpg) for emotion2vec.
- [Nov. 2023] We released code, checkpoints, and extracted features for emotion2vec.

# Model Card
GitHub Repo: [emotion2vec](https://github.com/ddlBoJack/emotion2vec)

|Model|⭐ModelScope|🤗Hugging Face|Fine-tuning Data (Hours)|
|:---:|:-------------:|:-----------:|:-------------:|
|emotion2vec|[Link](https://www.modelscope.cn/models/iic/emotion2vec_base/summary)|[Link](https://huggingface.co/emotion2vec/emotion2vec_base)|/|
|emotion2vec+ seed|[Link](https://modelscope.cn/models/iic/emotion2vec_plus_seed/summary)|[Link](https://huggingface.co/emotion2vec/emotion2vec_plus_seed)|201|
|emotion2vec+ base|[Link](https://modelscope.cn/models/iic/emotion2vec_plus_base/summary)|[Link](https://huggingface.co/emotion2vec/emotion2vec_plus_base)|4788|
|emotion2vec+ large|[Link](https://modelscope.cn/models/iic/emotion2vec_plus_large/summary)|[Link](https://huggingface.co/emotion2vec/emotion2vec_plus_large)|42526|

# Overview

- [emotion2vec+: speech emotion recognition foundation model](#emotion2vec-speech-emotion-recognition-foundation-model)
  - [Guides](#guides)
  - [Data Engineering](#data-engineering)
  - [Performance](#performance)
  - [Inference with checkpoints](#inference-with-checkpoints)
    - [Install from FunASR](#install-from-funasr)
- [emotion2vec: universal speech emotion representation model](#emotion2vec-universal-speech-emotion-representation-model)
  - [Guides](#guides-1)
  - [Performance](#performance-1)
    - [Performance on IEMOCAP](#performance-on-iemocap)
    - [Performance on other languages](#performance-on-other-languages)
    - [Performance on other speech emotion tasks](#performance-on-other-speech-emotion-tasks)
  - [Visualization](#visualization)
  - [Extract features](#extract-features)
    - [Download extracted features](#download-extracted-features)
    - [Extract features from your dataset](#extract-features-from-your-dataset)
      - [Install from the source code](#install-from-the-source-code)
      - [Install from FunASR](#install-from-funasr-1)
  - [Training your downstream model](#training-your-downstream-model)
  - [Contributors](#contributors)
  - [Citation](#citation)

# emotion2vec+: speech emotion recognition foundation model

## Guides
emotion2vec+ is a series of foundation models for speech emotion recognition (SER). We aim to train a "Whisper" for speech emotion recognition: a data-driven approach that overcomes the effects of language and recording environment to achieve universal, robust emotion recognition. emotion2vec+ significantly outperforms other highly downloaded open-source models on Hugging Face.

![](./src/emotion2vec+radar.png)

## Data Engineering
We offer 3 versions of emotion2vec+, each derived from the data of its predecessor. If you need a model focused on speech emotion representation, see [emotion2vec: universal speech emotion representation model](#emotion2vec-universal-speech-emotion-representation-model).

- emotion2vec+ seed: fine-tuned on academic speech emotion data from [EmoBox](https://github.com/emo-box/EmoBox)
- emotion2vec+ base: fine-tuned on filtered large-scale pseudo-labeled data to obtain a base-size model (~90M parameters)
- emotion2vec+ large: fine-tuned on filtered large-scale pseudo-labeled data to obtain a large-size model (~300M parameters)

The iteration process culminates in the emotion2vec+ large model, trained on 40k hours selected from 160k hours of speech emotion data. Details of the data engineering will be announced later.
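Although the full recipe is unannounced, the iteration described above is, in outline, a confidence-filtered pseudo-labeling loop: the current model labels unlabeled audio, and only high-confidence utterances are kept to fine-tune the next, larger model. A minimal sketch under that assumption (the function, threshold, and data paths are hypothetical, not the released pipeline), reusing the FunASR result fields shown in the inference examples below:

```python
# Hypothetical sketch of one pseudo-label filtering iteration; the actual
# emotion2vec+ data-engineering recipe has not been released.

def filter_pseudo_labels(model, unlabeled_wavs, threshold=0.9):
    """Keep only utterances the current model labels with high confidence."""
    kept = []
    for wav in unlabeled_wavs:
        # 'labels' and 'scores' fields as in the FunASR examples below
        res = model.generate(wav, granularity="utterance",
                             extract_embedding=False)[0]
        label, score = max(zip(res["labels"], res["scores"]),
                           key=lambda pair: pair[1])
        if score >= threshold:        # confidence filter (assumed)
            kept.append((wav, label))  # (audio, pseudo-label) pair
    return kept  # fine-tune the next, larger model on this subset
```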

## Performance

Performance on [EmoBox](https://github.com/emo-box/EmoBox) for 4-class primary emotions (without fine-tuning). Details of model performance will be announced later.

![](./src/emotion2vec+performance.png)

## Inference with checkpoints

### Install from FunASR
1. Install FunASR:
```bash
pip install -U funasr
```

2. Run the code:
```python
'''
Using the fine-tuned emotion recognition model

rec_result contains {'feats', 'labels', 'scores'}
	extract_embedding=False: 9-class emotions with scores
	extract_embedding=True: 9-class emotions with scores, along with features

9-class emotions:
iic/emotion2vec_plus_seed, iic/emotion2vec_plus_base, iic/emotion2vec_plus_large (May 2024 release)
iic/emotion2vec_base_finetuned (Jan. 2024 release)
    0: angry
    1: disgusted
    2: fearful
    3: happy
    4: neutral
    5: other
    6: sad
    7: surprised
    8: unknown
'''

from funasr import AutoModel

# model_id = "iic/emotion2vec_base"
# model_id = "iic/emotion2vec_base_finetuned"
# model_id = "iic/emotion2vec_plus_seed"
# model_id = "iic/emotion2vec_plus_base"
model_id = "iic/emotion2vec_plus_large"

model = AutoModel(
    model=model_id,
    hub="ms",  # "ms" or "modelscope" for users in mainland China; "hf" or "huggingface" for users elsewhere
)

wav_file = f"{model.model_path}/example/test.wav"
rec_result = model.generate(wav_file, output_dir="./outputs", granularity="utterance", extract_embedding=False)
print(rec_result)
```
The model will be downloaded automatically.

FunASR supports file-list input via wav.scp (Kaldi style):
```
wav_name1 wav_path1.wav
wav_name2 wav_path2.wav
...
```
Refer to [FunASR](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining/emotion2vec) for more details.
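For batch inference over such a list, the same call can be pointed at the wav.scp file itself, as the Kaldi-style input note above suggests. A short sketch with hypothetical paths; the per-utterance result fields ('key', 'labels', 'scores') are assumed to match the single-file example:

```python
from funasr import AutoModel

# wav.scp (hypothetical) contains Kaldi-style lines such as:
#   utt1 /data/utt1.wav
#   utt2 /data/utt2.wav
model = AutoModel(model="iic/emotion2vec_plus_large", hub="ms")

# One result dict per utterance in the list file.
results = model.generate("wav.scp", output_dir="./outputs",
                         granularity="utterance", extract_embedding=False)
for res in results:
    # Report the top-scoring emotion label for each utterance.
    label, score = max(zip(res["labels"], res["scores"]),
                       key=lambda pair: pair[1])
    print(res.get("key"), label, score)
```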

# emotion2vec: universal speech emotion representation model

## Guides

emotion2vec is the first universal speech emotion representation model. Through self-supervised pre-training, emotion2vec can extract emotion representations across different tasks, languages, and scenarios.

## Performance
### Performance on IEMOCAP
emotion2vec achieves SOTA with only linear layers on the mainstream IEMOCAP dataset. Refer to the paper for more details.
![](./src/IEMOCAP.png)

### Performance on other languages
emotion2vec outperforms state-of-the-art SSL models on multiple languages (Mandarin, French, German, Italian, etc.). Refer to the paper for more details.
![](./src/Languages.png)

### Performance on other speech emotion tasks
Refer to the paper for more details.

## Visualization
UMAP visualizations of learned features on the IEMOCAP dataset. <span style="color:red;">Red</span> and <span style="color:blue;">blue</span> tones denote low- and high-arousal emotional classes, respectively. Refer to the paper for more details.
![](./src/UMAP.png)

## Extract features
### Download extracted features
We provide extracted features for the popular IEMOCAP emotion dataset. The features are taken from the last layer of emotion2vec and stored in `.npy` format at a frame rate of 50 Hz. The utterance-level features are computed by averaging the frame-level features.
- frame-level: [Google Drive](https://drive.google.com/file/d/1JdQzwDJJEdKZcqSC1TXETvFZ7VpUvLEX/view?usp=sharing) | [Baidu Netdisk](https://pan.baidu.com/s/1FtCwhUwhONaeEos4nLYFWw?pwd=zb3p) (password: zb3p)
- utterance-level: [Google Drive](https://drive.google.com/file/d/1jJVfoEKC8yjwj39F__8jIQayd5PBO0WD/view?usp=sharing) | [Baidu Netdisk](https://pan.baidu.com/s/1AsJHacD6a5h27YJiCSee4w?pwd=qu3u) (password: qu3u)

All wav files are extracted from the original dataset to support diverse downstream tasks. If you want to train with the standard 5531 utterances for 4-class emotion classification, please refer to the `iemocap_downstream` folder.

### Extract features from your dataset
#### Install from the source code
The minimum environment requirements are `python>=3.8` and `torch>=1.13`. Our testing environment is `python=3.8` and `torch=2.0.1`.
1. Install fairseq and clone the repo:
```bash
pip install fairseq
git clone https://github.com/ddlBoJack/emotion2vec.git
```

2. Download the emotion2vec checkpoint from one of:
- [Google Drive](https://drive.google.com/file/d/10L4CEoEyt6mQrqdblDgDSfZETYvA9c2T/view?usp=sharing)
- [Baidu Netdisk](https://pan.baidu.com/s/15zqmNTYa0mkEwlIom7DO3g?pwd=b9fq) (password: b9fq)
- [ModelScope](https://www.modelscope.cn/models/damo/emotion2vec_base/summary): `git clone https://www.modelscope.cn/damo/emotion2vec_base.git`

3. Modify and run `scripts/extract_features.sh`.

#### Install from FunASR
1. Install FunASR:
```bash
pip install -U funasr
```

2. Run the code:
```python
'''
Using the emotion representation model
rec_result only contains {'feats'}
	granularity="utterance": {'feats': [*768]}
	granularity="frame": {'feats': [T*768]}
'''

from funasr import AutoModel

model_id = "iic/emotion2vec_base"
model = AutoModel(
    model=model_id,
    hub="ms",  # "ms" or "modelscope" for users in mainland China; "hf" or "huggingface" for users elsewhere
)

wav_file = f"{model.model_path}/example/test.wav"
rec_result = model.generate(wav_file, output_dir="./outputs", granularity="utterance")
print(rec_result)
```
The model will be downloaded automatically.

FunASR supports file-list input via wav.scp (Kaldi style):
```
wav_name1 wav_path1.wav
wav_name2 wav_path2.wav
...
```
Refer to [FunASR](https://github.com/alibaba-damo-academy/FunASR/tree/main/examples/industrial_data_pretraining/emotion2vec) for more details.

## Training your downstream model
We provide training scripts for the IEMOCAP dataset in the `iemocap_downstream` folder. You can modify the scripts to train your downstream model on other datasets.
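To see the shape of the task before diving into the provided scripts: the paper's IEMOCAP results use only linear layers on top of the extracted features. Below is a minimal linear-probe sketch, not the repo's actual `iemocap_downstream` scripts; the `.npy` feature and label file names are hypothetical stand-ins for the downloads above.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical inputs: utterance-level emotion2vec features (N x 768)
# and integer emotion labels (N,), e.g. the 4 IEMOCAP classes.
feats = torch.from_numpy(np.load("iemocap_utt_feats.npy")).float()
labels = torch.from_numpy(np.load("iemocap_labels.npy")).long()

probe = nn.Linear(768, 4)  # a single linear layer, as in the paper's setup
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(probe(feats), labels)
    loss.backward()
    optimizer.step()

accuracy = (probe(feats).argmax(dim=1) == labels).float().mean()
print(f"train accuracy: {accuracy:.3f}")
```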

## Contributors
|  Institution | Contribution |
|:------|:-----|
| [Shanghai Jiao Tong University](https://www.seiee.sjtu.edu.cn/) | Researchers; computing power; data collection |
| [Fudan University](https://istbi.fudan.edu.cn/) | Researchers |
| [The Chinese University of Hong Kong](https://www.cuhk.edu.hk/chinese/index.html) | Researchers |
| [Alibaba Group](https://www.alibaba.com/) | Researchers; computing power; data host; model host |
| [Peng Cheng Laboratory](https://data-starcloud.pcl.ac.cn/) | Researchers |

## Citation
If you find our emotion2vec code and paper useful, please kindly cite:
```
@article{ma2023emotion2vec,
  title={emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation},
  author={Ma, Ziyang and Zheng, Zhisheng and Ye, Jiaxin and Li, Jinchao and Gao, Zhifu and Zhang, Shiliang and Chen, Xie},
  journal={Proc. ACL 2024 Findings},
  year={2024}
}
```