{"id":28575060,"url":"https://github.com/audeering/w2v2-how-to","last_synced_at":"2025-06-10T22:14:08.399Z","repository":{"id":42002825,"uuid":"461904687","full_name":"audeering/w2v2-how-to","owner":"audeering","description":"How to use our public wav2vec2 dimensional emotion model","archived":false,"fork":false,"pushed_at":"2023-05-22T13:00:31.000Z","size":101,"stargazers_count":398,"open_issues_count":10,"forks_count":47,"subscribers_count":9,"default_branch":"main","last_synced_at":"2024-04-18T03:25:58.839Z","etag":null,"topics":["arousal","deep-learning","dominance","msp-podcast","onnx","speech-emotion-recognition","transformer-models","valence","wav2vec2"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/audeering.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-02-21T14:48:32.000Z","updated_at":"2024-04-18T02:20:19.000Z","dependencies_parsed_at":"2023-02-02T05:16:32.201Z","dependency_job_id":null,"html_url":"https://github.com/audeering/w2v2-how-to","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/audeering%2Fw2v2-how-to","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/audeering%2Fw2v2-how-to/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/audeering%2Fw2v2-how-to/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/audeering%2Fw2v2-how-to/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/audeering","download_
url":"https://codeload.github.com/audeering/w2v2-how-to/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/audeering%2Fw2v2-how-to/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259160062,"owners_count":22814513,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arousal","deep-learning","dominance","msp-podcast","onnx","speech-emotion-recognition","transformer-models","valence","wav2vec2"],"created_at":"2025-06-10T22:13:55.877Z","updated_at":"2025-06-10T22:14:08.387Z","avatar_url":"https://github.com/audeering.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# How to use our public dimensional emotion model\n\nAn introduction to our model for\ndimensional speech emotion recognition based on\n[wav2vec 2.0](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/).\nThe model is available from\n[doi:10.5281/zenodo.6221127](https://doi.org/10.5281/zenodo.6221127)\nand released under\n[CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).\nThe model was created\nby fine-tuning the pre-trained\n[wav2vec2-large-robust](https://huggingface.co/facebook/wav2vec2-large-robust)\nmodel on\n[MSP-Podcast](https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-Podcast.html)\n(v1.7).\nThe pre-trained model was pruned\nfrom 24 to 12 transformer layers\nbefore fine-tuning.\nIn this tutorial, we use the\n[ONNX](https://onnx.ai/)\nexport of the model.\nThe original\n[Torch](https://pytorch.org/)\nmodel 
is hosted at\n[Hugging Face](https://huggingface.co/audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim).\nFurther details are given in the associated\n[paper](https://arxiv.org/abs/2203.07378).\n\n## License\n\nThe model can be used for non-commercial purposes,\nsee [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).\nFor commercial usage,\na license for\n[devAIce](https://www.audeering.com/devaice/)\nmust be obtained.\nThe source code in this GitHub repository\nis released under the\n[MIT license](./LICENSE).\n\n## Quick start\n\nCreate and activate a Python virtual environment, then install\n[audonnx](https://github.com/audeering/audonnx).\n\n```\n$ pip install audonnx\n```\n\nLoad the model and test it on a random signal.\n\n```python\nimport audeer\nimport audonnx\nimport numpy as np\n\n\nurl = 'https://zenodo.org/record/6221127/files/w2v2-L-robust-12.6bc4a7fd-1.1.0.zip'\ncache_root = audeer.mkdir('cache')\nmodel_root = audeer.mkdir('model')\n\n# download the model archive and unpack it into model_root\narchive_path = audeer.download_url(url, cache_root, verbose=True)\naudeer.extract_archive(archive_path, model_root)\nmodel = audonnx.load(model_root)\n\n# one second of Gaussian noise at 16 kHz as a dummy input\nsampling_rate = 16000\nsignal = np.random.normal(size=sampling_rate).astype(np.float32)\nmodel(signal, sampling_rate)\n```\n```\n{'hidden_states': array([[-0.00711814,  0.00615957, -0.00820673, ...,  0.00666412,\n          0.00952989,  0.00269193]], dtype=float32),\n 'logits': array([[0.6717072 , 0.6421313 , 0.49881312]], dtype=float32)}\n```\n\nThe hidden states can be used as embeddings\nfor related speech emotion recognition tasks.\nThe logits are ordered as follows:\narousal,\ndominance,\nvalence.\n\n## Tutorial\n\nFor a detailed introduction, please check out the [notebook](./notebook.ipynb).\n\n```bash\n$ pip install -r requirements.txt\n$ jupyter notebook notebook.ipynb\n```\n\n## Citation\n\nIf you use our model in your own work, please cite the following\n[paper](https://arxiv.org/abs/2203.07378):\n\n```bibtex\n@article{wagner2023dawn,\n    
title={Dawn of the Transformer Era in Speech Emotion Recognition: Closing the Valence Gap},\n    author={Wagner, Johannes and Triantafyllopoulos, Andreas and Wierstorf, Hagen and Schmitt, Maximilian and Burkhardt, Felix and Eyben, Florian and Schuller, Bj{\\\"o}rn W},\n    journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},\n    pages={1--13},\n    year={2023},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faudeering%2Fw2v2-how-to","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faudeering%2Fw2v2-how-to","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faudeering%2Fw2v2-how-to/lists"}