{"id":19464376,"url":"https://github.com/aryaaftab/light-sernet","last_synced_at":"2025-04-25T09:31:27.655Z","repository":{"id":43909447,"uuid":"414323858","full_name":"AryaAftab/LIGHT-SERNET","owner":"AryaAftab","description":"Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition ","archived":false,"fork":false,"pushed_at":"2022-05-25T14:59:58.000Z","size":328,"stargazers_count":72,"open_issues_count":4,"forks_count":24,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-03T19:39:30.975Z","etag":null,"topics":["deep-learning","fully-convolutional-networks","lightweight","speech-emotion-recognition","tensorflow2","tflite"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AryaAftab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-10-06T18:16:38.000Z","updated_at":"2025-03-23T06:47:41.000Z","dependencies_parsed_at":"2022-08-28T20:50:48.884Z","dependency_job_id":null,"html_url":"https://github.com/AryaAftab/LIGHT-SERNET","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AryaAftab%2FLIGHT-SERNET","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AryaAftab%2FLIGHT-SERNET/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AryaAftab%2FLIGHT-SERNET/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AryaAftab%2FLIGHT-SERNET/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AryaAftab
","download_url":"https://codeload.github.com/AryaAftab/LIGHT-SERNET/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250790064,"owners_count":21487740,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","fully-convolutional-networks","lightweight","speech-emotion-recognition","tensorflow2","tflite"],"created_at":"2024-11-10T18:14:45.922Z","updated_at":"2025-04-25T09:31:27.326Z","avatar_url":"https://github.com/AryaAftab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Light-SERNet\r\n\r\nThis is the Tensorflow 2.x implementation of our paper [\"Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition\"](https://arxiv.org/abs/2110.03435), accepted in ICASSP 2022. \r\n\r\n\u003cdiv align=center\u003e\r\n\u003cimg width=95% src=\"./pics/Architecture.png\"/\u003e\r\n\u003c/div\u003e\r\nIn this paper, we propose an efficient and lightweight fully convolutional neural network(FCNN) for speech emotion recognition in systems with limited hardware resources. In the proposed FCNN model, various feature maps are extracted via three parallel paths with different filter sizes. This helps deep convolution blocks to extract high-level features, while ensuring sufficient separability. The extracted features are used to classify the emotion of the input speech segment. 
While our model is smaller than state-of-the-art models, it achieves higher performance on the IEMOCAP and EMO-DB datasets.\r\n\r\n\r\n## Demo\r\nDemo on the EMO-DB dataset:\r\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AryaAftab/LIGHT-SERNET/blob/master/Demo_Light_SERNet.ipynb)\r\n\r\n\r\n## Run\r\n### 1. Clone Repository\r\n```bash\r\n$ git clone https://github.com/AryaAftab/LIGHT-SERNET.git\r\n$ cd LIGHT-SERNET/\r\n```\r\n### 2. Requirements\r\n- TensorFlow \u003e= 2.3.0\r\n- NumPy \u003e= 1.19.2\r\n- Tqdm \u003e= 4.50.2\r\n- Matplotlib \u003e= 3.3.1\r\n- Scikit-learn \u003e= 0.23.2\r\n\r\n```bash\r\n$ pip install -r requirements.txt\r\n```\r\n\r\n### 3. Data:\r\n* Download the **[EMO-DB](http://emodb.bilderbar.info/download/download.zip)** and **[IEMOCAP](https://sail.usc.edu/iemocap/iemocap_release.htm)** (requires permission to access) datasets\r\n* Extract them into the [data](./data) folder\r\n\r\n**Note:** To use the **IEMOCAP** dataset, please follow issue [#3](../../issues/3).\r\n\r\n### 4. Set hyperparameters and training config:\r\nYou only need to change the constants in [hyperparameters.py](./hyperparameters.py) to set the hyperparameters and the training config.\r\n\r\n### 5. 
Start training:\r\nUse the following command to train the model on the desired dataset, cost function, and input length (in seconds).\r\n- Note 1: The input is automatically cut or padded to the desired size and stored in the [data](./data) folder.\r\n- Note 2: The best model is saved in the [model](./model) folder.\r\n- Note 3: The results for the confusion matrix are saved in the [result](./result) folder.\r\n```bash\r\n$ python train.py -dn {dataset_name} \\\r\n                  -id {input durations} \\\r\n                  -at {audio_type} \\\r\n                  -ln {cost function name} \\\r\n                  -v {verbose for training bar} \\\r\n                  -it {type of input(mfcc, spectrogram, mel_spectrogram)} \\\r\n                  -c {type of cache(disk, ram, None)} \\\r\n                  -m {fuse mfcc feature extractor in exported tflite model}\r\n```\r\n#### Example:\r\n\r\nEMO-DB Dataset:\r\n```bash\r\npython train.py -dn \"EMO-DB\" \\\r\n                -id 3 \\\r\n                -at \"all\" \\\r\n                -ln \"focal\" \\\r\n                -v 1 \\\r\n                -it \"mfcc\" \\\r\n                -c \"disk\" \\\r\n                -m false\r\n```\r\n\r\nIEMOCAP Dataset:\r\n```bash\r\npython train.py -dn \"IEMOCAP\" \\\r\n                -id 7 \\\r\n                -at \"impro\" \\\r\n                -ln \"cross_entropy\" \\\r\n                -v 1 \\\r\n                -it \"mfcc\" \\\r\n                -c \"disk\" \\\r\n                -m false\r\n```\r\n**Note: To run all experiments, just run ```run.sh```**\r\n```bash\r\nsh run.sh\r\n```\r\n\r\n## Fusing MFCC Extractor (New Feature)\r\nTo run the model independently and without the need for the TensorFlow library, the MFCC feature extractor was added as a single layer to the beginning of the model. Then, the trained model was exported as a single file in the TensorFlow Lite format. 
The input of this model is raw audio in the form of a vector of shape ```(1, sample_rate * input_duration)```.\r\nTo train with the fused feature extractor:\r\n```bash\r\npython train.py -dn \"EMO-DB\" \\\r\n                -id 3 \\\r\n                -m True\r\n```\r\n- Note 1: The best model is saved in the [model](./model) folder.\r\n- Note 2: To run the tflite model, you only need the ```tflite_runtime``` library. To use the ```tflite_runtime``` library in this project, you need to build it with **TF OP support (Flex delegate)**. You can learn how to build TensorFlow Lite from source with this flag **[here](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/pip_package)**.\r\n- Note 3: To run the tflite model as a real-time application, another **[repository](https://github.com/AryaAftab/RealTime-LIGHT-SERNET)** will be completed soon.\r\n\r\n## Citation\r\n\r\nIf you find our code useful for your research, please consider citing:\r\n```bibtex\r\n@inproceedings{aftab2022light,\r\n  title={Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition},\r\n  author={Aftab, Arya and Morsali, Alireza and Ghaemmaghami, Shahrokh and Champagne, Benoit},\r\n  booktitle={ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},\r\n  pages={6912--6916},\r\n  year={2022},\r\n  organization={IEEE}\r\n}\r\n```\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faryaaftab%2Flight-sernet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faryaaftab%2Flight-sernet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faryaaftab%2Flight-sernet/lists"}