# SSAMBA: Self-Supervised Audio Mamba

[![arXiv](https://img.shields.io/badge/arXiv-2405.11831-b31b1b.svg)](https://arxiv.org/abs/2405.11831)
[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-Model-yellow?logo=huggingface&logoColor=yellow)](https://huggingface.co/attentionisallyouneed369/ssamba)

<img src="figures/amba.png" alt="icon" width="100" height="100">

## News
- **[2024-09-01]**: Paper accepted to the IEEE Spoken Language Technology (SLT) Workshop 2024.
- **[2024-08-05]**: Looking for contributors: we are seeking help to implement a HuggingFace-compliant version of SSAMBA. Interested?
Please reach out!
- **[2024-07-16]**: Added finetuning recipes for the IEMOCAP, SCv1, and SCv2 datasets.
- **[2024-07-01]**: Added a new task: dynamic audio scene labeling with 1-minute audio inputs from the UrbanSound8K dataset.
- **[2024-05-20]**: Made our paper available on arXiv.


## Introduction
This repository contains the official PyTorch implementation of the paper *SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model*. SSAMBA is an audio representation learning model that combines self-supervised pretraining with the Mamba state space model. It builds on the Self-Supervised Audio Spectrogram Transformer (SSAST) and aims to improve both performance and efficiency across a range of audio tasks.

## Installation

To install the necessary dependencies, run:

```bash
git clone https://github.com/SiavashShams/ssamba.git
cd ssamba
pip install -r requirements.txt
```
Next, clone the Vision Mamba repository inside your `ssamba` directory:
```bash
git clone https://github.com/hustvl/Vim.git
```

If you encounter issues with `bimamba_type`, follow the steps outlined in this [GitHub issue comment](https://github.com/hustvl/Vim/issues/14#issuecomment-1964685563).

## Architecture

![architecture](figures/ssamba.png)

## Efficiency Comparison
For the tiny model size with an input of 22k tokens, SSAMBA is approximately 92.7% faster in batch inference and 95.4% more memory-efficient than SSAST.
<p align="center">
  <img src="figures/inference_time_b4.png" alt="Models Inference Speed" width="45%" />
  <img src="figures/gpu_memory_b4.png" alt="Models GPU Memory" width="45%" />
</p>

## Pretraining

We pretrained SSAMBA in three sizes (base, small, and tiny) with 250, 300, and 400 masked patches on a mixture of unlabeled audio
from AudioSet and LibriSpeech. The resulting weights are listed in the "Pretrained Model Weights" section below. To pretrain the model from scratch instead, follow this recipe:

1. **Navigate to the Directory**: From the directory that contains your `ssamba` clone, change to the directory holding the pretraining scripts:
    ```bash
    cd ssamba/src/pretrain
    ```

2. **Adjust the Script**: Edit `run_mask_patch_amba.sh` to set the paths to your data files, the Mamba encoder configuration, and any other hyperparameters. Make sure all paths and settings match your local environment and dataset.

3. **Run the Script**: Start pretraining from the terminal:
    ```bash
    ./run_mask_patch_amba.sh
    ```

## Pretrained Model Weights

Pretrained SSAMBA weights in three sizes (base, small, and tiny) and for different numbers of masked patches (400, 300, and 250) are available at:

[Pretrained Model Weights](https://drive.google.com/drive/u/1/folders/1E1gf5SxdSByDJ16_WQvzTKn8lIoYtZiX)

## Finetuning

### AudioSet-20K, ESC-50, and Speech Commands V2

To finetune the pretrained SSAMBA on balanced AudioSet (AudioSet-20K), ESC-50, or Speech Commands V2, follow these steps:

1. **Navigate to the finetuning directory** (paths are relative to the repository root):
   - For AudioSet:
     ```bash
     cd src/finetune/audioset
     ```
   - For ESC-50:
     ```bash
     cd src/finetune/esc50
     ```
   - For Speech Commands V2:
     ```bash
     cd src/finetune/speechcommands_v2
     ```

2. **Adjust the paths and hyperparameters:**
   Edit the script for your dataset (`run_as_amba.sh` for AudioSet, `run_esc_patch_amba.sh` for ESC-50, or `run_sc_amba.sh` for Speech Commands V2) and adjust its paths and hyperparameters as needed.

3.
   **Configure SLURM job submission (if using SLURM):**
   Add the models you want to finetune to `submit_jobs.sh`:
   ```bash
   #!/bin/bash

   # Array of pretrained models to finetune
   declare -a models=("ssamba_tiny_400")

   # Submit one job per model; replace run_as_amba.sh with the
   # script for your dataset (run_esc_patch_amba.sh or run_sc_amba.sh)
   for model in "${models[@]}"; do
       sbatch run_as_amba.sh "$model"
   done
   ```

4. **Run the job submission script:**
   Execute `submit_jobs.sh` in the terminal to start finetuning:
   ```bash
   ./submit_jobs.sh
   ```

Monitor the jobs and adjust parameters as needed for your requirements and hardware.

### VoxCeleb and IEMOCAP

#### Step 1: Install the SUPERB Package (s3prl)

1. **Clone the s3prl repository** (the toolkit behind the SUPERB benchmark):
   ```bash
   git clone https://github.com/s3prl/s3prl.git
   ```

2. **Navigate to the s3prl directory**:
   ```bash
   cd s3prl
   ```

3. **Install the package**:
   ```bash
   pip install -e ./
   ```

#### Step 2: Prepare the Fine-Tuning Scripts

1. **Copy our files**:
   - Copy the files from `src/finetune/voxceleb1/ssast` to `s3prl/s3prl/upstream/ssast`.

#### Step 3: Adjust Paths and Specify Models

1. **Edit `run_sid.sh` or `run_er.sh`**:
   - Adjust the paths in `run_sid.sh` (speaker identification on VoxCeleb) or `run_er.sh` (emotion recognition on IEMOCAP) so they point to the correct directories for your dataset and model.

2. **Specify models in `submit_jobs_amba.sh`**:
   - Edit `submit_jobs_amba.sh` to list the models you want to finetune.

#### Step 4: Run the Fine-Tuning Script

1. **Execute the `submit_jobs_amba.sh` script**:
   - In the terminal, navigate to the directory containing `submit_jobs_amba.sh` and run:
     ```bash
     ./submit_jobs_amba.sh
     ```



## License
The license for the borrowed code can be found in the [LICENSE](https://github.com/SiavashShams/ssamba/blob/main/LICENSE) file.
We acknowledge the wonderful work of [SSAST](https://arxiv.org/abs/2110.09784) and [Vision Mamba](https://arxiv.org/abs/2401.09417).

## Citing
If you find this work helpful, please consider giving us a star 🌟 and citing:

```bibtex
@article{shams2024ssamba,
      title={SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model},
      author={Siavash Shams and Sukru Samet Dindar and Xilin Jiang and Nima Mesgarani},
      year={2024},
      eprint={2405.11831},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      journal={arXiv preprint arXiv:2405.11831}
}
```
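## Appendix: Patch Masking Sketch

The Pretraining section describes the released models only by the number of masked patches (250, 300, or 400). The sketch below illustrates what that number controls by picking that many patch positions uniformly at random on a spectrogram's patch grid. It is not taken from this repository. The function name `sample_patch_mask`, the 1024-frame × 128-mel-bin input, the 16 × 16 patch size, and plain uniform masking are all assumptions, and the actual pretraining code may sample masks differently. With these assumed sizes, the grid holds 64 × 8 = 512 patches, so masking 400 of them hides about 78% of the input.

```python
from typing import Optional

import numpy as np


def sample_patch_mask(num_frames: int, num_mel_bins: int, patch_size: int,
                      num_masked: int, seed: Optional[int] = None) -> np.ndarray:
    """Return a (time_patches, freq_patches) boolean grid with num_masked True cells.

    Hypothetical helper: assumes non-overlapping square patches and uniform
    random masking. It is not SSAMBA's own masking code.
    """
    time_patches = num_frames // patch_size    # patch rows along the time axis
    freq_patches = num_mel_bins // patch_size  # patch columns along the mel axis
    total = time_patches * freq_patches
    if not 0 <= num_masked <= total:
        raise ValueError(f"num_masked must be in [0, {total}], got {num_masked}")

    rng = np.random.default_rng(seed)
    # Draw distinct flat patch indices (sampling without replacement).
    masked_idx = rng.choice(total, size=num_masked, replace=False)
    mask = np.zeros(total, dtype=bool)
    mask[masked_idx] = True
    return mask.reshape(time_patches, freq_patches)


if __name__ == "__main__":
    # Assumed input: 1024 frames x 128 mel bins with 16 x 16 patches -> 64 x 8 grid.
    for n in (250, 300, 400):
        mask = sample_patch_mask(1024, 128, 16, n, seed=0)
        print(n, mask.shape, int(mask.sum()), f"{mask.mean():.1%}")
```

The indices are drawn without replacement, so the number of masked cells always equals `num_masked`. Passing a fixed `seed` makes the mask reproducible.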