Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/ksasi/sapa

Last synced: 17 days ago
JSON representation

Host: GitHub
URL: https://github.com/ksasi/sapa
Owner: ksasi
License: gpl-3.0
Created: 2024-03-15T04:20:49.000Z (9 months ago)
Default Branch: main
Last Pushed: 2024-04-12T04:28:20.000Z (8 months ago)
Last Synced: 2024-10-16T12:48:48.641Z (2 months ago)
Language: Python
Size: 2.03 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        # sapa

![Made With python 3.11.5](https://img.shields.io/badge/Made%20with-Python%203.11.5-brightgreen)![pytorch](https://img.shields.io/badge/Made%20with-pytorch-green.svg)![librosa](https://img.shields.io/badge/Made_with-librosa-blue)![speechbrain](https://img.shields.io/badge/Made_with-speechbrain-brown)![huggingface](https://img.shields.io/badge/Made_with-huggingface-violet)

### Code:

Below are the step to setup the code and perform training

### Setup:

After setting up the code as below, update the paths appropriately

> git clone https://github.com/ksasi/sapa.git

### Install Dependencies:

> cd sapa

> 

> pip install -r requirements.txt

## Speaker Verification

### Datasets :

- Create and change directory to ***dataset*** under ***Speaker_Verification***

- Download [VoxCeleb1-H (small subset)] (https://iitjacin-my.sharepoint.com/:u:/g/personal/d22cs051_iitj_ac_in/EVhTqG7PeDFBlkgHrG7WSJoB63ievtSFmE-PLdSxHtSNqA?e=Nlf8fX)

- Download [Kathbath dataset] (https://github.com/AI4Bharat/IndicSUPERB)

Dataset **Kathbath** structure after extraction :

```

Audio Data

data

├── telugu

│   ├── 

│   │   ├── 844483828886543-594-f.m4a

│   │   ├── 765429982765376-973-f.m4a

│   │   ├── ...

├── tamil

├── ...

Transcripts

data

├── telugu

│   ├── 

│   │   ├── transcription_n3w.txt

├── tamil

├── ...

```

Convert m4a to wav format as below :

> git clone https://github.com/AI4Bharat/IndicSUPERB.git

```

python utilities/structure.py \

  /kb_data_clean_m4a \

  /kb_data_clean_wav \

  

```

### Models Evaluation (VoxCeleb1-H (small subset)) :

Execute the below script to evaluate models (XLSR-Wav2Vec2, UniSpeech-SAT and WavLM-Base) with EER(%) using VoxCeleb1-H (small subset)

> cd Speaker_Verification

> 

> nohup python eval_voxceleb.py > \/log/eval_log_voxceleb.out &

> 

### Models Evaluation (Kathbath - Telugu) :

Execute the below script to evaluate models (XLSR-Wav2Vec2, UniSpeech-SAT and WavLM-Base) with EER(%) using test partition of Kathbath - Telugu dataset

> cd Speaker_Verification

> 

> nohup python eval_kathbath.py > \/log/eval_log_kathbath.out &

> 

### Fine-tune and evaluate WavLM model :

Execute the below script to fine-tune WavLM model on **valid partition** of Kathbath - Telugu dataset

> cd Speaker_Verification

> 

> nohup python train_WavLM.py > \/log/WavLM_log_finetune.out &

> 

Execute the below script to evaluate the fin-tuned WavLM mode on **test partition** of Kathbath - Telugu dataset

> cd Speaker_Verification

> 

> nohup python eval_WavLM.py > \/log/WavLM_log_ft_eval.out &

> 

## Source Separation

### Datasets :

- Setup LibriMix repo as below 

> git clone https://github.com/JorisCos/LibriMix.git

> 

> cd /LibriMix/metadata/Libri2Mix

Delete all folders , except **libri2mix_test-clean_info.csv** and **libri2mix_test-clean.csv**

- Execute the below script to generate LibriMix dataset

> cd /LibriMix

> 

> ./generate_librimix.sh storage_dir

where storage_dir = /dataset

### Model(SepFormer) Evaluation (LibriMix - LibriSpeech test clean partition) :

Execute the below script to perform evaluation of SepFormer on test split of (70-30 of LibriMix - LibriSpeech test clean partition)

> cd Source_Separation

> 

> nohup python eval_separator.py > \/log/eval_sepformer_librimix_batch_size8.out &

### Model(SepFormer) fine-tuning and Evaluation

Execute the below steps to fine-tune and evaluate SepFormer :

- Adopt the speechbrain [recipe](https://github.com/speechbrain/speechbrain/tree/develop/recipes/WSJ0Mix/separation) and fine-tune as below:

- Generate train and test csv files by executing `csv_generator.py` as below :

>

> cd Source_Separation

> 

> python csv_generator.py

> 

- Clone `speechbrain` repo and update `train.py` , `sepformer.yaml` as below:

>

> git clone https://github.com/speechbrain/speechbrain.git

> 

> cp \/Source_Separation/train.py \/Source_Separation/speechbrain/recipes/WSJ0Mix/separation/train.py

> 

> cp \/Source_Separation/sepformer.yaml \/Source_Separation/speechbrain/recipes/WSJ0Mix/separation/hparams/sepformer.yaml

> 

- Fine-tune sepformer with LibriMix dataset by running `train.py` as below:

>

> cd \/Source_Separation

> 

> nohup \/Source_Separation/speechbrain/recipes/WSJ0Mix/separation/train.py \/Source_Separation/speechbrain/recipes/WSJ0Mix/separation/hparams/sepformer.yaml > \/log/sepformer_ft.out &

### Demo (Speaker Verification) :

Demo of **Speaker Verification** from audio inputs can be executed by running `Speaker_Verification_Demo.ipynb` ipython notebook in the Demo folder

![Demo1](demo.png)

### References

- LibriMix  - [Github Link](https://github.com/JorisCos/LibriMix/)

- Speechbrain - [Github Link](https://github.com/speechbrain/speechbrain/tree/develop/recipes/WSJ0Mix/separation)

- EER Metric - [blog](https://yangcha.github.io/EER-ROC/)

- VoxCeleb dataset - [Link](https://mm.kaist.ac.kr/datasets/voxceleb/)

- Kathbath dataset - [Link](https://github.com/AI4Bharat/IndicSUPERB)

- UniSpeech - [Github Link](https://github.com/microsoft/UniSpeech/tree/main/downstreams/speaker_verification)

- SepFormer Huggingface - [Link](https://huggingface.co/speechbrain/sepformer-whamr)

- Torchmetrics - [Link](https://lightning.ai/docs/torchmetrics/stable/audio/scale_invariant_signal_noise_ratio.html)