https://github.com/slinusc/speaker_identification_evaluation

Evaluating the Effectiveness of Transformer Layers in Wav2Vec 2.0, XLS-R, and Whisper for Speaker Identification Tasks

wav2vec2 whisper xls-r


### Abstract

This study evaluates the performance of three advanced speech encoder models (Wav2Vec 2.0, XLS-R, and Whisper) in speaker identification tasks. By fine-tuning these models and analyzing their layer-wise representations using SVCCA, k-means clustering, and t-SNE visualizations, we found that Wav2Vec 2.0 and XLS-R capture speaker-specific features effectively in their early layers, with fine-tuning improving stability and performance. Whisper showed better performance in deeper layers. Additionally, we determined the optimal number of transformer layers for each model when fine-tuned for speaker identification tasks.
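
For readers unfamiliar with SVCCA, the following is a minimal NumPy sketch of how a similarity score between two layers' representations can be computed. It is not the code used in this study: the function name `svcca_similarity`, the `var_kept` threshold, and the synthetic arrays standing in for layer activations are all assumptions made for illustration.

```python
import numpy as np


def svcca_similarity(X, Y, var_kept=0.99):
    """Mean canonical correlation between two activation matrices.

    X: (n_samples, d1), Y: (n_samples, d2); rows must be the same inputs
    (e.g. the same utterances) in the same order.
    var_kept is a hypothetical parameter: the fraction of variance each
    representation keeps in the SVD step.
    """

    def reduce(A):
        # Center, then keep the top singular directions (the "SV" step).
        A = A - A.mean(axis=0, keepdims=True)
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        cum = np.cumsum(s**2) / np.sum(s**2)
        k = int(np.searchsorted(cum, var_kept)) + 1
        return U[:, :k] * s[:k]

    Xr, Yr = reduce(X), reduce(Y)

    # CCA step: canonical correlations are the singular values of
    # Qx^T Qy, where Qx and Qy are orthonormal bases of the two subspaces.
    Qx, _ = np.linalg.qr(Xr)
    Qy, _ = np.linalg.qr(Yr)
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(np.clip(rho, 0.0, 1.0).mean())


# Synthetic stand-ins for activations from two layers (assumption:
# real inputs would be pooled hidden states, one row per utterance).
rng = np.random.default_rng(0)
layer_a = rng.normal(size=(200, 10))
rotation, _ = np.linalg.qr(rng.normal(size=(10, 10)))
layer_b = layer_a @ rotation          # same information, rotated basis
layer_c = rng.normal(size=(200, 10))  # unrelated representation

sim_related = svcca_similarity(layer_a, layer_b)
sim_unrelated = svcca_similarity(layer_a, layer_c)
```

Because the score is invariant to invertible linear transformations, the rotated copy scores close to 1, while the unrelated matrix scores noticeably lower. In a layer-wise analysis, such a score would be computed for each pair of layers, or for each layer before and after fine-tuning.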