https://github.com/slinusc/speaker_identification_evaluation
Evaluating the Effectiveness of Transformer Layers in Wav2Vec 2.0, XLS-R, and Whisper for Speaker Identification Tasks
- Host: GitHub
- URL: https://github.com/slinusc/speaker_identification_evaluation
- Owner: slinusc
- Created: 2024-05-24T19:38:30.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-06-25T11:06:00.000Z (6 months ago)
- Last Synced: 2024-09-27T06:22:17.398Z (3 months ago)
- Topics: wav2vec2, whisper, xls-r
- Language: Jupyter Notebook
- Homepage:
- Size: 7.45 MB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
### Abstract
This study evaluates three advanced speech encoder models (Wav2Vec 2.0, XLS-R, and Whisper) on speaker identification tasks. By fine-tuning these models and analyzing their layer-wise representations with SVCCA, k-means clustering, and t-SNE visualizations, we found that Wav2Vec 2.0 and XLS-R capture speaker-specific features effectively in their early layers, with fine-tuning improving stability and performance. Whisper, by contrast, performed better in its deeper layers. Additionally, we determined the optimal number of transformer layers for each model when fine-tuned for speaker identification.
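
As a rough illustration of the layer-wise comparison mentioned above, the following is a minimal NumPy sketch of an SVCCA similarity score between two activation matrices. It is not taken from the repository's notebooks: the function name `svcca_similarity`, the `var_kept` threshold of 0.99, and the synthetic `layer_a`, `layer_b`, and `unrelated` matrices are all assumptions made for the example. In practice the inputs would be per-layer hidden states of an encoder, pooled to one row per utterance (for instance, obtained by requesting `output_hidden_states=True` from a Hugging Face `transformers` model).

```python
import numpy as np


def svcca_similarity(X, Y, var_kept=0.99):
    """Mean SVCCA correlation between two (n_samples, n_features) matrices.

    Hypothetical helper for illustration; var_kept is an assumed default.
    """
    def top_subspace(A):
        # Center each feature, then keep the leading singular directions
        # that together explain at least `var_kept` of the variance.
        A = A - A.mean(axis=0)
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        explained = np.cumsum(s ** 2) / np.sum(s ** 2)
        k = min(int(np.searchsorted(explained, var_kept)) + 1, len(s))
        return U[:, :k]  # orthonormal basis of the retained subspace

    Ux, Uy = top_subspace(X), top_subspace(Y)
    # With orthonormal bases, the canonical correlations are the
    # singular values of Ux^T Uy.
    rho = np.linalg.svd(Ux.T @ Uy, compute_uv=False)
    return float(np.clip(rho, 0.0, 1.0).mean())


# Synthetic stand-ins for layer activations (assumed data, not real embeddings).
rng = np.random.default_rng(0)
layer_a = rng.normal(size=(500, 16))
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
layer_b = layer_a @ rotation            # same information, rotated basis
unrelated = rng.normal(size=(500, 16))  # independent representation

print("rotated copy:", svcca_similarity(layer_a, layer_b))
print("unrelated   :", svcca_similarity(layer_a, unrelated))
```

The score is invariant to rotations of the feature space, so the rotated copy scores close to 1 while the independent matrix scores much lower. Applying the same function to every pair of layers gives a similarity matrix of the kind often used to see where representations change across depth.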