Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/swshon/voxceleb-ivector
Voxceleb1 i-vector based speaker recognition system
i-vector kaldi speaker-embedding speaker-identification speaker-recognition speaker-verification voxceleb voxceleb1
- Host: GitHub
- URL: https://github.com/swshon/voxceleb-ivector
- Owner: swshon
- License: mit
- Created: 2018-04-09T19:14:03.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2018-05-22T21:54:54.000Z (about 6 years ago)
- Last Synced: 2024-02-21T05:36:00.948Z (4 months ago)
- Topics: i-vector, kaldi, speaker-embedding, speaker-identification, speaker-recognition, speaker-verification, voxceleb, voxceleb1
- Language: Perl
- Size: 2.16 MB
- Stars: 41
- Watchers: 2
- Forks: 11
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Lists
- awesome-diarization: voxceleb-ivector | i-vector | Perl | Voxceleb1 i-vector based speaker recognition system. | (Software / Speaker embedding)
- awesome-asv-antispoofing: voxceleb-ivector | i-vector | Perl | Voxceleb1 i-vector based speaker recognition system. | (Software / Speaker embedding)
README
# Speaker Verification Task on the VoxCeleb1 Dataset
This repository contains simple scripts for training an i-vector speaker recognition system on the VoxCeleb1 [1] dataset using Kaldi. The scripts are adapted from the run.sh file in Kaldi/egs/sre10.

# Requirement
* Kaldi toolkit

# How to use
1. Move all files to the {kaldi_root}/egs/sre10 folder.
2. Modify the dataset directories and parameters in run.sh to fit your machine.
3. Run run.sh.

# Result
A 2048-component GMM-UBM and a 600-dimensional i-vector extractor were trained on the VoxCeleb1 training data for the verification task. The training parameters are almost the same as those of the sre10 baseline in Kaldi's egs.
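The results below are equal error rates (EER) for different scoring backends applied to the same i-vectors. In this repository those numbers come from the Kaldi-based run.sh; the following is only an independent, stdlib-only Python sketch (not the repository's code) of the simplest backend, cosine distance scoring (CDS), and of how an EER is read off a set of trial scores. The function names `cosine_score` and `compute_eer` are hypothetical, the toy vectors in the demo are made up, and the EER routine uses a plain threshold sweep that averages the false-acceptance and false-rejection rates where they are closest, which is an approximation. The LDA + CDS variant would project the i-vectors with LDA before calling the same cosine scorer.

```python
import math
from typing import Sequence


def cosine_score(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine similarity between two i-vectors (higher = more likely same speaker)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine score is undefined for a zero vector")
    return dot / (norm_x * norm_y)


def compute_eer(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER from trial scores and labels (1 = target, 0 = nontarget).

    Sweeps the decision threshold upward through the sorted scores; tied scores
    are processed one trial at a time. Returns the mean of FAR and FRR at the
    point where the two rates are closest.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one nontarget trial")

    # Threshold below every score: all trials accepted, so FRR = 0 and FAR = 1.
    false_rejects = 0
    false_accepts = n_nontarget
    best_gap = float("inf")
    eer = 1.0
    for _, label in sorted(zip(scores, labels)):
        # Raise the threshold just above this trial's score: it is now rejected.
        if label == 1:
            false_rejects += 1
        else:
            false_accepts -= 1
        frr = false_rejects / n_target
        far = false_accepts / n_nontarget
        gap = abs(frr - far)
        if gap < best_gap:
            best_gap = gap
            eer = (frr + far) / 2
    return eer


if __name__ == "__main__":
    # Hypothetical toy trials: (enrollment i-vector, test i-vector, label).
    trials = [
        ([0.9, 0.1, 0.2], [0.8, 0.2, 0.1], 1),
        ([0.1, 0.9, 0.3], [0.2, 0.8, 0.4], 1),
        ([0.9, 0.1, 0.2], [0.1, 0.9, 0.3], 0),
        ([0.3, 0.3, 0.9], [0.8, 0.2, 0.1], 0),
    ]
    scores = [cosine_score(enroll, test) for enroll, test, _ in trials]
    labels = [label for _, _, label in trials]
    print(f"EER: {compute_eer(scores, labels):.2%}")
```

In practice the 600-dimensional i-vectors from the extractor would replace the toy vectors, and the trial list would come from the VoxCeleb1 verification protocol.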
| System | EER |
|---|---|
| GMM-2048, CDS (cosine distance scoring) | 15.39% |
| GMM-2048, LDA + CDS | 8.103% |
| GMM-2048, PLDA | 5.446% |

# Note
The VoxCeleb1 dataset, a large-scale speaker identification dataset, was published in 2017 together with a speaker embedding baseline [1], and the i-vector baseline reported there achieves 8.8% EER. That i-vector was extracted using a 1024-component GMM-UBM, so its EER is considerably worse than the 5.446% PLDA result above.

# Reference
[1] A. Nagrani, J. S. Chung, and A. Zisserman, “VoxCeleb: A large-scale speaker identification dataset,” in Interspeech, 2017, pp. 2616–2620.

* The CSV files in the data folder were created using this notebook: https://github.com/pyannote/pyannote-db-voxceleb/blob/master/scripts/prepare_data.ipynb