https://github.com/cyrta/voxceleb

mirror of VoxCeleb dataset - a large-scale speaker identification dataset

corpus dataset speaker speaker-identification speaker-recognition speaker-verification speech


Host: GitHub
URL: https://github.com/cyrta/voxceleb
Owner: cyrta
License: other
Created: 2018-01-16T12:42:58.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2019-07-05T01:52:14.000Z (almost 5 years ago)
Last Synced: 2024-01-28T12:39:04.024Z (5 months ago)
Topics: corpus, dataset, speaker, speaker-identification, speaker-recognition, speaker-verification, speech
Language: Shell
Size: 10 MB
Stars: 62
Watchers: 4
Forks: 17
Open Issues: 5
Metadata Files:
- Readme: README.rst
- License: LICENSE.txt

Lists

awesome-speaker-recognition-verification - VoxCeleb mirror

README

# VoxCeleb

Mirror of the [VoxCeleb dataset - a large-scale speaker identification dataset](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/).

THIS IS WORK IN PROGRESS.

I would like to have a reproducible way to download the audio from YouTube as MP3, trim it, and store it in the form delivered by the authors of the dataset.
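
The workflow described above (download the audio, trim it to the utterance, store the clip) could be sketched roughly as follows. This is a minimal sketch, not the repository's actual script: it assumes `yt-dlp` and `ffmpeg` are installed, and the `Segment` fields and the output directory layout are hypothetical. The functions only build command lines; nothing is downloaded until a command is passed to `subprocess.run`.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Segment:
    """One utterance to cut from a video (hypothetical layout, not the dataset's file format)."""
    speaker: str
    video_id: str
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video


def download_cmd(video_id: str, out_dir: Path) -> list[str]:
    """Build a yt-dlp command that extracts the audio track as MP3."""
    return [
        "yt-dlp",
        "-x", "--audio-format", "mp3",
        "-o", str(out_dir / "%(id)s.%(ext)s"),
        f"https://www.youtube.com/watch?v={video_id}",
    ]


def trim_cmd(seg: Segment, src: Path, out_dir: Path, index: int) -> list[str]:
    """Build an ffmpeg command that cuts the span [start, end] out of src."""
    if seg.end <= seg.start:
        raise ValueError("segment end must be after start")
    # Assumed layout: <out_dir>/<speaker>/<video_id>/<zero-padded index>.mp3
    dst = out_dir / seg.speaker / seg.video_id / f"{index:05d}.mp3"
    return [
        "ffmpeg", "-y",
        "-i", str(src),
        "-ss", f"{seg.start:.2f}",
        "-to", f"{seg.end:.2f}",
        str(dst),
    ]
```

Building the commands as lists keeps them easy to log and re-run, which helps with reproducibility. The destination directory has to exist before `ffmpeg` is invoked, for example via `Path(...).parent.mkdir(parents=True, exist_ok=True)`.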

----

This repo contains the download links to the VoxCeleb dataset, described in [1]. 

VoxCeleb contains over 100,000 utterances for 1,251 celebrities, extracted from videos uploaded to YouTube. The dataset is gender balanced, with 55% of the speakers male. The speakers span a wide range of different ethnicities, accents, professions and ages. There are no overlapping identities between development and test sets.

+-------------------+---------+-------+
|                   | train   | test  |
+===================+=========+=======+
| # of speakers     | 1,211   | 40    |
+-------------------+---------+-------+
| # of videos       | 21,819  | 677   |
+-------------------+---------+-------+
| # of utterances   | 139,124 | 6,255 |
+-------------------+---------+-------+
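
As a quick consistency check, the per-split counts in the table can be summed and compared with the figures quoted in the description. The numbers below are copied from the table; the variable names are only illustrative.

```python
# Counts copied from the table above, as (train, test) pairs.
stats = {
    "speakers": (1_211, 40),
    "videos": (21_819, 677),
    "utterances": (139_124, 6_255),
}

# The two sets share no identities, so the per-split counts can simply be added.
totals = {name: train + test for name, (train, test) in stats.items()}

# 1,211 + 40 speakers matches the 1,251 celebrities quoted in the description.
assert totals["speakers"] == 1_251

# Average number of utterances per speaker across both splits.
utterances_per_speaker = totals["utterances"] / totals["speakers"]

print(totals)
print(f"{utterances_per_speaker:.1f} utterances per speaker on average")
```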

Nationality Distribution: The nationalities of the speakers in the dataset were obtained by crawling Wikipedia and can be found here. You can also view the distribution in the following graph:

.. image:: ./data/v1/country.png

The list of duplicate videos (34 videos, all in the train set) can be found [here](./data/v1/duplicates.txt).

The train/val/test split used in [1] below for Speaker Identification can be found [here](./data/v1/Identification_split.txt).
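
A split file like this can be loaded with a few lines of Python. The sketch below assumes each line has the form `<label> <relative path>` and that the labels 1, 2 and 3 mean train, val and test; both are assumptions about the file format, so check the file before relying on them. The function name `parse_split` is hypothetical.

```python
from collections import defaultdict

# Assumed mapping from numeric label to subset name; verify against the file.
SUBSETS = {"1": "train", "2": "val", "3": "test"}


def parse_split(lines):
    """Group utterance paths by subset from '<label> <path>' lines."""
    split = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        label, path = line.split(maxsplit=1)
        split[SUBSETS[label]].append(path)
    return dict(split)
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, for example `parse_split(open("data/v1/Identification_split.txt"))`. An unknown label raises `KeyError`, which makes a wrong format assumption visible early.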

Models: 

 - Pretrained VGGVox models from the dataset authors, for speaker identification and verification [1], can be found [here](https://github.com/a-nagrani/VGGVox).

Notice:

> We are preparing an extended dataset (VoxCeleb2), containing up to 4 times as many speakers and videos.
> VoxCeleb2 was originally due to be released in Q4 2017; however, it has been delayed to Q1 2018 due to resource constraints.

-------

Publications:

[1] A. Nagrani, J. S. Chung, A. Zisserman - [VoxCeleb: a large-scale speaker identification dataset](./docs/2017-Nagrani-VoxCeleb_large-scale_speaker_identification_dataset.pdf) - INTERSPEECH, 2017

[2] Yifan He, Zhang Zhang - [Speaker Identification with VoxCeleb DataSet](./docs/2017-YifajHeZhangZhang-Speaker_Identication_with_VoxCeleb_DataSet-stanford_students_raport.pdf) - Stanford student project, 2017