{"id":13785563,"url":"https://github.com/seongmin-kye/meta-SR","last_synced_at":"2025-05-11T21:31:18.814Z","repository":{"id":216015492,"uuid":"284422976","full_name":"seongmin-kye/meta-SR","owner":"seongmin-kye","description":"Pytorch implementation of Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs (Interspeech, 2020)","archived":false,"fork":false,"pushed_at":"2020-09-16T01:00:11.000Z","size":797,"stargazers_count":72,"open_issues_count":4,"forks_count":19,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-08-03T19:09:32.572Z","etag":null,"topics":["meta-learning","short-utterances","speaker-recognition","speaker-verification"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/seongmin-kye.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2020-08-02T08:35:47.000Z","updated_at":"2024-06-13T01:49:15.000Z","dependencies_parsed_at":"2024-01-08T01:46:10.813Z","dependency_job_id":"328c5a1b-8373-4c0e-b5d3-06e8da31e281","html_url":"https://github.com/seongmin-kye/meta-SR","commit_stats":null,"previous_names":["seongmin-kye/meta-sr"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seongmin-kye%2Fmeta-SR","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seongmin-kye%2Fmeta-SR/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seongmin-kye%2Fmeta-SR/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seongmin-kye
%2Fmeta-SR/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/seongmin-kye","download_url":"https://codeload.github.com/seongmin-kye/meta-SR/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225101043,"owners_count":17421055,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["meta-learning","short-utterances","speaker-recognition","speaker-verification"],"created_at":"2024-08-03T19:01:01.844Z","updated_at":"2024-11-17T22:30:23.649Z","avatar_url":"https://github.com/seongmin-kye.png","language":"Python","funding_links":[],"categories":["Table of Contents"],"sub_categories":["Pretrained models/embeddings"],"readme":"# Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs\nPyTorch code for the following paper:\n* **Title** : Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs. [[paper](https://arxiv.org/abs/2004.02863)]\n* **Author** : Seong Min Kye, [Youngmoon Jung](https://github.com/jymsuper), [Hae Beom Lee](https://haebeom-lee.github.io/), [Sung Ju Hwang](http://www.sungjuhwang.com), Hoirin Kim \n* **Conference** : Interspeech, 2020.\n\n### Abstract\n\u003cimg align=\"middle\" width=\"1000\" src=\"https://github.com/seongmin-kye/meta-SR/blob/master/overview.png\"\u003e\n\nIn practical settings, a speaker recognition system needs to identify a speaker given a short utterance, while the enrollment utterance may be relatively long. However, existing speaker recognition models perform poorly with such short utterances. 
To solve this problem, we introduce a meta-learning framework for imbalance length pairs. Specifically, we use a Prototypical Network and train it with a support set of long utterances and a query set of short utterances of varying lengths. Further, since optimizing only for the classes in the given episode may be insufficient for learning discriminative embeddings for unseen classes, we additionally enforce the model to classify both the support and the query set against the entire set of classes in the training set. By combining these two learning schemes, our model outperforms existing state-of-the-art speaker verification models learned with a standard supervised learning framework on short utterances (1-2 seconds) on the VoxCeleb datasets. We also validate our proposed model for unseen speaker identification, on which it also achieves significant performance gains over the existing approaches.\n\n### Requirements\n* Python 3.6\n* PyTorch 1.3.1\n\n### Data preparation\n\nThe following script can be used to download and prepare the VoxCeleb dataset for training. This preparation code is based on [**VoxCeleb_trainer**](https://github.com/clovaai/voxceleb_trainer), with slight modifications.\n\n```\npython dataprep.py --save_path /root/home/voxceleb --download --user USERNAME --password PASSWORD\npython dataprep.py --save_path /root/home/voxceleb --extract\npython dataprep.py --save_path /root/home/voxceleb --convert\n```\n\nIn addition to the Python dependencies, `wget` and `ffmpeg` must be installed on the system.\n\n### Feature extraction\n\nIn `configure.py`, specify the path to the directory. 
For example, in `meta-SR/configure.py` line 2:\n```\nsave_path = '/root/home/voxceleb'\n```\nThen, extract the acoustic features (mel filterbank-40).\n```\npython feat_extract/feature_extraction.py\n```\n\n### Training examples\n- Softmax:\n```\npython train.py --loss_type softmax --use_GC False --n_shot 1 --n_query 0 --use_variable False --nb_class_train 256\n```\n- Prototypical without global classification:\n```\npython train.py --loss_type prototypical --use_GC False --n_shot 1 --n_query 2 --use_variable True --nb_class_train 100\n```\n- Prototypical with global classification:\n```\npython train.py --loss_type prototypical --use_GC True --n_shot 1 --n_query 2 --use_variable True --nb_class_train 100\n```\nTo use a fixed-length query, set `--use_variable False`.\n\n### Evaluation\nTo evaluate the __k-th__ checkpoint in the __n-th__ folder:\n- Speaker verification for full utterance:\n```\npython EER_full.py --n_folder n --cp_num k --data_type vox2\n```\nIf you trained the model with VoxCeleb1, set `--data_type vox1`.\n\n- Speaker verification for short utterance:\n```\npython EER_short.py --n_folder n --cp_num k --test_length 100\n```\nFor example, to test on 2-second utterances, set `--test_length 200`.\n\n- Unseen speaker identification:\n```\npython identification.py --n_folder n --cp_num k --nb_class_test 100 --test_length 100\n```\n\n### Pretrained model\nA pretrained model can be downloaded from [here](https://drive.google.com/file/d/1uqRviTrmm578nw_OQgqtj3iAmc6eSnTI/view?usp=sharing). 
\nIf you put this model into `meta-SR/saved_model/baseline_000` and run the following script, you can get `EER 2.08`.\n```\npython EER_full.py --n_folder 0 --cp_num 100 --data_type vox2\n```\n\n### Citation\nPlease cite the following if you make use of the code.\n```\n@inproceedings{kye2020meta,\n  title={Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs},\n  author={Kye, Seong Min and Jung, Youngmoon and Lee, Hae Beom and Hwang, Sung Ju and Kim, Hoirin},\n  booktitle={Interspeech},\n  year={2020}\n}\n```\n\n### Acknowledgments\nThis code is based on the implementation of [**SR_tutorial**](https://github.com/jymsuper/SpeakerRecognition_tutorial) and [**VoxCeleb_trainer**](https://github.com/clovaai/voxceleb_trainer). I would like to thank Youngmoon Jung, Joon Son Chung and Sung Ju Hwang for helpful discussions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseongmin-kye%2Fmeta-SR","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fseongmin-kye%2Fmeta-SR","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseongmin-kye%2Fmeta-SR/lists"}