https://github.com/cocoxili/cmpc
[IJCAI2022] Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast
biometric-matching crossmodal-retrieval deep-learning multimodal-learning representation-learning voice-face-association voxceleb
- Host: GitHub
- URL: https://github.com/cocoxili/cmpc
- Owner: Cocoxili
- License: mit
- Created: 2022-04-27T14:45:31.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2023-10-25T03:21:28.000Z (over 2 years ago)
- Last Synced: 2025-05-28T10:11:40.140Z (about 1 year ago)
- Topics: biometric-matching, crossmodal-retrieval, deep-learning, multimodal-learning, representation-learning, voice-face-association, voxceleb
- Language: Python
- Homepage:
- Size: 9.37 MB
- Stars: 20
- Watchers: 1
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast
This is the PyTorch implementation for CMPC, as described in our paper:
**[Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast](https://arxiv.org/abs/2204.14057)**
```bibtex
@inproceedings{zhu2022unsupervised,
title={Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast},
author={Zhu, Boqing and Xu, Kele and Wang, Changjian and Qin, Zheng and Sun, Tao and Wang, Huaimin and Peng, Yuxing},
booktitle={Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, {IJCAI-22}},
pages={3787--3794},
year={2022},
month={7}
}
```

We also provide the [pretrained model](#download-pre-trained-models) and [testing resources](#testing-data).
### Requirements:
* torch==1.7.0+cu110
* matplotlib==3.4.3
* pykeops==1.5
* pandas==1.1.3
* librosa==0.6.2
* Pillow==9.0.1
* PyYAML==6.0
* scikit_learn==1.0.2
### Download Pre-trained Models
### Data Pre-processing
To speed up training iterations, we extract log-mel features from the voice data in a pre-processing step.
```bash
cd experiments/cmpc
python data_transform.py --wav_dir {directory-of-the-wav-file} --logmel_dir {destination-path}
```
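For readers unfamiliar with this step, the following is a minimal sketch of what log-mel pre-processing typically looks like. It is not the code in `data_transform.py`: the function names, the `.npy` output format, and the parameter values (`sr=16000`, `n_fft=512`, `hop_length=160`, `n_mels=64`) are all assumptions chosen for illustration, and the script in this repository may use different settings.

```python
from pathlib import Path


def logmel_path(wav_path, wav_dir, logmel_dir):
    """Mirror the wav directory layout under logmel_dir (hypothetical helper)."""
    rel = Path(wav_path).relative_to(wav_dir)
    return Path(logmel_dir) / rel.with_suffix(".npy")


def extract_logmel(wav_path, sr=16000, n_fft=512, hop_length=160, n_mels=64):
    """Return a (n_mels, frames) log-mel spectrogram; parameter values are assumed."""
    import librosa  # imported lazily so the path helper works without librosa
    import numpy as np

    y, _ = librosa.load(str(wav_path), sr=sr)  # resample to the target rate
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    return librosa.power_to_db(mel, ref=np.max)  # power spectrogram -> dB scale


def convert_tree(wav_dir, logmel_dir):
    """Extract features once for every .wav file under wav_dir and cache them."""
    import numpy as np

    for wav_path in sorted(Path(wav_dir).rglob("*.wav")):
        out = logmel_path(wav_path, wav_dir, logmel_dir)
        out.parent.mkdir(parents=True, exist_ok=True)
        np.save(str(out), extract_logmel(wav_path))
```

Caching the features on disk means the spectrogram is computed once per utterance rather than on every epoch, which is where the speed-up comes from.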
### Unsupervised Training
The configuration lives in the CONFIG.yaml file; edit it to suit your setup, for example the data paths.
Then start unsupervised training with:
```bash
python train.py CONFIG.yaml
```
### Evaluation on our trained model
We evaluate on three protocols: matching, verification and retrieval. The `--ckp_path` argument can point to
either the downloaded model or a model you trained yourself.
```bash
python matching.py CONFIG.yaml --ckp_path {checkpoint path}
python verification.py CONFIG.yaml --ckp_path {checkpoint path}
python retrieval.py CONFIG.yaml --ckp_path {checkpoint path}
```
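As a rough illustration of what two of these protocols measure, here is a sketch of 1:2 matching accuracy and verification AUC as they are commonly defined in the voice-face literature. It is not the code in `matching.py` or `verification.py`; the function names and the use of cosine similarity are assumptions, and the repository's scripts may compute the metrics differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def matching_accuracy(trials):
    """1:2 matching: each trial is (probe, positive, negative) embeddings.

    A trial counts as correct when the probe (e.g. a voice) is closer to the
    candidate of the same identity than to the distractor.
    """
    correct = sum(cosine(p, pos) > cosine(p, neg) for p, pos, neg in trials)
    return correct / len(trials)


def verification_auc(scores, labels):
    """AUC: chance that a same-identity pair outscores a different-identity pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Ties count as half a win, matching the usual definition of ROC AUC.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Retrieval is usually scored with mean average precision over a ranked gallery, which follows the same pattern: rank candidates by similarity to the probe, then score the ranking.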
### Testing data
[Matching](./data/matching), [verification](./data/veriflist) and [retrieval](./data/retrieval) testing data are released in the [./data](./data) directory.