https://github.com/cocoxili/cmpc

[IJCAI2022] Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast
https://github.com/cocoxili/cmpc

biometric-matching crossmodal-retrieval deep-learning multimodal-learning representation-learning voice-face-association voxceleb

Last synced: 6 months ago
JSON representation

[IJCAI2022] Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast

Host: GitHub
URL: https://github.com/cocoxili/cmpc
Owner: Cocoxili
License: mit
Created: 2022-04-27T14:45:31.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2023-10-25T03:21:28.000Z (over 2 years ago)
Last Synced: 2025-05-28T10:11:40.140Z (about 1 year ago)
Topics: biometric-matching, crossmodal-retrieval, deep-learning, multimodal-learning, representation-learning, voice-face-association, voxceleb
Language: Python
Homepage:
Size: 9.37 MB
Stars: 20
Watchers: 1
Forks: 3
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

          # Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast

This is the PyTorch implementation for CMPC, as described in our paper:

**[Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast](https://arxiv.org/abs/2204.14057)**

```angular2html

@inproceedings{zhu2022unsupervised,

  title={Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast},

  author={Zhu, Boqing and Xu, Kele and Wang, Changjian and Qin, Zheng and Sun, Tao and Wang, Huaimin and Peng, Yuxing},

  booktitle={Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, {IJCAI-22}},

  pages={3787--3794},

  year={2022},

  month={7}

}

```

![Framework](img/fig_pipeline.png)

We also provide the [pretrained model](#unsupervised-training) and [testing resources](#testing-data).

### Requirments:

* torch==1.7.0+cu110

* matplotlib==3.4.3

* pykeops==1.5

* pandas==1.1.3

* librosa==0.6.2

* Pillow==9.0.1

* PyYAML==6.0

* scikit_learn==1.0.2

### Download Pre-trained Models

CID| CMPC

------ | ------

### Data Pre-processing

In order to speed up the iteration of training, we extract the logmel features of voice data through pre-processing.

```bash

>> cd experiments/cmpc

>> python data_transform.py --wav_dir {directory-of-the-wav-file} --logmel_dir {destination-path}

```

### Unsupervised Training

The configurations are written in the CONFIG.yaml file, which can be changed according to your needs, 

such as the path information. The unsupervised training process can begin as:

```bash

>> python train.py CONFIG.yaml

```

### Evalution on our trained model

Experiments on three evalution protocals: matching, verification and retrieval. The '--ckp_path' could be

the path of downloaded model or your trained model.

```bash

>> python matching.py CONFIG.yaml --ckp_path {checkpoint path}

>> python verification.py CONFIG.yaml --ckp_path {checkpoint path}

>> python retrieval.py CONFIG.yaml --ckp_path {checkpoint path}

```

### Testing data

[Matching](./data/matching), [verification](./data/veriflist) and [retrieval](./data/retrieval) testing data is released at [./data](./data) directory.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/cocoxili/cmpc

Awesome Lists containing this project

README