https://github.com/wangyongjie-ntu/deep-multimodal-speaker-naming
- Host: GitHub
- URL: https://github.com/wangyongjie-ntu/deep-multimodal-speaker-naming
- Owner: wangyongjie-ntu
- Created: 2018-02-08T01:28:58.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-02-08T03:34:26.000Z (over 7 years ago)
- Last Synced: 2025-01-09T20:08:47.963Z (4 months ago)
- Language: Python
- Size: 60 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 1
# Introduction
This repository reproduces the paper "Deep Multimodal Speaker Naming", published at ACM MM 2015. Thanks to the authors for sharing the dataset with me. The authors implemented it in MATLAB (really nice work). Since most deep learning platforms are now available and convenient to use, I rewrote it in Python on TensorFlow.
## Requirements
- TensorFlow 1.4.1
- Python 2.7
- CUDA (optional)
- MATLAB
- OpenCV

## Usage
### Step 1
Unzip the MATLAB toolbox archive and add it to the MATLAB path with `addpath`.
```
unzip mirtoolbox17zip.zip -d $MATLAB/toolbox
```

### Step 2
Run the scripts in ./matlab to extract the audio features. Before running them, update the file locations in each script to match your data.
```
merge_audio_file_friends
gen_audio_data_friends
```

### Step 3
Run net.py to train the face network. After training, run net-audio.py to fine-tune the pretrained model with the audio features.
```
python net.py
python net-audio.py
```
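The face and audio networks share one set of speaker labels, so if the name order differs between the Python side and the MATLAB side, audio features end up paired with the wrong speaker. Below is a minimal sketch of a consistency check, assuming friend-name.txt holds one name per line and that the MATLAB name list has been exported to a Python list. `load_names` and `check_label_order` are hypothetical helpers, not part of this repository.

```python
# Hypothetical consistency check (not part of this repo). Assumes
# friend-name.txt holds one speaker name per line, in label order, and that
# the MATLAB name list is available as a Python list of strings.

def load_names(path):
    """Return the non-empty, stripped lines of a name file, in order."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def check_label_order(face_names, audio_names):
    """Raise ValueError if the two name lists disagree at any label index.

    On success, return a dict mapping each name to the integer label that
    both the face network and the audio features should use.
    """
    if len(face_names) != len(audio_names):
        raise ValueError("label count differs: %d face names vs %d audio names"
                         % (len(face_names), len(audio_names)))
    for idx, (face, audio) in enumerate(zip(face_names, audio_names)):
        if face != audio:
            raise ValueError("label %d: face=%r but audio=%r" % (idx, face, audio))
    return {name: idx for idx, name in enumerate(face_names)}
```

Usage would look like `check_label_order(load_names("friend-name.txt"), matlab_names)`, where `matlab_names` is a placeholder for the exported MATLAB list. If the check raises, reorder one of the two lists before running net-audio.py.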
Pay attention to friend-name.txt: it stores the order of the face names, which must match the order of the MATLAB name list. Make sure each audio feature and face image are labeled as the same person.

### Results
Accuracy on *Friends* S05E05:

| Model | Accuracy in paper | Accuracy in reproduction |
| :--------------: | :---------------: | :----------------------: |
| face model | 86.7% | 87.41% |
| face-audio model | 88.5% | 88.335% |

With the additional audio information, accuracy improves by about 0.9 percentage points in this reproduction (from 87.41% to 88.335%), compared with 1.8 points in the paper.
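The audio gains can be checked directly from the accuracy table. This small sketch copies the table values into dictionaries (a layout chosen only for illustration) and computes the percentage-point difference between the face-audio and face models.

```python
# Accuracy figures (percent) copied from the results table.
paper = {"face": 86.7, "face-audio": 88.5}
reproduction = {"face": 87.41, "face-audio": 88.335}


def audio_gain(results):
    """Percentage-point gain of the face-audio model over the face model."""
    return results["face-audio"] - results["face"]


for label, results in (("paper", paper), ("reproduction", reproduction)):
    print("%s: %+.3f points" % (label, audio_gain(results)))
```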
# Reference
If you find this code helpful, please cite this paper.
```
@inproceedings{hu2015deep,
title={{Deep Multimodal Speaker Naming}},
author={Hu, Yongtao and Ren, Jimmy SJ. and Dai, Jingwen and Yuan, Chang and Xu, Li and Wang, Wenping},
booktitle={Proceedings of the 23rd Annual ACM International Conference on Multimedia},
pages={1107--1110},
year={2015},
organization={ACM}
}
```