https://github.com/philipperemy/speaker-change-detection
Paper: https://arxiv.org/abs/1702.02285
- Host: GitHub
- URL: https://github.com/philipperemy/speaker-change-detection
- Owner: philipperemy
- Created: 2018-03-25T02:57:02.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2018-12-19T07:34:36.000Z (about 6 years ago)
- Last Synced: 2024-10-29T15:49:20.446Z (2 months ago)
- Topics: deep-learning, keras, speaker-change-detection
- Language: Python
- Size: 3.55 MB
- Stars: 62
- Watchers: 6
- Forks: 20
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Speaker Change Detection
Implementation of the paper: https://arxiv.org/abs/1702.02285
[![license](https://img.shields.io/badge/License-Apache_2.0-brightgreen.svg)](https://github.com/philipperemy/keras-attention-mechanism/blob/master/LICENSE) [![dep1](https://img.shields.io/badge/Tensorflow-1.6+-brightgreen.svg)](https://www.tensorflow.org/) [![dep2](https://img.shields.io/badge/Keras-2.0+-brightgreen.svg)](https://keras.io/)
_The mechanism proposed here is for real-time speaker change detection in conversations, which first trains a neural network text-independent speaker classifier using in-domain speaker data._
The accuracy is very high and close to 100%, as reported in the paper.
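At a high level, the approach trains a speaker classifier and then flags a change wherever the distance between embeddings of consecutive audio windows exceeds a threshold. As a rough illustration only (the function name, inputs, and threshold below are made up, not the repo's actual code), a NumPy sketch:

```python
import numpy as np

def detect_changes(embeddings, threshold=0.5):
    """Flag a speaker change wherever the distance between consecutive
    window embeddings exceeds `threshold`. `embeddings` is a
    (n_windows, dim) array, e.g. from a classifier's penultimate
    layer (hypothetical input, for illustration only)."""
    # L2-normalise so the Euclidean distance relates to cosine similarity
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-8, None)
    # distance between each window and the next one
    dists = np.linalg.norm(np.diff(unit, axis=0), axis=1)
    return dists > threshold

# two synthetic "speakers": constant embeddings with a switch at index 3
emb = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)
print(detect_changes(emb).tolist())  # [False, False, True, False, False]
```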
## Get Started
Because it takes a very long time to generate the cache and inputs, I packaged them and uploaded them here:
- Cache uploaded at [cache-speaker-change-detection.zip](https://drive.google.com/open?id=1NRBBE7S1ecpbXQBfIyhY9O1DDNsBc0my) (unzip it in `/tmp/`)
- [speaker-change-detection-data.pkl](https://drive.google.com/open?id=12gMYaV-ymQOtkYHCf9HxPurb9vB6dADK) (place it in `/tmp/`)
- [speaker-change-detection-norm.pkl](https://drive.google.com/open?id=1vykyS3bxKbkuhGtk36eTWfW9ZkqwJi6e) (place it in `/tmp/`)

You should end up with:
- `/tmp/speaker-change-detection-data.pkl`
- `/tmp/speaker-change-detection-norm.pkl`
- `/tmp/speaker-change-detection/*.pkl`

The final plots are generated as `/tmp/distance_test_ID.png`, where `ID` is the id of the plot.
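Before running anything, you can check that the files listed above are in place. This small helper is not part of the repo, just a convenience sketch:

```python
from pathlib import Path

def missing_inputs(base="/tmp"):
    """Return the expected input files (per the list above) that are
    absent under `base`. Hypothetical helper, not part of the repo."""
    base = Path(base)
    expected = [
        base / "speaker-change-detection-data.pkl",
        base / "speaker-change-detection-norm.pkl",
    ]
    missing = [str(p) for p in expected if not p.is_file()]
    cache = base / "speaker-change-detection"
    # the unzipped cache directory should contain at least one .pkl file
    if not (cache.is_dir() and any(cache.glob("*.pkl"))):
        missing.append(str(cache / "*.pkl"))
    return missing

print(missing_inputs())  # empty list means you are ready to go
```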
Make sure you have enough free space in `/tmp/`, because you might run out of disk space there. If that happens, change all the `/tmp/` references in the codebase to any folder of your choice.
Now run these commands to reproduce the results:
```bash
git clone [email protected]:philipperemy/speaker-change-detection.git
cd speaker-change-detection
virtualenv -p python3.6 venv # any recent Python 3 interpreter should also work
source venv/bin/activate
pip install -r requirements.txt
# download the cache and all the files specified above (you can re-generate them yourself if you wish).
cd ml/
export PYTHONPATH=..:$PYTHONPATH; python 1_generate_inputs.py
export PYTHONPATH=..:$PYTHONPATH; python 2_train_classifier.py
export PYTHONPATH=..:$PYTHONPATH; python 3_train_distance_classifier.py
```

To regenerate only the VCTK cache, run:
```bash
cd audio/
export PYTHONPATH=..:$PYTHONPATH; python generate_all_cache.py
```

## Contributions
Contributions are welcome! Some ways to improve this project:

- Given any audio file, is it possible to test it and detect any speaker change?

## Questions

- Given any audio file, is it possible to test it and detect any speaker change?

  Yes, as long as it follows the same structure as the VCTK Corpus dataset.

- Is there any way to test the trained model to detect speaker changes in our own audio files?

  Yes, but it takes a bit of work: you have to choose a dataset and convert it to the VCTK format.
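In the VCTK Corpus, recordings live under one directory per speaker, e.g. `wav48/p225/p225_001.wav`. A hypothetical helper (the function name and the speaker/file naming you feed it are assumptions, not part of this repo) that arranges your own wav files into that layout could look like:

```python
import shutil
from pathlib import Path

def to_vctk_layout(files_by_speaker, dest):
    """Copy audio files into a VCTK-style tree:
    dest/wav48/<speaker>/<speaker>_<idx>.wav
    `files_by_speaker` maps a speaker id (e.g. 'p225') to a list of
    wav paths. Hypothetical helper, for illustration only."""
    dest = Path(dest)
    for speaker, paths in files_by_speaker.items():
        spk_dir = dest / "wav48" / speaker
        spk_dir.mkdir(parents=True, exist_ok=True)
        # VCTK numbers utterances per speaker: p225_001.wav, p225_002.wav, ...
        for i, src in enumerate(paths, start=1):
            shutil.copy(src, spk_dir / f"{speaker}_{i:03d}.wav")

# usage: to_vctk_layout({"p999": ["a.wav", "b.wav"]}, "/tmp/my-corpus")
```

From there you would point the cache-generation script at the new corpus directory.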