https://github.com/markhershey/audiodeepfakedetection
SUTD 50.039 Deep Learning Course Project (2022 Spring)
- Host: GitHub
- URL: https://github.com/markhershey/audiodeepfakedetection
- Owner: MarkHershey
- License: mit
- Created: 2022-03-29T15:26:23.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2023-11-23T11:41:09.000Z (over 2 years ago)
- Last Synced: 2024-05-02T06:25:03.669Z (about 2 years ago)
- Topics: audio, audio-deepfake-detection, deep-learning, deepfake-detection
- Language: Python
- Homepage: https://markhh.com/AudioDeepFakeDetection/
- Size: 197 MB
- Stars: 58
- Watchers: 3
- Forks: 17
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
# Audio Deep Fake Detection
A Course Project for SUTD 50.039 Theory and Practice of Deep Learning (2022 Spring)
Created by [Mark He Huang](https://markhh.com/), [Peiyuan Zhang](https://www.linkedin.com/in/lance-peiyuan-zhang-5b2886194/), [James Raphael Tiovalen](https://jamestiotio.github.io/), [Madhumitha Balaji](https://www.linkedin.com/in/madhu-balaji/), and [Shyam Sridhar](https://www.linkedin.com/in/shyam-sridhar/).
Check out our: [Project Report](Report.pdf) | [Interactive Website](https://markhh.com/AudioDeepFakeDetection/)
## Setup Environment
```bash
# Set up Python virtual environment
python3 -m venv venv && source venv/bin/activate
# Make sure your PIP is up to date
pip install -U pip wheel setuptools
# Install required dependencies
pip install -r requirements.txt
```
- Install the PyTorch build that matches your machine (operating system, and CPU or CUDA version): [https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/)
## Setup Datasets
You may download the datasets used in the project from the following URLs:
- (Real) Human Voice Dataset: [LJ Speech (v1.1)](https://keithito.com/LJ-Speech-Dataset/)
- This dataset consists of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books.
- (Fake) Synthetic Voice Dataset: [WaveFake (v1.20)](https://zenodo.org/record/5642694)
- The dataset consists of 104,885 generated audio clips (16-bit PCM wav).
After downloading the datasets, you may extract them under `data/real` and `data/fake` respectively. In the end, the `data` directory should look like this:
```
data
├── real
│   └── wavs
└── fake
    ├── common_voices_prompts_from_conformer_fastspeech2_pwg_ljspeech
    ├── jsut_multi_band_melgan
    ├── jsut_parallel_wavegan
    ├── ljspeech_full_band_melgan
    ├── ljspeech_hifiGAN
    ├── ljspeech_melgan
    ├── ljspeech_melgan_large
    ├── ljspeech_multi_band_melgan
    ├── ljspeech_parallel_wavegan
    └── ljspeech_waveglow
```
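Before training, it can help to confirm that the extracted datasets match the layout above. The following is a minimal sketch using only the Python standard library; the helper names `check_data_layout` and `count_wavs` are hypothetical and not part of this repository, and the directory names are copied from the tree above.

```python
from pathlib import Path

# Sub-directories expected under data/fake, copied from the tree above.
EXPECTED_FAKE_DIRS = [
    "common_voices_prompts_from_conformer_fastspeech2_pwg_ljspeech",
    "jsut_multi_band_melgan",
    "jsut_parallel_wavegan",
    "ljspeech_full_band_melgan",
    "ljspeech_hifiGAN",
    "ljspeech_melgan",
    "ljspeech_melgan_large",
    "ljspeech_multi_band_melgan",
    "ljspeech_parallel_wavegan",
    "ljspeech_waveglow",
]


def check_data_layout(root="data"):
    """Return the expected directories that are missing under `root`."""
    root = Path(root)
    expected = [root / "real" / "wavs"]
    expected += [root / "fake" / name for name in EXPECTED_FAKE_DIRS]
    return [str(path) for path in expected if not path.is_dir()]


def count_wavs(directory):
    """Count .wav files recursively; return 0 if the directory is absent."""
    directory = Path(directory)
    if not directory.is_dir():
        return 0
    return sum(1 for _ in directory.rglob("*.wav"))


if __name__ == "__main__":
    missing = check_data_layout("data")
    if missing:
        print("Missing directories:")
        for path in missing:
            print("  " + path)
    else:
        print("real clips:", count_wavs("data/real"))
        print("fake clips:", count_wavs("data/fake"))
```

If both datasets were extracted completely, the two counts should be in line with the clip totals quoted above for LJ Speech and WaveFake.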
## Model Checkpoints
You may download the model checkpoints from here: [Google Drive](https://drive.google.com/drive/folders/1iR2zLQjBZgxIs3gHkXh54Ulg-M6-6W4L?usp=sharing). Unzip the files and replace the `saved` directory with the extracted files.
## Training
Use the [`train.py`](train.py) script to train the model.
```
usage: train.py [-h] [--real_dir REAL_DIR] [--fake_dir FAKE_DIR] [--batch_size BATCH_SIZE] [--epochs EPOCHS]
                [--seed SEED] [--feature_classname {wave,lfcc,mfcc}]
                [--model_classname {MLP,WaveRNN,WaveLSTM,SimpleLSTM,ShallowCNN,TSSD}]
                [--in_distribution {True,False}] [--device DEVICE] [--deterministic] [--restore] [--eval_only] [--debug] [--debug_all]

optional arguments:
  -h, --help            show this help message and exit
  --real_dir REAL_DIR, --real REAL_DIR
                        Directory containing real data. (default: 'data/real')
  --fake_dir FAKE_DIR, --fake FAKE_DIR
                        Directory containing fake data. (default: 'data/fake')
  --batch_size BATCH_SIZE
                        Batch size. (default: 256)
  --epochs EPOCHS       Number of maximum epochs to train. (default: 20)
  --seed SEED           Random seed. (default: 42)
  --feature_classname {wave,lfcc,mfcc}
                        Feature classname. (default: 'lfcc')
  --model_classname {MLP,WaveRNN,WaveLSTM,SimpleLSTM,ShallowCNN,TSSD}
                        Model classname. (default: 'ShallowCNN')
  --in_distribution {True,False}, --in_dist {True,False}
                        Whether to use in distribution experiment setup. (default: True)
  --device DEVICE       Device to use. (default: 'cuda' if possible)
  --deterministic       Whether to use deterministic training (reproducible results).
  --restore             Whether to restore from checkpoint.
  --eval_only           Whether to evaluate only.
  --debug               Whether to use debug mode.
  --debug_all           Whether to use debug mode for all models.
```
Example:
To make sure all models can run successfully on your device, you can run the following command to test:
```bash
python train.py --debug_all
```
To train the model `ShallowCNN` with `lfcc` features in the in-distribution setting, you can run the following command:
```bash
python train.py --real data/real --fake data/fake --batch_size 128 --epochs 20 --seed 42 --feature_classname lfcc --model_classname ShallowCNN
```
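When running several configurations, the command line can also be assembled programmatically. The sketch below builds argument lists from the flags and choices listed in the usage text above; `build_train_command` is a hypothetical helper and not part of this repository. It only checks names against the listed choices, so it does not guarantee that `train.py` supports every feature/model pairing.

```python
import shlex

# Choices copied from the usage text of train.py.
FEATURES = ("wave", "lfcc", "mfcc")
MODELS = ("MLP", "WaveRNN", "WaveLSTM", "SimpleLSTM", "ShallowCNN", "TSSD")


def build_train_command(feature="lfcc", model="ShallowCNN", batch_size=256,
                        epochs=20, seed=42, in_distribution=True):
    """Return the argument list for one train.py run (hypothetical helper)."""
    if feature not in FEATURES:
        raise ValueError(f"unknown feature: {feature!r}")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    return [
        "python", "train.py",
        "--real_dir", "data/real",
        "--fake_dir", "data/fake",
        "--batch_size", str(batch_size),
        "--epochs", str(epochs),
        "--seed", str(seed),
        "--feature_classname", feature,
        "--model_classname", model,
        # The flag expects the literal strings "True" or "False".
        "--in_distribution", str(in_distribution),
    ]


if __name__ == "__main__":
    # Print one command per experiment setup instead of running it.
    for in_dist in (True, False):
        print(shlex.join(build_train_command(in_distribution=in_dist)))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the run.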
Use the inline environment variable `CUDA_VISIBLE_DEVICES` to select which GPU device(s) the script may use. For example:
```bash
CUDA_VISIBLE_DEVICES=0 python train.py
```
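The same selection can be made when launching the script from Python. This is a small sketch with a hypothetical helper, `gpu_env`, which copies the current environment and sets `CUDA_VISIBLE_DEVICES` for the child process only. The variable has to be set before CUDA is initialized, and inside the process the visible devices are renumbered starting from 0.

```python
import os


def gpu_env(device_ids, base_env=None):
    """Return a copy of the environment restricted to the given GPU ids."""
    env = dict(os.environ if base_env is None else base_env)
    # Comma-separated physical GPU indices, e.g. "0" or "0,1".
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return env


if __name__ == "__main__":
    print(gpu_env([0])["CUDA_VISIBLE_DEVICES"])
```

For example, `subprocess.run(["python", "train.py"], env=gpu_env([0]), check=True)` corresponds to the shell command above.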
## Evaluation
By default, the test set is used directly for validation during training. The best model and its predictions are saved automatically in the [`saved`](saved) directory during training/testing; see that directory for the evaluation results.
To evaluate a trained model on the test set, run the following command:
```bash
python train.py --feature_classname lfcc --model_classname ShallowCNN --restore --eval_only
```
Run the following command to re-compute the evaluation results based on saved predictions and labels:
```bash
python metrics.py
```
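To illustrate how metrics can be derived from predictions and labels, here is a minimal sketch that computes accuracy and an approximate equal error rate (EER) in pure Python. It is not the code in `metrics.py`: the metrics that script reports and the format of the saved files are not described here, so the inputs are assumed to be a list of binary labels (1 for the positive class) and a list of scores where higher means more likely positive.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of clips whose thresholded score matches the label."""
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def equal_error_rate(labels, scores):
    """Approximate EER: the error rate where FPR and FNR are closest."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best_gap, eer = None, None
    # Try every distinct score as a threshold, plus +inf (predict no positives).
    for t in sorted(set(scores)) + [float("inf")]:
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < t)
        fpr, fnr = fp / n_neg, fn / n_pos
        gap = abs(fpr - fnr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


if __name__ == "__main__":
    labels = [0, 1, 0, 1]          # assumed ground truth
    scores = [0.1, 0.4, 0.6, 0.9]  # assumed model scores
    print(accuracy(labels, scores), equal_error_rate(labels, scores))
```

The threshold sweep is quadratic in the number of clips, which is fine for illustration; for a test set of the size used here, a library routine such as `sklearn.metrics.roc_curve` would be a more practical basis.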
## Acknowledgements
- We thank [Dr. Matthieu De Mari](https://istd.sutd.edu.sg/people/faculty/matthieu-de-mari) and [Prof. Berrak Sisman](https://istd.sutd.edu.sg/people/faculty/berrak-sisman) for their teaching and guidance.
- We thank Joel Frank and Lea Schönherr. Our code is partially adapted from their repository [WaveFake](https://github.com/RUB-SysSec/WaveFake).
- We thank [Prof. Liu Jun](https://istd.sutd.edu.sg/people/faculty/liu-jun) for providing GPU resources for conducting experiments for this project.
## License
Our project is licensed under the [MIT License](LICENSE).