{"id":25348551,"url":"https://github.com/markhershey/audiodeepfakedetection","last_synced_at":"2025-10-29T18:31:06.566Z","repository":{"id":69729969,"uuid":"475504129","full_name":"MarkHershey/AudioDeepFakeDetection","owner":"MarkHershey","description":"SUTD 50.039 Deep Learning Course Project (2022 Spring)","archived":false,"fork":false,"pushed_at":"2023-11-23T11:41:09.000Z","size":206060,"stargazers_count":58,"open_issues_count":0,"forks_count":17,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-05-02T06:25:03.669Z","etag":null,"topics":["audio","audio-deepfake-detection","deep-learning","deepfake-detection"],"latest_commit_sha":null,"homepage":"https://markhh.com/AudioDeepFakeDetection/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MarkHershey.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-03-29T15:26:23.000Z","updated_at":"2024-05-01T11:46:44.000Z","dependencies_parsed_at":"2023-11-23T12:44:41.511Z","dependency_job_id":null,"html_url":"https://github.com/MarkHershey/AudioDeepFakeDetection","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MarkHershey%2FAudioDeepFakeDetection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MarkHershey%2FAudioDeepFakeDetection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MarkHershey%2FAudioDeepFakeDetection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MarkHershey%2FAudioDeepFakeDetection/
manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MarkHershey","download_url":"https://codeload.github.com/MarkHershey/AudioDeepFakeDetection/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238867092,"owners_count":19544112,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio","audio-deepfake-detection","deep-learning","deepfake-detection"],"created_at":"2025-02-14T15:39:06.849Z","updated_at":"2025-10-29T18:31:05.033Z","avatar_url":"https://github.com/MarkHershey.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Audio Deep Fake Detection\n\nA Course Project for SUTD 50.039 Theory and Practice of Deep Learning (2022 Spring)\n\nCreated by [Mark He Huang](https://markhh.com/), [Peiyuan Zhang](https://www.linkedin.com/in/lance-peiyuan-zhang-5b2886194/), [James Raphael Tiovalen](https://jamestiotio.github.io/), [Madhumitha Balaji](https://www.linkedin.com/in/madhu-balaji/), and [Shyam Sridhar](https://www.linkedin.com/in/shyam-sridhar/).\n\nCheck out our: [Project Report](Report.pdf) | [Interactive Website](https://markhh.com/AudioDeepFakeDetection/)\n\n## Setup Environment\n\n```bash\n# Set up Python virtual environment\npython3 -m venv venv \u0026\u0026 source venv/bin/activate\n\n# Make sure your PIP is up to date\npip install -U pip wheel setuptools\n\n# Install required dependencies\npip install -r requirements.txt\n```\n\n-   Install PyTorch that suits your machine: 
[https://pytorch.org/get-started/locally/](https://pytorch.org/get-started/locally/)\n\n## Setup Datasets\n\nYou may download the datasets used in the project from the following URLs:\n\n-   (Real) Human Voice Dataset: [LJ Speech (v1.1)](https://keithito.com/LJ-Speech-Dataset/)\n    -   This dataset consists of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books.\n-   (Fake) Synthetic Voice Dataset: [WaveFake (v1.20)](https://zenodo.org/record/5642694)\n    -   The dataset consists of 104,885 generated audio clips (16-bit PCM wav).\n\nAfter downloading the datasets, you may extract them under `data/real` and `data/fake` respectively. In the end, the `data` directory should look like this:\n\n```\ndata\n├── real\n│   └── wavs\n└── fake\n    ├── common_voices_prompts_from_conformer_fastspeech2_pwg_ljspeech\n    ├── jsut_multi_band_melgan\n    ├── jsut_parallel_wavegan\n    ├── ljspeech_full_band_melgan\n    ├── ljspeech_hifiGAN\n    ├── ljspeech_melgan\n    ├── ljspeech_melgan_large\n    ├── ljspeech_multi_band_melgan\n    ├── ljspeech_parallel_wavegan\n    └── ljspeech_waveglow\n```\n\n## Model Checkpoints\n\nYou may download the model checkpoints from here: [Google Drive](https://drive.google.com/drive/folders/1iR2zLQjBZgxIs3gHkXh54Ulg-M6-6W4L?usp=sharing). 
Unzip the files and replace the `saved` directory with the extracted files.\n\n## Training\n\nUse the [`train.py`](train.py) script to train the model.\n\n```\nusage: train.py [-h] [--real_dir REAL_DIR] [--fake_dir FAKE_DIR] [--batch_size BATCH_SIZE] [--epochs EPOCHS]\n                [--seed SEED] [--feature_classname {wave,lfcc,mfcc}]\n                [--model_classname {MLP,WaveRNN,WaveLSTM,SimpleLSTM,ShallowCNN,TSSD}]\n                [--in_distribution {True,False}] [--device DEVICE] [--deterministic] [--restore] [--eval_only] [--debug] [--debug_all]\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --real_dir REAL_DIR, --real REAL_DIR\n                        Directory containing real data. (default: 'data/real')\n  --fake_dir FAKE_DIR, --fake FAKE_DIR\n                        Directory containing fake data. (default: 'data/fake')\n  --batch_size BATCH_SIZE\n                        Batch size. (default: 256)\n  --epochs EPOCHS       Number of maximum epochs to train. (default: 20)\n  --seed SEED           Random seed. (default: 42)\n  --feature_classname {wave,lfcc,mfcc}\n                        Feature classname. (default: 'lfcc')\n  --model_classname {MLP,WaveRNN,WaveLSTM,SimpleLSTM,ShallowCNN,TSSD}\n                        Model classname. (default: 'ShallowCNN')\n  --in_distribution {True,False}, --in_dist {True,False}\n                        Whether to use in distribution experiment setup. (default: True)\n  --device DEVICE       Device to use. 
(default: 'cuda' if possible)\n  --deterministic       Whether to use deterministic training (reproducible results).\n  --restore             Whether to restore from checkpoint.\n  --eval_only           Whether to evaluate only.\n  --debug               Whether to use debug mode.\n  --debug_all           Whether to use debug mode for all models.\n```\n\nExamples:\n\nTo make sure all models can run successfully on your device, you can run the following command to test:\n\n```bash\npython train.py --debug_all\n```\n\nTo train the model `ShallowCNN` with `lfcc` features in the in-distribution setting, you can run the following command:\n\n```bash\npython train.py --real data/real --fake data/fake --batch_size 128 --epochs 20 --seed 42 --feature_classname lfcc --model_classname ShallowCNN\n```\n\nUse the inline environment variable `CUDA_VISIBLE_DEVICES` to specify the GPU device(s) to use. For example:\n\n```bash\nCUDA_VISIBLE_DEVICES=0 python train.py\n```\n\n## Evaluation\n\nBy default, the test set is used directly for validation during training, and the best model and the best predictions are saved automatically in the [`saved`](saved) directory during training/testing. See the [`saved`](saved) directory for the evaluation results.\n\nTo evaluate on the test set using a trained model, you can run the following command:\n\n```bash\npython train.py --feature_classname lfcc --model_classname ShallowCNN --restore --eval_only\n```\n\nRun the following command to re-compute the evaluation results based on saved predictions and labels:\n\n```bash\npython metrics.py\n```\n\n## Acknowledgements\n\n-   We thank [Dr. Matthieu De Mari](https://istd.sutd.edu.sg/people/faculty/matthieu-de-mari) and [Prof. Berrak Sisman](https://istd.sutd.edu.sg/people/faculty/berrak-sisman) for their teaching and guidance.\n-   We thank Joel Frank and Lea Schönherr. Our code is partially adapted from their repository [WaveFake](https://github.com/RUB-SysSec/WaveFake).\n-   We thank [Prof. 
Liu Jun](https://istd.sutd.edu.sg/people/faculty/liu-jun) for providing GPU resources for conducting experiments for this project.\n\n## License\n\nOur project is licensed under the [MIT License](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarkhershey%2Faudiodeepfakedetection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmarkhershey%2Faudiodeepfakedetection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarkhershey%2Faudiodeepfakedetection/lists"}