https://github.com/deezer/robust-ai-lyrics-detection
Code to reproduce the experiments presented in the article "Double Entendre: Robust Audio-Based AI-Generated Lyrics Detection via Multi-View Fusion" (Findings of ACL 2025)
https://github.com/deezer/robust-ai-lyrics-detection
Last synced: 5 months ago
JSON representation
Code to reproduce the experiments presented in the article "Double Entendre: Robust Audio-Based AI-Generated Lyrics Detection via Multi-View Fusion" (Findings of ACL 2025)
- Host: GitHub
- URL: https://github.com/deezer/robust-ai-lyrics-detection
- Owner: deezer
- Created: 2025-05-23T14:57:25.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-23T14:57:55.000Z (about 1 year ago)
- Last Synced: 2025-05-23T16:21:00.646Z (about 1 year ago)
- Homepage:
- Size: 2.93 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Double Entendre
**Robust audio-based AI-generated lyrics detection via multi-view fusion.**
Code to reproduce the experiments presented in *"Double Entendre: Robust Audio-Based AI-Generated Lyrics Detection via Multi-View Fusion"* (Findings of ACL 2025) and *"AI-Generated Song Detection via Lyrics Transcripts"* (ISMIR 2025).
This repository provides tools for detecting AI-generated music by analyzing sung lyrics extracted via speech transcription. It supports multiple transcribers, feature extractors (text and audio), and evaluation scenarios including audio perturbations and out-of-distribution generalization.
## Overview
**Supported scenarios:**
- Real vs. Fake (AI-generated lyrics + AI vocals)
- Real vs. Half-fake (human lyrics + AI vocals)
- Cross-platform generalization (Suno → Udio)
- Robustness under audio attacks (reverb, pitch, EQ, noise, time stretch)
---
## Project Structure
```
.
├── transcription/ # Audio → text transcription
│ ├── run_whisper.py # Whisper transcription
│ ├── run_meta.py # Meta models (Seamless, MMS)
│ └── ...
├── features/ # Feature extraction & detection
│ ├── feature_extractor.py # Base class
│ ├── sbert.py, llm2vec.py # Text features
│ ├── xeus.py, w2v2.py # Speech features
│ ├── run_features.py # Main experiments
│ └── ...
├── pyproject.toml
└── README.md
```
See module READMEs for details:
- [`transcription/README.md`](transcription/README.md)
- [`features/README.md`](features/README.md)
---
## Installation
```bash
git clone https://github.com/deezer/robust-AI-lyrics-detection.git
cd robust-AI-lyrics-detection
pip install .
```
**Requirements:** Python 3.9–3.11, CUDA-capable GPU recommended.
### Optional Dependencies
```bash
pip install ".[meta]" # Meta speech models (Seamless, MMS)
pip install ".[demucs]" # Source separation (improves transcription)
pip install ".[xeus]" # XEUS speech features (requires checkpoint below)
pip install ".[binoculars]" # Binoculars detector
```
### XEUS Checkpoint
XEUS requires a manual checkpoint download:
```bash
wget https://huggingface.co/espnet/xeus/resolve/main/model/xeus_checkpoint.pth -P checkpoints/
```
---
## Quick Start
### 1. Transcribe
```bash
cd transcription
python run_whisper.py --model large-v2 --file_path data/real/songs.json
```
### 2. Detect
```bash
cd features
python run_features.py
```
---
## Data Format
Dataset JSON files should follow this structure:
```json
{
"train": [
{
"md5": "abc123...",
"text": "Original lyrics text...",
"language_str": "en",
"genre": "pop",
"artist_name": "Artist",
"class": "real"
}
],
"test": [...]
}
```
For AI-generated songs, include `mp3_url` or local `audio_path` instead of `md5`.
| Field | Description |
|-------|-------------|
| `md5` | Audio file hash (for path lookup) |
| `text` | Ground-truth lyrics |
| `language_str` | ISO language code |
| `class` | `real`, `generated`, or `half-fake` |
| `transcription` | Added by transcription scripts |
---
## Configuration
Environment variables for path configuration:
```bash
export AUDIO_BASE_PATH=/path/to/audio # Audio file storage
export PROJECT_BASE_DIR=/path/to/project # Project root
export ARTEFACTS_DIR=./artefacts # Model outputs
```
---
## Citation
If you use this code, please cite:
### Double Entendre (Findings of ACL 2025)
```bibtex
@inproceedings{frohmann2025double,
author = {Frohmann, Markus and Meseguer-Brocal, Gabriel and
Epure, Elena V. and Labrak, Yanis},
title = {{Double Entendre}: Robust Audio-Based {AI}-Generated Lyrics
Detection via Multi-View Fusion},
booktitle = {Findings of the Association for Computational Linguistics: ACL 2025},
year = {2025},
publisher = {Association for Computational Linguistics}
}
```
### AI-Generated Song Detection via Lyrics Transcripts (ISMIR 2025)
```bibtex
@inproceedings{frohmann2025ai,
author = {Frohmann, Markus and Epure, Elena and
Meseguer Brocal, Gabriel and Schedl, Markus and Hennequin, Romain},
title = {{AI}-Generated Song Detection via Lyrics Transcripts},
booktitle = {Proceedings of the 26th International Society for
Music Information Retrieval Conference (ISMIR)},
year = {2025},
pages = {121--130},
address = {Daejeon, South Korea},
doi = {10.5281/zenodo.17706345}
}
```
---
## Related Work
- [synthetic_lyrics_detection](https://github.com/deezer/synthetic_lyrics_detection) — Synthetic lyrics generation pipeline (TrustNLP 2025)
---
## License
MIT License. See [LICENSE](LICENSE).
Built at [Deezer Research](https://research.deezer.com/), Paris.