{"id":29020211,"url":"https://github.com/deezer/synthetic_lyrics_detection","last_synced_at":"2026-02-02T11:47:37.402Z","repository":{"id":281019625,"uuid":"943942929","full_name":"deezer/synthetic_lyrics_detection","owner":"deezer","description":"Code used for the paper \"Synthetic Lyrics Detection Across Languages and Genres\" (NAACL 2025 Workshop TrustNLP), co-authored by Yanis Labrak, Markus Frohmann, Gabriel Meseguer-Brocal, and Elena V. Epure.","archived":false,"fork":false,"pushed_at":"2025-05-21T11:53:10.000Z","size":8,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-05-21T12:49:03.818Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/deezer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-03-06T14:22:32.000Z","updated_at":"2025-05-21T11:53:14.000Z","dependencies_parsed_at":"2025-05-21T12:47:44.132Z","dependency_job_id":null,"html_url":"https://github.com/deezer/synthetic_lyrics_detection","commit_stats":null,"previous_names":["deezer/synthetic_lyrics_detection"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/deezer/synthetic_lyrics_detection","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Fsynthetic_lyrics_detection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Fsynthetic_lyrics_detection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Fsynthetic_lyrics_detection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Fsynthetic_lyrics_detection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/deezer","download_url":"https://codeload.github.com/deezer/synthetic_lyrics_detection/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/deezer%2Fsynthetic_lyrics_detection/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261978911,"owners_count":23239417,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-26T01:04:32.641Z","updated_at":"2026-02-02T11:47:37.396Z","avatar_url":"https://github.com/deezer.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# synthetic_lyrics_detection\n\nThis repository provides Python code to reproduce the experiments from the article [**Synthetic Lyrics Detection Across Languages and Genres**](https://aclanthology.org/2025.trustnlp-main.34/), accepted for publication to [**NAACL 2025 Workshop TrustNLP**](https://trustnlpworkshop.github.io/).\n\n\n## Installation\n\n```sh\ngit clone https://github.com/deezer/synthetic_lyrics_detection.git\ncd synthetic_lyrics_detection\n```\n\n## Build and Run the Docker Image\n\nBuild the Docker image and run it in a container with an interactive bash session.  \n\u003e **Note:** The current Docker image requires a **CUDA-capable GPU**.\n\n```sh\nmake build\nmake run-bash\n```\n\n## Data Generation Pipeline\n\nInstall Ollama and pull the required models:\n\n```sh\ncurl -fsSL https://ollama.com/install.sh | sh\nollama serve\u0026\nollama pull mistral \u0026\u0026 ollama pull tinyllama \u0026\u0026 ollama pull wizardlm2\n```\n\nRun the data generation pipeline:\n```sh\npython3 data_pipeline/run_pipeline.py \u003cinput_json_file_with_human_written_lyrics\u003e output/\n```\n\u003e **Note:** Replace **\u003cinput_json_file_with_human_written_lyrics\u003e** with the path to your JSON file containing human-written lyrics.\n\n## Synthetic Lyrics Detection\n\nPlease refer to [this repository](https://github.com/deezer/robust-AI-lyrics-detection) which contains the detectors and scripts needed to run the experiments.\n\n## Paper\n\nPlease cite our paper if you use this data or code in your work:\n```\n@inproceedings{labrak2024detecting,\n  \tauthor    = {Labrak, Yanis  and\n               Frohmann, Markus  and\n               Meseguer-Brocal, Gabriel  and\n               Epure, Elena V.},\n  \tbooktitle = {Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)},\n  \teditor    = {Cao, Trista  and\n               Das, Anubrata  and\n               Kumarage, Tharindu  and\n               Wan, Yixin  and\n               Krishna, Satyapriya  and\n               Mehrabi, Ninareh  and\n               Dhamala, Jwala  and\n               Ramakrishna, Anil  and\n               Galystan, Aram  and\n               Kumar, Anoop  and\n               Gupta, Rahul  and\n               Chang, Kai-Wei},\n\t  isbn\t\t= {979-8-89176-233-6},\n\t  month     = may,\n\t  pages     = {524--541},\n\t  publisher = {Association for Computational Linguistics},\n\t  title     = {Synthetic Lyrics Detection Across Languages and Genres},\n\t  url       = {https://aclanthology.org/2025.trustnlp-main.34/},\n\t  year      = {2025},\n  \t  address   = {Albuquerque, New Mexico},\n\t  abstract  = {In recent years, the use of large language models (LLMs) to generate music content, particularly lyrics, has gained in popularity. These advances provide valuable tools for artists and enhance their creative processes, but they also raise concerns about copyright violations, consumer satisfaction, and content spamming. Previous research has explored content detection in various domains. However, no work has focused on the text modality, lyrics, in music. To address this gap, we curated a diverse dataset of real and synthetic lyrics from multiple languages, music genres, and artists. The generation pipeline was validated using both humans and automated methods. We performed a thorough evaluation of existing synthetic text detection approaches on lyrics, a previously unexplored data type. We also investigated methods to adapt the best-performing features to lyrics through unsupervised domain adaptation. Following both music and industrial constraints, we examined how well these approaches generalize across languages, scale with data availability, handle multilingual language content, and perform on novel genres in few-shot settings. Our findings show promising results that could inform policy decisions around AI-generated music and enhance transparency for users.}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeezer%2Fsynthetic_lyrics_detection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdeezer%2Fsynthetic_lyrics_detection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdeezer%2Fsynthetic_lyrics_detection/lists"}