{"id":22675611,"url":"https://github.com/stability-ai/stable-audio-metrics","last_synced_at":"2025-04-04T10:03:06.681Z","repository":{"id":221253157,"uuid":"750896388","full_name":"Stability-AI/stable-audio-metrics","owner":"Stability-AI","description":"Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.","archived":false,"fork":false,"pushed_at":"2025-03-24T02:05:02.000Z","size":103371,"stargazers_count":204,"open_issues_count":8,"forks_count":20,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-04T09:24:21.966Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Stability-AI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-31T14:39:42.000Z","updated_at":"2025-04-04T02:10:36.000Z","dependencies_parsed_at":"2024-12-09T18:08:22.296Z","dependency_job_id":null,"html_url":"https://github.com/Stability-AI/stable-audio-metrics","commit_stats":null,"previous_names":["stability-ai/stable-audio-metrics"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stability-AI%2Fstable-audio-metrics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stability-AI%2Fstable-audio-metrics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Stability-AI%2Fstable-audio-metrics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/Stability-AI%2Fstable-audio-metrics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Stability-AI","download_url":"https://codeload.github.com/Stability-AI/stable-audio-metrics/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247157275,"owners_count":20893220,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-09T17:57:53.582Z","updated_at":"2025-04-04T10:03:06.644Z","avatar_url":"https://github.com/Stability-AI.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# stable-audio-metrics\nCollection of metrics for evaluating music and audio generative models:\n- Fréchet Distance at 48kHz, based on [Openl3](https://github.com/marl/openl3).\n- Kullback–Leibler divergence at 32kHz, based on [PaSST](https://github.com/kkoutini/PaSST).\n- CLAP score at 48kHz, based on [CLAP-LAION](https://github.com/LAION-AI/CLAP).\n\n`stable-audio-metrics` adapted established metrics to assess the more realistic use case of long-form full-band stereo generations. All metrics can deal with variable-length inputs.\n\n## Installation \nClone this repository, and create a python virtual environment `python3 -m venv env`, activate it `source env/bin/activate`, and install the dependencies `pip install -r requirements.txt`.\n\n- ***GPU SUPPORT*** –We only support GPU usage, because it can be too slow on CPU.\n- ***TROUBLESHOOTING*** – It might require an older version of cuda because of Openl3 dependencies. 
Try cuda 11.8 if you find it does not run on GPU as expected.\n\n## Documentation\n\nMain documentation is available in:\n- Fréchet Distance based on Openl3: [`src/openl3_fd.py`](src/openl3_fd.py)\n- Kullback–Leibler divergence based on PaSST: [`src/passt_kld.py`](src/passt_kld.py)\n- CLAP-LAION score: [`src/clap_score.py`](src/clap_score.py)\n\nEach example script (using MusicCaps) further details how to use the corresponding metric:\n- Fréchet Distance based on Openl3: [`examples/musiccaps_openl3_fd.py`](example/musiccaps_openl3_fd.py)\n- Kullback–Leibler divergence based on PaSST: [`examples/musiccaps_passt_kld.py`](example/musiccaps_passt_kld.py)\n- CLAP-LAION score: [`example/musiccapss_clap_score.py`](example/musiccapss_clap_score.py)\n\nOur [documentation](examples/README.md) includes examples on how to evaluate with:\n- MusicCaps dataset\n- AudioCaps dataset\n- Song Describer dataset\n\n## Usage\n\nModify our examples so that they point to the folder you want to evaluate, then run them. For example, modify and run: `CUDA_VISIBLE_DEVICES=6 python examples/audiocaps_no-audio.py` to evaluate with AudioCaps. See more examples in our [documentation](examples/README.md).\n- ***METRICS WITHOUT DATASETS*** – The `no-audio` examples allow running the evaluations without downloading the datasets, because the reference statistics and embeddings are precomputed in `load`. We do not provide any pre-computed embedding for the CLAP score, because it is fast to compute.\n- ***COMPARING WITH STABLE AUDIO*** – To compare against Stable Audio, you must set all parameters as in the `no-audio` examples, even if your model outputs mono audio at a different sampling rate. `stable-audio-metrics` handles the resampling and mono/stereo conversion to deliver a fair comparison.\n\n## Data structure\nGenerate one audio file for every prompt in each dataset, and name each file after the corresponding prompt id. 
\n\nOur musiccaps examples assume the following structure, where 5,521 generations are named after the `ytid` from the prompts file `load/musiccaps-public.csv`: `your_model_outputs_folder/-kssA-FOzU.wav`, `your_model_outputs_folder/_0-2meOf9qY.wav`, ... `your_model_outputs_folder/ZzyWbehtt0M.wav`.\n\nOur audiocaps examples assume the following structure, where 4,875 generations are named after the `audiocap_id` from the prompts file `load/audiocaps-test.csv`:\n`your_model_outputs_folder/3.wav`, `your_model_outputs_folder/481.wav`, ... `your_model_outputs_folder/107432.wav`.\n\nExtend this data structure to your own dataset, as we did with the Song Describer dataset as an additional example. See the examples' [documentation](examples/README.md).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstability-ai%2Fstable-audio-metrics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstability-ai%2Fstable-audio-metrics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstability-ai%2Fstable-audio-metrics/lists"}