{"id":13434193,"url":"https://github.com/spotify-research/llark","last_synced_at":"2025-03-17T14:30:54.789Z","repository":{"id":199952469,"uuid":"701524512","full_name":"spotify-research/llark","owner":"spotify-research","description":"Code for the paper \"LLark: A Multimodal Instruction-Following Language Model for Music\" by Josh Gardner, Simon Durand, Daniel Stoller, and Rachel Bittner.","archived":false,"fork":false,"pushed_at":"2024-05-30T18:11:47.000Z","size":432,"stargazers_count":298,"open_issues_count":5,"forks_count":22,"subscribers_count":8,"default_branch":"main","last_synced_at":"2024-10-27T14:44:21.759Z","etag":null,"topics":["foundation-model","multimodal","music-information-retrieval"],"latest_commit_sha":null,"homepage":"https://storage.googleapis.com/music2text-public/index.html","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/spotify-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-06T20:36:25.000Z","updated_at":"2024-10-24T22:00:21.000Z","dependencies_parsed_at":"2023-10-14T18:00:23.058Z","dependency_job_id":"8702e185-79ee-4cfa-868c-23f1fec75b43","html_url":"https://github.com/spotify-research/llark","commit_stats":null,"previous_names":["spotify-research/llark"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spotify-research%2Fllark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spotify-research%2Fllark/tags","relea
ses_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spotify-research%2Fllark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spotify-research%2Fllark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/spotify-research","download_url":"https://codeload.github.com/spotify-research/llark/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244050077,"owners_count":20389636,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["foundation-model","multimodal","music-information-retrieval"],"created_at":"2024-07-31T02:01:49.211Z","updated_at":"2025-03-17T14:30:54.347Z","avatar_url":"https://github.com/spotify-research.png","language":"Jupyter Notebook","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"readme":"# LLark: A Multimodal Instruction-Following Language Model for Music\n\n![Static Badge](https://img.shields.io/badge/status-experimental-green)\n\n\nThis is the code release associated with the paper:\n\n\u003e LLark: A Multimodal Instruction-Following Language Model for Music\u003cbr /\u003e\n\u003e Josh Gardner, Simon Durand, Daniel Stoller, Rachel Bittner.\u003cbr /\u003e\n\u003e Published at ICML 2024\n\nThis repository contains the code used to build the training dataset, preprocess existing open-source music datasets, train the model, and run inference. 
**Note that this paper is not accompanied by any trained models.**\n\nFor more details about this work, you can read the [preprint of the paper](https://arxiv.org/abs/2310.07160), read the [Spotify Research blog post](https://research.atspotify.com/2023/10/llark-a-multimodal-foundation-model-for-music/) about this work, and [listen to \ndemos on the companion site](https://storage.googleapis.com/music2text-public/index.html). \nSee also the [ICML page for the paper](https://icml.cc/virtual/2024/poster/34440).\n\n**This is not an officially supported Spotify product.**\n\n\n## Citing This Repository\n\nIf you use this code in a publication, please cite our paper:\n\n```\n@article{gardner2023llark,\n  title={LLark: A Multimodal Instruction-Following Language Model for Music},\n  author={Gardner, Josh and Durand, Simon and Stoller, Daniel and Bittner, Rachel},\n  journal={Proc. of the International Conference on Machine Learning (ICML)},\n  year={2024}\n}\n```\n\n# Using The Code\n\n## Docker Environments\n\nAll of the code in this repo should be run in one of the provided Docker containers (see `docker`). There are three Dockerfiles for separate use cases:\n* `m2t-train.dockerfile` is for model training and inference\n* `m2t-preprocess.dockerfile` is for data preprocessing (i.e. running Beam pipelines)\n* `jukebox-embed.dockerfile` is for extracting Jukebox embeddings\n\n\u003e **Warning**\n\u003e \n\u003e Note that the data preprocessing pipeline can be run locally or on [Google Cloud Dataflow](https://cloud.google.com/dataflow), a cloud-based data processing system that allows the code to be sped up by running in parallel across potentially thousands of machines. 
**Using these pipelines on Dataflow may incur costs against your own Google Cloud project.**\n\u003e \n\u003e Running these pipelines on Google Cloud Dataflow may also require pushing one or more Docker images to [Google Artifact Registry](https://cloud.google.com/artifact-registry), properly setting up permissions to use these Docker images from within Google Cloud Dataflow, and replacing various hard-coded Docker image paths with paths to your uploaded Docker images. **This repository contains no code to aid with this process.**\n\n## Preprocess Your Own Audio\n\nWe provide some utilities for converting, cropping, and annotating audio datasets using the process described in our paper. These utilities are located in `scripts/preprocessing`; see the header of each file in that directory for usage examples.\n\nAll of the preprocessing scripts use Apache Beam (on Google Cloud Dataflow). To run these code examples, we recommend using the Docker environment defined in `docker/m2t-preprocess.dockerfile`.\n\n## Generate Instruction-Tuning Data\n\nThe file `scripts/openai/fetch_openai_instruct_data.py` can be adapted to generate instruction-tuning data. The prompts for each dataset are located in the `m2t/instruct` directory.\n\n## Extract Jukebox Embeddings\n\nThe Jukebox embedding pipeline uses Apache Beam (on Google Cloud Dataflow). To run these code examples, we recommend using the Docker environment defined in `docker/jukebox-embed.dockerfile`. Embedding a set of around 100k audio files takes less than 1 hour using the default parameters in that script.\n\nTo use CLAP, we provide a similar set of utilities in the `scripts/clap` subdirectory; the CLAP embedding script can be executed from the `m2t-preprocess.dockerfile` Docker environment and does not require its own separate Docker environment.\n\n## Training\n\nThis repo does not officially support training. 
However, we provide scripts that could be adapted to train a model (mostly as a way to describe the exact training hyperparameters used in training) in the `scripts/training` subdirectory. This includes the main LLark model, along with models based on MPT-1B and CLAP.\n\n## Evaluation\n\nFor reproducibility, we provide a set of notebooks that define the evaluation procedure used in our paper. These notebooks can be used or modified to support evaluation of other models. Note that in some cases, the notebooks require obtaining the original evaluation datasets ([MusicNet](https://zenodo.org/records/5120004), [MedleyDB](https://medleydb.weebly.com), [GTZAN](https://paperswithcode.com/dataset/gtzan), Giant Steps ([tempo](https://github.com/GiantSteps/giantsteps-tempo-dataset), [key](https://github.com/GiantSteps/giantsteps-key-dataset))). \n\n## Contributing\n\nWe feel that a welcoming community is important and we ask that you follow Spotify's\n[Open Source Code of Conduct](https://github.com/spotify/code-of-conduct/blob/main/code-of-conduct.md)\nin all interactions with the community.\n\n## Authors\n\n* Josh Gardner \u003cjpgard@cs.washington.edu\u003e\n* Peter Sobot \u003cpsobot@spotify.com\u003e\n\nFollow [@SpotifyResearch](https://twitter.com/SpotifyResearch) on Twitter for updates.\n\n\n## License\n\nCopyright 2023 Spotify, Inc.\n\nLicensed under the Apache License, Version 2.0: https://www.apache.org/licenses/LICENSE-2.0\n\nThe `m2t/llava` directory contains a subset of the code from [LLaVA](https://github.com/haotian-liu/LLaVA). 
We do not make substantial modifications to this code but include it in order to make the repository self-contained and remove other external dependencies.\n\nThe `jukebox` directory contains code adapted from [`jukemir`](https://github.com/p-lambda/jukemir).\n\n## Security Issues?\n\nPlease report sensitive security issues via Spotify's bug-bounty program (https://hackerone.com/spotify) rather than GitHub.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspotify-research%2Fllark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspotify-research%2Fllark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspotify-research%2Fllark/lists"}