{"id":19815959,"url":"https://github.com/replicate/cog-musicgen","last_synced_at":"2025-05-01T10:31:58.228Z","repository":{"id":176313434,"uuid":"651670424","full_name":"replicate/cog-musicgen","owner":"replicate","description":"A cog implementation of Meta's MusicGen models","archived":false,"fork":false,"pushed_at":"2024-03-27T18:59:55.000Z","size":228,"stargazers_count":73,"open_issues_count":2,"forks_count":35,"subscribers_count":10,"default_branch":"main","last_synced_at":"2024-04-12T19:53:06.871Z","etag":null,"topics":["ai","generative-ai","meta","music","musicgen"],"latest_commit_sha":null,"homepage":"https://replicate.com/meta/musicgen","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/replicate.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-06-09T19:19:58.000Z","updated_at":"2024-04-05T22:12:37.000Z","dependencies_parsed_at":"2023-12-14T16:30:02.473Z","dependency_job_id":"96642b2a-2f55-4529-a727-56950efbb2d3","html_url":"https://github.com/replicate/cog-musicgen","commit_stats":null,"previous_names":["replicate/cog-musicgen-melody","replicate/cog-musicgen"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/replicate%2Fcog-musicgen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/replicate%2Fcog-musicgen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/replicate%2Fcog-musicgen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/replicate%2Fcog-musicgen/manifests","ow
ner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/replicate","download_url":"https://codeload.github.com/replicate/cog-musicgen/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224253515,"owners_count":17280932,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","generative-ai","meta","music","musicgen"],"created_at":"2024-11-12T10:07:51.888Z","updated_at":"2024-11-12T10:07:52.537Z","avatar_url":"https://github.com/replicate.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Cog implementation of MusicGen\n[![Replicate](https://replicate.com/joehoover/musicgen-melody/badge)](https://replicate.com/joehoover/musicgen-melody) \n\nMusicGen is [a simple and controllable model for music generation](https://arxiv.org/abs/2306.05284).  It is a single stage auto-regressive Transformer model trained over a 32kHz \u003ca href=\"https://github.com/facebookresearch/encodec\"\u003eEnCodec tokenizer\u003c/a\u003e with 4 codebooks sampled at 50 Hz. Unlike existing methods like [MusicLM](https://arxiv.org/abs/2301.11325), MusicGen doesn't require a self-supervised semantic representation, and it generates all 4 codebooks in one pass. By introducing a small delay between the codebooks, the authors show they can predict them in parallel, thus having only 50 auto-regressive steps per second of audio. They used 20K hours of licensed music to train MusicGen. 
Specifically, they relied on an internal dataset of 10K high-quality music tracks, and on the ShutterStock and Pond5 music data.\n\n\nFor more information about this model, see [here](https://github.com/facebookresearch/audiocraft).\n\nYou can demo this model or learn how to use it with Replicate's API [here](https://replicate.com/joehoover/musicgen-melody). \n\n# Run with Cog\n\n[Cog](https://github.com/replicate/cog) is an open-source tool that packages machine learning models in a standard, production-ready container. \nYou can deploy your packaged model to your own infrastructure, or to [Replicate](https://replicate.com/), where users can interact with it via a web interface or API.\n\n## Prerequisites \n\n**Cog.** Follow these [instructions](https://github.com/replicate/cog#install) to install Cog, or just run: \n\n```\nsudo curl -o /usr/local/bin/cog -L \"https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)\"\nsudo chmod +x /usr/local/bin/cog\n```\n\nNote: to use Cog, you'll also need an installation of [Docker](https://docs.docker.com/get-docker/).\n\n**GPU machine.** You'll need a Linux machine with an NVIDIA GPU attached and the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker) installed. If you don't already have access to a machine with a GPU, check out our [guide to getting a GPU machine](https://replicate.com/docs/guides/get-a-gpu-machine).\n\n## Step 1. Clone this repository\n\n```sh\ngit clone https://github.com/replicate/cog-musicgen\n```\n\n## Step 2. Run the model\n\nTo run the model, you need a local copy of the model's Docker image. You can satisfy this requirement by specifying the image ID in your call to `cog predict`, like:\n\n```\ncog predict r8.im/joehoover/musicgen-melody@sha256:1a53415e6c4549e3022a0af82f4bd22b9ae2e747a8193af91b0bdffe63f93dfd -i description=\"tense staccato strings. plucked strings. dissonant. scary movie.\" 
-i duration=8\n```\n\nFor more information, see the Cog section [here](https://replicate.com/joehoover/musicgen-melody/api#run).\n\nAlternatively, you can build the image yourself, either by running `cog build` or by letting `cog predict` trigger the build process implicitly. For example, the following will trigger the build process and then run a prediction: \n\n```\ncog predict -i description=\"tense staccato strings. plucked strings. dissonant. scary movie.\" -i duration=8\n```\n\nNote: the first time you run `cog predict`, model weights and other requisite assets will be downloaded if they're not available locally. This download only needs to happen once.\n\n# Run on Replicate\n\n## Step 1. Ensure that all assets are available locally\n\nIf you haven't already, you should ensure that your model runs locally with `cog predict`. This will guarantee that all assets are accessible. E.g., run: \n\n```\ncog predict -i description=\"tense staccato strings. plucked strings. dissonant. scary movie.\" -i duration=8\n```\n\n## Step 2. Create a model on Replicate\n\nGo to [replicate.com/create](https://replicate.com/create) to create a Replicate model. If you want to keep the model private, make sure to specify \"private\".\n\n## Step 3. Configure the model's hardware\n\nReplicate supports running models on a variety of CPU and GPU configurations. For the best performance, you'll want to run this model on an A100 instance.\n\nClick on the \"Settings\" tab on your model page, scroll down to \"GPU hardware\", and select \"A100\". Then click \"Save\".\n\n## Step 4. Push the model to Replicate\n\nLog in to Replicate:\n\n```\ncog login\n```\n\nPush the contents of your current directory to Replicate, using the model name you specified in Step 2:\n\n```\ncog push r8.im/username/modelname\n```\n\n[Learn more about pushing models to Replicate.](https://replicate.com/docs/guides/push-a-model)\n\n# Fine-tune MusicGen\n\nSupport for fine-tuning MusicGen is in development. 
Currently, minimal support has been implemented via an adaptation of @chavinlo's [`music_gen` trainer](https://github.com/chavinlo/musicgen_trainer). \n\nAssuming you have a local environment configured (i.e., you've completed the steps specified under \"Run with Cog\"), you can run training with a command like:\n\n```\ncog train -i dataset_path=@\u003cpath-to-your-data\u003e \u003cadditional hyperparameters\u003e\n```\n\n## Data preparation for training\n\nCog requires input data to be a file; however, our training script expects a directory. Accordingly, \nin production, training data should be provided as a tarball of a directory of properly formatted training data. \nHowever, you can bypass this requirement by naming your training data directory `./train_data`. If such a directory exists,\nthe training script will attempt to load data from that directory (see lines 140-147 in `train.py`).\n\nCurrently, training only supports music generation with text prompts. \n\nTo train the model on your own data, follow these steps: \n\n1. Convert your audio files to `.wav` segments of no more than 30 seconds.\n2. Every audio file in your training directory must have a corresponding `.txt` file with the same base filename. Each text file should contain the text prompt that you want to associate with the corresponding audio file. For example, if you have `audio_1.wav`, you must also have `audio_1.txt`, and that text file should contain the prompt for `audio_1.wav`. \n3. These files should be placed in a single directory. \n4. If that directory is called `./train_data`, then you can simply run the training script like: \n```\ncog train -i dataset_path=@./train_data/ \u003cadditional hyperparameters\u003e\n```\n5. Alternatively, if `train_data` does not exist, you can tarball your data directory and pass the path to the tarball to `cog train ...`. The training script will then untar your data and attempt to load it. 
\n\n### Example\n\nRun this to train on a single clip:\n\n```\nmkdir ./train_data/\nwget -P ./train_data/ https://github.com/facebookresearch/audiocraft/raw/main/assets/bach.mp3\necho bach \u003e ./train_data/bach.txt\ntar -cvzf train_data.tar.gz train_data/\ncog train -i dataset_path=@./train_data.tar.gz -i epochs=10\n```\n\nYou can then load the trained weights with `model.lm.load_state_dict(torch.load('model_outdir/lm_final.pt'))` and generate audio like:\n\n```\nmodel.set_generation_params(\n    duration=8,\n    top_k=250,\n    top_p=0,\n    temperature=1,\n    cfg_coef=3,\n)\nwav = model.generate(descriptions=[''], progress=True)\n```\n\n# Licenses\n\n* All code in this repository is licensed under the Apache License 2.0.\n* The code in the [Audiocraft](https://github.com/facebookresearch/audiocraft) repository is released under the MIT license as found in the [LICENSE file](LICENSE).\n* The weights in the [Audiocraft](https://github.com/facebookresearch/audiocraft) repository are released under the CC-BY-NC 4.0 license as found in the [LICENSE_weights file](LICENSE_weights).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Freplicate%2Fcog-musicgen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Freplicate%2Fcog-musicgen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Freplicate%2Fcog-musicgen/lists"}