{"id":13415248,"url":"https://github.com/chavinlo/musicgen_trainer","last_synced_at":"2025-03-14T22:33:06.111Z","repository":{"id":227014257,"uuid":"652663198","full_name":"chavinlo/musicgen_trainer","owner":"chavinlo","description":"simple trainer for musicgen/audiocraft","archived":false,"fork":false,"pushed_at":"2024-07-12T03:02:27.000Z","size":10795,"stargazers_count":20,"open_issues_count":10,"forks_count":5,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-11T10:49:07.587Z","etag":null,"topics":["audiocraft","generative-music","music","musicgen","trainer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chavinlo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-12T14:36:31.000Z","updated_at":"2025-02-03T22:16:40.000Z","dependencies_parsed_at":"2024-10-26T11:22:17.538Z","dependency_job_id":"c9f7660d-1934-4850-b7c7-31f4779c5c77","html_url":"https://github.com/chavinlo/musicgen_trainer","commit_stats":null,"previous_names":["chavinlo/musicgen_trainer"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chavinlo%2Fmusicgen_trainer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chavinlo%2Fmusicgen_trainer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chavinlo%2Fmusicgen_trainer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chavinlo%2Fmusicgen_trainer/manifests
","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chavinlo","download_url":"https://codeload.github.com/chavinlo/musicgen_trainer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243658057,"owners_count":20326459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audiocraft","generative-music","music","musicgen","trainer"],"created_at":"2024-07-30T21:00:45.891Z","updated_at":"2025-03-14T22:33:06.104Z","avatar_url":"https://github.com/chavinlo.png","language":"Python","funding_links":[],"categories":["Python","🌟 Featured Community Projects"],"sub_categories":[],"readme":"# MusicGen Trainer\n\nThis is a trainer for MusicGen model. It's based on [this](https://github.com/chavinlo/musicgen_trainer).\n\n# Contributors\n- [@mkualquiera](https://github.com/mkualquiera) and [@neverix](https://github.com/neverix): actually got it working\n- elyxlz: help with masks\n\n## STATUS: MVP\n\nRemoving the gradient scaler, increasing the batch size and only training on conditional samples makes training work.\n\nTODO:\n* [ ] Add notebook\n* [ ] Add webdataset support\n* [ ] Try larger models\n* [ ] Add LoRA\n* [ ] Make rolling generation customizable\n\n## Usage\n\n### Dataset Creation\n\nCreate a folder, in it, place your audio and caption files. 
**They must be in `.wav` and `.txt` format, respectively.** To train with empty text, you can omit the `.txt` files by setting the `--no_label` option to `1`.\n\n![](https://i.imgur.com/AlDlqBI.png)\n\nYou can use `.wav` files longer than 30 seconds; in that case, the model is trained on random crops of the original file.\n\nIn this example, `segment_000.txt` contains the caption \"jazz music, jobim\" for the audio file `segment_000.wav`.\n\n### Running the trainer\n\nRun `python3 run.py --dataset \u003cPATH_TO_YOUR_DATASET\u003e`. Make sure to use the full path to the dataset, not a relative path.\n\n### Options\n\n- `dataset_path`: String, path to your dataset with `.wav` and `.txt` pairs.\n- `model_id`: String, MusicGen model to use. Can be `small`/`medium`/`large`. Default: `small`\n- `lr`: Float, learning rate. Default: `0.00001`/`1e-5`\n- `epochs`: Integer, epoch count. Default: `100`\n- `use_wandb`: Integer, `1` to enable wandb, `0` to disable it. Default: `0` = Disabled\n- `save_step`: Integer, number of steps between checkpoint saves. Default: None\n- `no_label`: Integer, whether to read a dataset without `.txt` files. Default: `0` = Disabled\n- `tune_text`: Integer, perform textual inversion instead of full training. Default: `0` = Disabled\n- `weight_decay`: Float, the weight decay regularization coefficient. Default: `0.00001`/`1e-5`\n- `grad_acc`: Integer, number of steps to accumulate gradients over. Default: 2\n- `warmup_steps`: Integer, number of steps over which the learning rate is slowly increased so the optimizer can compute its statistics. Default: 16\n- `batch_size`: Integer, number of samples the model sees at once. Reduce it to lower memory consumption. Default: 4\n- `use_cfg`: Integer, whether to train with some labels randomly dropped out. 
Default: `0` = Disabled\n\nYou can set these options like this: `python3 run.py --use_wandb=1`.\n\n### Models\n\nOnce training finishes, the model (and checkpoints) will be available under the `models` folder in the directory you ran the trainer from.\n\n![](https://i.imgur.com/Mu19EPb.png)\n\nTo load them, run the following in your generation script:\n\n```python\nmodel.lm.load_state_dict(torch.load('models/lm_final.pt'))\n```\n\nHere, `model` is the MusicGen object and `models/lm_final.pt` is the path to your model (or checkpoint).\n\n## Citations\n\n```\n@article{copet2023simple,\n      title={Simple and Controllable Music Generation},\n      author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez},\n      year={2023},\n      journal={arXiv preprint arXiv:2306.05284},\n}\n```\n\n@mkualquiera (mkualquiera@discord) added batching, debugged the code, and trained the first working model.\n\nSpecial thanks to elyxlz (223864514326560768@discord) for helping @chavinlau with the masks.\n\n@chavinlau wrote the original version of the training code. Original README:\n\n---\n\n# MusicGen Trainer\n\nThis is a trainer for the MusicGen model. Currently it's very basic, but I'll add more features soon.\n\n## STATUS: BROKEN\n\nOnly works for overfitting. Breaks the model on anything else.\n\nMore information on the current training quality is in the [experiments section](#experiments).\n\n## Usage\n\n### Dataset Creation\n\nCreate a folder and place your audio and caption files in it. **They must be in WAV and TXT format, respectively.**\n\n![](https://i.imgur.com/AlDlqBI.png)\n\n### Important: Split your audio into 35-second chunks. Only the first 30 seconds will be processed. 
Audio cannot be shorter than 30 seconds.\n\nIn this example, `segment_000.txt` contains the caption \"jazz music, jobim\" for the audio file `segment_000.wav`.\n\n### Running the trainer\n\nRun `python3 run.py --dataset /home/ubuntu/dataset`, replacing `/home/ubuntu/dataset` with the path to your dataset. Make sure to use the full path, not a relative path.\n\n### Options\n\n- `dataset_path`: String, path to your dataset with WAV and TXT pairs.\n- `model_id`: String, MusicGen model to use. Can be `small`/`medium`/`large`. Default: `small`\n- `lr`: Float, learning rate. Default: `0.0001`/`1e-4`\n- `epochs`: Integer, epoch count. Default: `5`\n- `use_wandb`: Integer, `1` to enable wandb, `0` to disable it. Default: `0` = Disabled\n- `save_step`: Integer, number of steps between checkpoint saves. Default: None\n\nYou can set these options like this: `python3 run.py --use_wandb=1`\n\n### Models\n\nOnce training finishes, the model (and checkpoints) will be available under the `models` folder in the directory you ran the trainer from.\n\n![](https://i.imgur.com/Mu19EPb.png)\n\nTo load them, run the following in your generation script:\n\n```python\nmodel.lm.load_state_dict(torch.load('models/lm_final.pt'))\n```\n\nHere, `model` is the MusicGen object and `models/lm_final.pt` is the path to your model (or checkpoint).\n\n## Experiments\n\n### Electronic music (Moe Shop):\n\nEncodec seems to struggle with electronic music. Even a plain Encode -\u003e Decode round trip has many problems.\n\n4:00 - 4:30 - [Moe Shop - WONDER POP](https://youtu.be/H4PZ7mju5QQ?t=240)\n\nOriginal: https://voca.ro/1jbsor6BAyLY\n\nEncode -\u003e Decode: https://voca.ro/1kF2yyGyRn0y\n\nOverfit -\u003e Generate -\u003e Decode: https://voca.ro/1f6ru5ieejJY\n\n### Bossa Nova (Tom Jobim):\n\nSofter and less aggressive melodies seem to play best with encodec and musicgen. 
One such genre is bossa nova, which to me sounds great:\n\n1:20 - 1:50 - [Tom Jobim - Children's Games](https://youtu.be/8KVtgzOTqDw?t=80)\n\nOriginal: https://voca.ro/1dm9QpRqa5rj (last 5 seconds are ignored)\n\nEncode -\u003e Decode: https://voca.ro/19LpwVE44si7\n\nOverfit -\u003e Generate -\u003e Decode: https://voca.ro/1hJGVdxsvBOG\n\n## Citations\n\n```\n@article{copet2023simple,\n      title={Simple and Controllable Music Generation},\n      author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez},\n      year={2023},\n      journal={arXiv preprint arXiv:2306.05284},\n}\n```\n\nSpecial thanks to elyxlz (223864514326560768@discord) for helping me with the masks.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchavinlo%2Fmusicgen_trainer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchavinlo%2Fmusicgen_trainer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchavinlo%2Fmusicgen_trainer/lists"}