{"id":30762869,"url":"https://github.com/interactiveaudiolab/tria","last_synced_at":"2026-02-11T08:08:28.404Z","repository":{"id":301958996,"uuid":"1007883921","full_name":"interactiveaudiolab/tria","owner":"interactiveaudiolab","description":null,"archived":false,"fork":false,"pushed_at":"2025-11-11T23:37:31.000Z","size":950,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-11-12T01:18:13.951Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/interactiveaudiolab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-24T17:21:10.000Z","updated_at":"2025-11-11T23:37:34.000Z","dependencies_parsed_at":"2025-06-29T20:35:50.462Z","dependency_job_id":"1aa3c66a-ed60-4196-9a7e-fd1dbf5be997","html_url":"https://github.com/interactiveaudiolab/tria","commit_stats":null,"previous_names":["interactiveaudiolab/tria"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/interactiveaudiolab/tria","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interactiveaudiolab%2Ftria","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interactiveaudiolab%2Ftria/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interactiveaudiolab%2Ftria/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interactiveaudiolab%2Ftria/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/interactiveaudiolab","download_url":"https://codeload.github.com/interactiveaudiolab/tria/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interactiveaudiolab%2Ftria/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29329645,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-11T06:13:03.264Z","status":"ssl_error","status_checked_at":"2026-02-11T06:12:55.843Z","response_time":97,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-04T15:50:00.430Z","updated_at":"2026-02-11T08:08:28.400Z","avatar_url":"https://github.com/interactiveaudiolab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv style=\"text-align: center;\"\u003e\n  \u003cimg src=\"assets/img/tria_logo.svg\" alt=\"TRIA logo\" style=\"width: 50%;\"\u003e\n\u003c/div\u003e\n\n# The Rhythm In Anything: Audio-Prompted Drums Generation with Masked Language Modeling \n\nThis repository contains training and inference code for the TRIA \"anything-to-drums\" system proposed in the paper **The Rhythm In Anything: Audio-Prompted Drums Generation with Masked Language Modeling**.\n\n![](https://static.arxiv.org/static/browse/0.3.4/images/icons/favicon-16x16.png) [arXiv Paper: The Rhythm In Anything: Audio-Prompted Drums Generation with Masked 
Language Modeling\n](https://arxiv.org/abs/2509.15625) \u003cbr\u003e\n📈 [Demo Site](https://therhythminanything.github.io)\u003cbr\u003e\n⚙ [Model Weights](pretrained/tria/)\n\n## Installation\n\nClone the repo:\n\n```\ngit clone https://github.com/interactiveaudiolab/tria\ncd tria\npip install -r requirements.txt\n```\n\n\nGrant permissions:\n```\nchmod -R u+x scripts\n```\n\n## Inference\n\nLaunch the [Gradio](https://www.gradio.app/) interface:\n\n```\npython app.py\n```\n\n\u003cspan style=\"color:red\"\u003eMore models and configurations coming soon!\u003c/span\u003e\n\n\n## Training\n\n### Download Datasets\n\n__Base Configuration (`26G`)__: the TRIA models discussed in our [paper](https://arxiv.org/abs/2509.15625) were trained on a subset of the [MusDB-HQ](https://sigsep.github.io/datasets/musdb.html) dataset, totalling roughly 8 hours of drum data. To download this data, run:\n\n```\n./scripts/download/download_data.sh \u003cDATA_DIR\u003e\npython scripts/setup/create_manifests.py\n```\n\nwhere `\u003cDATA_DIR\u003e` is the directory in which you want to store data. At this point, you should be ready to [train](#single-gpu-training) TRIA from scratch!\n\n__Additional Augmentations (`88G`)__: to enable additional noise and reverb augmentations on source audio for robust rhythm feature extraction, you can download room impulse response and background noise data:\n\n```\n./scripts/download/download_extra_augs.sh\npython scripts/setup/create_extra_aug_manifests.py\n```\n\n__Additional High-Quality Drum Data (`190G`)__: to obtain additional high-quality isolated drum data, you can download the [MoisesDB](https://music.ai/research/#datasets) dataset via the Moises.ai website; you will be prompted to fill out a form to access the dataset. 
Once you have downloaded the dataset and extracted it to your `\u003cDATA_DIR\u003e`, run:\n\n```\npython scripts/setup/consolidate_moises.py\npython scripts/setup/create_moises_manifests.py\n```\n\n__Additional Drum Loops (`11G`)__: to obtain additional drum loops and improve the timbral diversity of training data, you can download the [FreeSound Loop Dataset](https://arxiv.org/abs/2008.11507). Filtering to remove short (\u003c4s) and non-drum recordings results in a dataset of roughly 1800 loops spanning 7 hours. To download and prepare the dataset, run:\n\n```\n./scripts/download/download_loops.py.sh\npython scripts/setup/create_loops_manifests.py\n```\n\n__Large-Scale Low-Quality Drum Data__: another way to scale drum data is to run a pre-trained source separation model on a large corpus of musical mixtures such as the [MTG-Jamendo](https://mtg.github.io/mtg-jamendo-dataset/) dataset (`152G`). In our experiments, training on [HDEMUCS](https://docs.pytorch.org/audio/stable/tutorials/hybrid_demucs_tutorial.html)-separated drum stems resulted in low-quality generations due to the prevalence of separation artifacts. However, it may still be possible to leverage such noisy data by using it to train only \"early\" generation steps (e.g. coarse RVQ codebooks for masked language modeling).\n\n\n### Configuration\n\n\nWe provide configuration files for the five TRIA variants evaluated in our paper in the `conf/` directory, with `small_2b_musdb.yml` corresponding to the \"main\" TRIA system.\n\nWe use [`argbind`](https://github.com/pseeth/argbind) for training configuration. 
Once you've downloaded data and created manifests, training/validation datasets can be modified by providing paths in the relevant portions of the config file:\n\n```\ntrain/StemDataset.sources:\n  - manifests/moisesdb/train.csv\n\nval/StemDataset.sources:\n  - manifests/moisesdb/val.csv\n```\n\nas can noise and impulse response sources for data augmentation:\n\n```\ntrain/build_transform.names: [\n  ...\n  \"Reverb\",\n  \"BackgroundNoise\",\n]\n\n...\n\nReverb.drr: [uniform, 0.0, 30.0]\nReverb.sources:\n  - manifests/rir_real/train.csv\n\nBackgroundNoise.snr: [uniform, 10.0, 30.0]\nBackgroundNoise.sources:\n  - manifests/noise_room/train.csv\n```\n\n\n\n### Single-GPU Training\n\nOnce you have downloaded your chosen datasets, you can train on a single GPU with:\n\n```\nexport CUDA_VISIBLE_DEVICES=0\npython scripts/train.py --args.load conf/small_2b_musdb.yml\n```\n\n### Multi-GPU Training\n\nYou can train on multiple GPUs (e.g. 2) with:\n\n```\nexport CUDA_VISIBLE_DEVICES=0,1\ntorchrun --nproc_per_node gpu scripts/train.py --args.load conf/small_2b_musdb.yml\n```\n\n### Distillation\n\nWe provide a [script](scripts/distill.py) (and corresponding [example configuration file](conf/distill_tiny_musdb_moises_fsl_2b.yml)) to distill TRIA into a smaller model:\n```\ntorchrun --nproc_per_node gpu scripts/distill.py --args.load conf/distill_tiny_musdb_moises_fsl_2b.yml\n```\n\n## Licenses\n\nThe training and inference code in this repository is licensed under the [MIT License](LICENSE). 
The pretrained model weights are obtained from data licensed under [Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA)](https://creativecommons.org/licenses/by-nc-sa/4.0/) and are therefore released under the same license.\n\n\n## Model Versions\n\nThis repository is an open-source reimplementation of the TRIA system described in [our paper](https://arxiv.org/abs/2509.15625), and as a result models trained using this repository may differ from those presented in the paper and supplementary materials. During the re-implementation process, we found that minor differences in random seeding, data augmentation, and dataset split can affect model performance in the small-data regime explored in the paper. Anecdotally, we find that __scaling training data reliably improves performance, with models exhibiting much stronger timbre adherence and reduced sensitivity to inference parameter configurations__. \n\nTherefore:\n* If you want a TRIA model trained on licensed, publicly available data (i.e. MusDB, MoisesDB, and FreeSound Loops), we recommend using the [default configuration](conf/small_musdb_moises_fsl_2b.yml)\n* If you want to explore the settings discussed in the TRIA paper, we provide [matching configurations](conf/exp/)\n* If you have access to large-scale high-quality licensed drum data, we recommend re-training TRIA on that data. 
\n\n\n## 📝 To-Do:\n* Add configs/weights for ablations and offload weights from repo\n* Both the `Reverb` and `BackgroundNoise` transforms are slow due to inefficient file reads and salient excerpting\n* Add support for additional discrete and continuous tokenizers; currently, only [DAC](https://github.com/descriptinc/descript-audio-codec) is supported, as the code and weights are MIT-licensed\n* Switch rhythm features from perceptual to RMS loudness normalization to match original TRIA\n* Allow training on variable feature sparsity / quantization, akin to [Sketch2Sound](https://arxiv.org/abs/2412.08550), to allow for inference-time control over conditioning granularity\n* Additional learning rate schedules (currently using DAC exponential decay schedule)\n\n## Citation\n\n```\n@inproceedings{tria2025,\n    author = {Patrick O'Reilly and Julia Barnett and Hugo Flores Garcia and Annie Chu and Nathan Pruyne and Prem Seetharaman and Bryan Pardo},\n    title = {The Rhythm In Anything: Audio-Prompted Drums Generation with Masked Language Modeling},\n    booktitle = {International Society for Music Information Retrieval Conference (ISMIR)},\n    year = {2025},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finteractiveaudiolab%2Ftria","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finteractiveaudiolab%2Ftria","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finteractiveaudiolab%2Ftria/lists"}