{"id":15036203,"url":"https://github.com/acids-ircam/rave","last_synced_at":"2025-05-15T00:06:42.349Z","repository":{"id":37511182,"uuid":"380178613","full_name":"acids-ircam/RAVE","owner":"acids-ircam","description":"Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder","archived":false,"fork":false,"pushed_at":"2024-07-30T00:50:25.000Z","size":9394,"stargazers_count":1465,"open_issues_count":38,"forks_count":193,"subscribers_count":45,"default_branch":"master","last_synced_at":"2025-04-03T07:08:28.936Z","etag":null,"topics":["ai","audio","deep-learning","generative-model","neural-network"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/acids-ircam.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-25T08:46:22.000Z","updated_at":"2025-04-02T15:19:54.000Z","dependencies_parsed_at":"2023-12-10T22:31:28.100Z","dependency_job_id":"e5f4a010-cfc9-404b-ad80-16903f00f6d4","html_url":"https://github.com/acids-ircam/RAVE","commit_stats":null,"previous_names":[],"tags_count":31,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/acids-ircam%2FRAVE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/acids-ircam%2FRAVE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/acids-ircam%2FRAVE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/acids-ircam%2FRAVE/manifests","owner_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/owners/acids-ircam","download_url":"https://codeload.github.com/acids-ircam/RAVE/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248353514,"owners_count":21089652,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","audio","deep-learning","generative-model","neural-network"],"created_at":"2024-09-24T20:30:30.124Z","updated_at":"2025-04-11T06:26:38.222Z","avatar_url":"https://github.com/acids-ircam.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![rave_logo](docs/rave.png)\n\n# RAVE: Realtime Audio Variational autoEncoder\n\nOfficial implementation of _RAVE: A variational autoencoder for fast and high-quality neural audio synthesis_ ([article link](https://arxiv.org/abs/2111.05011)) by Antoine Caillon and Philippe Esling.\n\nIf you use RAVE as a part of a music performance or installation, be sure to cite either this repository or the article !\n\nIf you want to share / discuss / ask things about RAVE you can do so in our [discord server](https://discord.gg/dhX73sPTBb) !\n\nPlease check the FAQ before posting an issue!\n\n**RAVE VST** RAVE VST for Windows, Mac and Linux is available as beta on the [corresponding Forum IRCAM webpage](https://forum.ircam.fr/projects/detail/rave-vst/). 
For problems, please open an issue here or [on the Forum IRCAM discussion page](https://discussion.forum.ircam.fr/c/rave-vst/651).\n\n**Tutorials** : new tutorials are available on the Forum IRCAM webpage, and video versions are coming soon!\n- [Tutorial: Neural Synthesis in a DAW with RAVE](https://forum.ircam.fr/article/detail/neural-synthesis-in-a-daw-with-rave/)\n- [Tutorial: Neural Synthesis in Max 8 with RAVE](https://forum.ircam.fr/article/detail/tutorial-neural-synthesis-in-max-8-with-rave/)\n- [Tutorial: Training RAVE models on custom data](https://forum.ircam.fr/article/detail/training-rave-models-on-custom-data/)\n\n## Previous versions\n\nThe original implementation of the RAVE model can be restored using\n\n```bash\ngit checkout v1\n```\n\n## Installation\n\nInstall RAVE using\n\n```bash\npip install acids-rave\n```\n\n**Warning** It is strongly advised to install `torch` and `torchaudio` before `acids-rave`, so you can choose the appropriate version of torch on the [library website](http://www.pytorch.org). For future compatibility with new devices (and modern Python environments), `acids-rave` no longer enforces `torch==1.13`.\n\nYou will need **ffmpeg** on your computer. You can install it locally inside your virtual environment using\n\n```bash\nconda install ffmpeg\n```\n\n\u003c!-- Detailed instructions to setup a training station for this project are available [here](docs/training_setup.md). --\u003e\n\n## Colab\n\nA colab to train RAVEv2 is now available thanks to [hexorcismos](https://github.com/moiseshorta) !\n[![colab_badge](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ih-gv1iHEZNuGhHPvCHrleLNXvooQMvI?usp=sharing)\n\n## Usage\n\nTraining a RAVE model usually involves 3 separate steps, namely _dataset preparation_, _training_ and _export_.\n\n### Dataset preparation\n\nYou can now prepare a dataset using two methods: regular and lazy. 
Lazy preprocessing allows RAVE to be trained directly on the raw files (e.g. mp3, ogg), without converting them first. **Warning**: lazy dataset loading will increase your CPU load by a large margin during training, especially on Windows. It can however be useful when training on a large audio corpus that would not fit on a hard drive when uncompressed. In any case, prepare your dataset using\n\n```bash\nrave preprocess --input_path /audio/folder --output_path /dataset/path --channels X (--lazy)\n```\n\n### Training\n\nRAVEv2 has many different configurations. The improved version of v1 is called `v2`, and can therefore be trained with\n\n```bash\nrave train --config v2 --db_path /dataset/path --out_path /model/out --name give_a_name --channels X\n```\n\nWe also provide a discrete configuration, similar to SoundStream or EnCodec\n\n```bash\nrave train --config discrete ...\n```\n\nBy default, RAVE is built with non-causal convolutions. If you want to make the model causal (hence lowering the overall latency of the model), you can use the causal mode\n\n```bash\nrave train --config discrete --config causal ...\n```\n\nNew in 2.3, data augmentations are also available to improve the model's generalization in low-data regimes. You can enable them by passing augmentation configuration files with the `--augment` keyword\n\n```bash\nrave train --config v2 --augment mute --augment compress\n```\n\nMany other configuration files are available in `rave/configs` and can be combined. 
Here is a list of all the available configurations \u0026 augmentations :\n\n\u003ctable\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth\u003eType\u003c/th\u003e\n\u003cth\u003eName\u003c/th\u003e\n\u003cth\u003eDescription\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\n\u003ctr\u003e\n\u003ctd rowspan=8\u003eArchitecture\u003c/td\u003e\n\u003ctd\u003ev1\u003c/td\u003e\n\u003ctd\u003eOriginal continuous model (minimum GPU memory : 8 GB)\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd\u003ev2\u003c/td\u003e\n\u003ctd\u003eImproved continuous model (faster, higher quality) (minimum GPU memory : 16 GB)\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd\u003ev2_small\u003c/td\u003e\n\u003ctd\u003ev2 with a smaller receptive field, adapted adversarial training, and a noise generator, suited to timbre transfer for stationary signals (minimum GPU memory : 8 GB)\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd\u003ev2_nopqmf\u003c/td\u003e\n\u003ctd\u003e(experimental) v2 without PQMF in the generator (more efficient for bending purposes) (minimum GPU memory : 16 GB)\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd\u003ev3\u003c/td\u003e\n\u003ctd\u003ev2 with Snake activation, Descript discriminator and Adaptive Instance Normalization for real style transfer (minimum GPU memory : 32 GB)\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd\u003ediscrete\u003c/td\u003e\n\u003ctd\u003eDiscrete model (similar to SoundStream or EnCodec) (minimum GPU memory : 18 GB)\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd\u003eonnx\u003c/td\u003e\n\u003ctd\u003eNoiseless v1 configuration for ONNX usage (minimum GPU memory : 6 GB)\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd\u003eraspberry\u003c/td\u003e\n\u003ctd\u003eLightweight configuration compatible with realtime Raspberry Pi 4 inference (minimum GPU memory : 5 GB)\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd 
rowspan=3\u003eRegularization (v2 only)\u003c/td\u003e\n\u003ctd\u003edefault\u003c/td\u003e\n\u003ctd\u003eVariational Auto Encoder objective (ELBO)\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd\u003ewasserstein\u003c/td\u003e\n\u003ctd\u003eWasserstein Auto Encoder objective (MMD)\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd\u003espherical\u003c/td\u003e\n\u003ctd\u003eSpherical Auto Encoder objective\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd rowspan=1\u003eDiscriminator\u003c/td\u003e\n\u003ctd\u003espectral_discriminator\u003c/td\u003e\n\u003ctd\u003eUse the MultiScale discriminator from EnCodec.\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd rowspan=3\u003eOthers\u003c/td\u003e\n\u003ctd\u003ecausal\u003c/td\u003e\n\u003ctd\u003eUse causal convolutions\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd\u003enoise\u003c/td\u003e\n\u003ctd\u003eEnables noise synthesizer V2\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd\u003ehybrid\u003c/td\u003e\n\u003ctd\u003eEnable mel-spectrogram input\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd rowspan=3\u003eAugmentations\u003c/td\u003e\n\u003ctd\u003emute\u003c/td\u003e\n\u003ctd\u003eRandomly mutes data batches (default prob : 0.1). 
Forces the model to learn silence\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd\u003ecompress\u003c/td\u003e\n\u003ctd\u003eRandomly compresses the waveform (equivalent to light non-linear amplification of batches)\u003c/td\u003e\n\u003c/tr\u003e\n\n\u003ctr\u003e\n\u003ctd\u003egain\u003c/td\u003e\n\u003ctd\u003eApplies a random gain to the waveform (default range : [-6, 3]) \u003c/td\u003e\n\u003c/tr\u003e\n\n\u003c/tbody\u003e\n\u003c/table\u003e\n\n### Export\n\nOnce trained, export your model to a TorchScript file using\n\n```bash\nrave export --run /path/to/your/run (--streaming)\n```\n\nSetting the `--streaming` flag will enable cached convolutions, making the model compatible with realtime processing. **If you forget to use the streaming mode and try to load the model in Max, you will hear clicking artifacts.**\n\n## Prior\n\nFor discrete models, we redirect the user to the `msprior` library [here](https://github.com/caillonantoine/msprior). However, as this library is still experimental, the prior from version 1.x has been re-integrated in v2.3.\n\n### Training\n\nTo train a prior for a pretrained RAVE model :\n\n```bash\nrave train_prior --model /path/to/your/run --db_path /path/to/your_preprocessed_data --out_path /path/to/output\n```\n\nThis will train a prior over the latent space of the pretrained model `/path/to/your/run`, and save the model and TensorBoard logs to the folder `/path/to/output`.\n\n### Scripting\n\nTo script a prior along with a RAVE model, export your model and pass the path to your pretrained prior with the `--prior` keyword :\n\n```bash\nrave export --run /path/to/your/run --prior /path/to/your/prior (--streaming)\n```\n\n## Pretrained models\n\nSeveral pretrained streaming models [are available here](https://acids-ircam.github.io/rave_models_download). 
We'll keep the list updated with new models.\n\n## Realtime usage\n\nThis section shows how RAVE can be loaded inside [`nn~`](https://acids-ircam.github.io/nn_tilde/) in order to be used live with Max/MSP or PureData.\n\n### Reconstruction\n\nA pretrained RAVE model named `darbouka.gin` available on your computer can be loaded inside `nn~` using the following syntax, where the default method is set to forward (i.e. encode then decode)\n\n\u003cimg src=\"docs/rave_method_forward.png\" width=400px/\u003e\n\nThis does the same thing as the following patch, but slightly faster.\n\n\u003cimg src=\"docs/rave_encode_decode.png\" width=210px /\u003e\n\n### High-level manipulation\n\nHaving explicit access to the latent representation yielded by RAVE allows us to interact with the representation using Max/MSP or PureData signal processing tools:\n\n\u003cimg src=\"docs/rave_high_level.png\" width=310px /\u003e\n\n### Style transfer\n\nBy default, RAVE can be used as a style transfer tool, based on the large compression ratio of the model. 
We recently added a technique inspired by StyleGAN that adds Adaptive Instance Normalization to the reconstruction process, which lets you define _source_ and _target_ styles directly inside Max/MSP or PureData, using the attribute system of `nn~`.\n\n\u003cimg src=\"docs/rave_attribute.png\" width=550px\u003e\n\nOther attributes, such as `enable` or `gpu`, can enable/disable computation or use the GPU to speed things up (still experimental).\n\n## Offline usage\n\nA batch generation script has been released in v2.3 to allow transformation of large numbers of files\n\n```bash\nrave generate model_path path_1 path_2 --out out_path\n```\n\nwhere `model_path` is the path to your trained model (original or scripted), `path_X` a list of audio files or directories, and `out_path` the output directory for the generations.\n\n## Discussion\n\nIf you have questions, want to share your experience with RAVE or share musical pieces made with the model, you can use the [Discussion tab](https://github.com/acids-ircam/RAVE/discussions) !\n\n## Demonstration\n\n### RAVE x nn~\n\nDemonstration of what you can do with RAVE and the nn~ external for Max/MSP !\n\n[![RAVE x nn~](http://img.youtube.com/vi/dMZs04TzxUI/mqdefault.jpg)](https://www.youtube.com/watch?v=dMZs04TzxUI)\n\n### Embedded RAVE\n\nUsing nn~ for PureData, RAVE can be used in realtime on embedded platforms !\n\n[![RAVE x nn~](http://img.youtube.com/vi/jAIRf4nGgYI/mqdefault.jpg)](https://www.youtube.com/watch?v=jAIRf4nGgYI)\n\n# Frequently Asked Questions (FAQ)\n\n**Question** : My preprocessing is stuck, showing `0it[00:00, ?it/s]`\u003cbr/\u003e\n**Answer** : This means that the audio files in your dataset are too short to provide a sufficient temporal scope to RAVE. 
Try decreasing the signal window by passing `--num_signal XXX(samples)` to `preprocess`, and remember to pass the matching `--n_signal XXX(samples)` to `train` afterwards.\n\n**Question** : During training I got an exception resembling `ValueError: n_components=128 must be between 0 and min(n_samples, n_features)=64 with svd_solver='full'`\u003cbr/\u003e\n**Answer** : This means that your dataset does not have enough data batches to compute the internal latent PCA, which requires at least 128 examples (and therefore enough batches to supply them). \n\n\n# Funding\n\nThis work is led at IRCAM, and has been funded by the following projects\n\n- [ANR MakiMono](https://acids.ircam.fr/course/makimono/)\n- [ACTOR](https://www.actorproject.org/)\n- [DAFNE+](https://dafneplus.eu/) N° 101061548\n\n\u003cimg src=\"https://ec.europa.eu/regional_policy/images/information-sources/logo-download-center/eu_co_funded_en.jpg\" width=200px/\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Facids-ircam%2Frave","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Facids-ircam%2Frave","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Facids-ircam%2Frave/lists"}