https://github.com/feizc/FluxMusic

Text-to-Music Generation with Rectified Flow Transformer

Host: GitHub
URL: https://github.com/feizc/FluxMusic
Owner: feizc
License: other
Created: 2024-08-06T09:41:07.000Z (9 months ago)
Default Branch: main
Last Pushed: 2024-09-06T14:14:33.000Z (8 months ago)
Last Synced: 2024-09-06T16:50:26.031Z (8 months ago)
Language: Python
Size: 1.72 MB
Stars: 486
Watchers: 15
Forks: 41
Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

Awesome Lists containing this project

- ai-game-devtools: FluxMusic, Text-to-Music Generation with Rectified Flow Transformer. [arXiv](https://arxiv.org/abs/2409.00587) (Music / Tool (AI LLM))

README

## FluxMusic: Text-to-Music Generation with Rectified Flow Transformer

_Official PyTorch Implementation_

This repo contains PyTorch model definitions, pre-trained weights, and training/sampling code for the paper *Flux that plays music*.

It explores a simple extension of diffusion-based rectified flow Transformers for text-to-music generation. The model architecture can be seen as follows: 



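As a rough illustration of the rectified flow idea the model builds on, the sketch below shows the straight-line interpolation between noise and data, the constant velocity target a network would regress, and a fixed-step Euler sampler. It is a toy on plain Python lists, not the code in this repo: the function names are hypothetical, and it assumes the convention of noise at `t=0` and data at `t=1` (the repo may use the opposite direction).

```python
import random


def interpolate(x0, x1, t):
    """Point at time t on the straight line from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def velocity_target(x0, x1):
    """Regression target for the velocity field; constant along a straight path."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy check: with the exact velocity of a single (noise, data) pair,
# Euler integration lands on the data point.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4)]
data = [0.5, -1.0, 2.0, 0.0]


def oracle_velocity(x, t):
    # Stand-in for a trained network (assumption for illustration only).
    return velocity_target(noise, data)


sample = euler_sample(oracle_velocity, noise, num_steps=8)
```

In a real model the oracle is replaced by a Transformer that predicts the velocity from the noisy latent, the time step, and the text conditioning.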
### To-do list

- [x] training / inference scripts

- [x] clean code

- [x] all ckpts and part of dataset

### 1. Training 

Refer to the [Flux repository](https://github.com/black-forest-labs/flux) to set up the running environment.

To launch latent-space training of the small model with `N` GPUs on one node using PyTorch DDP:

```bash
torchrun --nnodes=1 --nproc_per_node=N train.py \
    --version small \
    --data-path xxx \
    --global_batch_size 128
```

Scripts for the other model sizes can be found in the `scripts` directory.
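With DDP, a global batch size is usually split evenly across the participating processes. The sketch below shows that arithmetic for the training command above; the helper name is hypothetical, and whether `train.py` enforces divisibility in this way is an assumption.

```python
def per_gpu_batch_size(global_batch_size: int, world_size: int) -> int:
    """Split a global batch evenly across `world_size` DDP processes."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    if global_batch_size % world_size != 0:
        raise ValueError("global batch size must be divisible by world size")
    return global_batch_size // world_size


# Example: --global_batch_size 128 on one node with 8 GPUs.
local_batch = per_gpu_batch_size(128, 8)
print(local_batch)  # 16
```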

### 2. Inference 

We include a [`sample.py`](sample.py) script that samples music clips from a FluxMusic model according to text prompts:

```bash
python sample.py \
    --version small \
    --ckpt_path /path/to/model \
    --prompt_file config/example.txt
```

All prompts used in the paper are listed in `config/example.txt`.
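To prepare a custom prompt file, a small helper like the one below can be used to check its contents before sampling. It assumes one prompt per line with blank lines ignored; that layout and the `read_prompts` name are assumptions for illustration, so check `config/example.txt` for the format `sample.py` actually expects.

```python
def read_prompts(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


if __name__ == "__main__":
    # Hypothetical usage against the example file shipped with the repo.
    for index, prompt in enumerate(read_prompts("config/example.txt")):
        print(index, prompt)
```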

### 3. Download Ckpts and Data 

We use the VAE and vocoder from AudioLDM2, together with CLAP-L and T5-XXL. They can be downloaded directly from the table below, which also links the training scripts used in our experiments.

Note that training was restarted during our experiments because of a machine malfunction, so some scripts include resume options.

| Model | Training steps | URL | Training script |
|-------|----------------|-----|-----------------|
| VAE | - | [link](https://huggingface.co/cvssp/audioldm2/tree/main/vae) | - |
| Vocoder | - | [link](https://huggingface.co/cvssp/audioldm2/tree/main/vocoder) | - |
| T5-XXL | - | [link](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers/tree/main/text_encoder_3) | - |
| CLAP-L | - | [link](https://huggingface.co/laion/larger_clap_music/tree/main) | - |
| FluxMusic-Small | 200K | [link](https://huggingface.co/feizhengcong/FluxMusic/blob/main/musicflow_s.pt) | [link](https://github.com/feizc/FluxMusic/blob/main/scripts/train_s.sh) |
| FluxMusic-Base | 200K | [link](https://huggingface.co/feizhengcong/FluxMusic/blob/main/musicflow_b.pt) | [link](https://github.com/feizc/FluxMusic/blob/main/scripts/train_b.sh) |
| FluxMusic-Large | 200K | [link](https://huggingface.co/feizhengcong/FluxMusic/blob/main/musicflow_l.pt) | [link](https://github.com/feizc/FluxMusic/blob/main/scripts/train_l.sh) |
| FluxMusic-Giant | 200K | [link](https://huggingface.co/feizhengcong/FluxMusic/blob/main/musicflow_g.pt) | [link](https://github.com/feizc/FluxMusic/blob/main/scripts/train_g.sh) |
| FluxMusic-Giant-Full | 2M | [link](https://huggingface.co/feizhengcong/FluxMusic/blob/main/musicflow_g_full.pt) | - |

Note that the 200K-step ckpts were trained on a subset of the training set and were used to plot the scaling experiments and case studies in the paper.

The full version of the main results will be released right away.
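The FluxMusic checkpoints listed in the table can also be fetched programmatically. The sketch below maps a size label to its file name in the `feizhengcong/FluxMusic` Hugging Face repo and builds the direct-download URL. The dictionary keys are labels chosen here for illustration and are not necessarily the values accepted by `--version`; the download helper assumes the `huggingface_hub` package is installed.

```python
HF_REPO_ID = "feizhengcong/FluxMusic"

# File names taken from the table above; the keys are illustrative labels.
CHECKPOINT_FILES = {
    "small": "musicflow_s.pt",
    "base": "musicflow_b.pt",
    "large": "musicflow_l.pt",
    "giant": "musicflow_g.pt",
    "giant_full": "musicflow_g_full.pt",
}


def checkpoint_url(size: str) -> str:
    """Build the direct-download ("resolve") URL for a checkpoint file."""
    filename = CHECKPOINT_FILES[size]
    return f"https://huggingface.co/{HF_REPO_ID}/resolve/main/{filename}"


def download_checkpoint(size: str, local_dir: str = "ckpts") -> str:
    """Download a checkpoint with huggingface_hub and return its local path."""
    from huggingface_hub import hf_hub_download  # imported lazily; optional dependency

    return hf_hub_download(
        repo_id=HF_REPO_ID,
        filename=CHECKPOINT_FILES[size],
        local_dir=local_dir,
    )
```

The returned path can then be passed to `sample.py` through `--ckpt_path`.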

For the construction of the training data, refer to the `test.py` file, which shows a simple way to combine different datasets into a JSON file.

Because of copyright issues, the data used in the paper must be downloaded by the user.

We provide a clean subset in:    

Quick download links for other datasets can be found on [Hugging Face](https://huggingface.co/datasets?search=music) :).
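As a rough idea of what combining several datasets into one JSON file can look like, the sketch below merges multiple JSON files, each holding a list of records, into a single file and tags every record with its source. The record layout, the `source` field, and the function name are assumptions for illustration; the format the training code actually expects is defined by `test.py`.

```python
import json
from pathlib import Path


def merge_json_datasets(input_paths, output_path):
    """Concatenate JSON lists from several files into one JSON file.

    Each input file is assumed to contain a list of dict records.
    Returns the number of records written.
    """
    merged = []
    for path in input_paths:
        with open(path, "r", encoding="utf-8") as f:
            records = json.load(f)
        if not isinstance(records, list):
            raise ValueError(f"{path} does not contain a JSON list")
        for record in records:
            # Tag each record with the file it came from (illustrative field).
            merged.append({**record, "source": Path(path).stem})
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(merged, f, ensure_ascii=False, indent=2)
    return len(merged)
```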

This is a research project, and it is recommended to try advanced products: 

   

### Acknowledgments

The codebase is based on the awesome [Flux](https://github.com/black-forest-labs/flux) and [AudioLDM2](https://github.com/haoheliu/AudioLDM2) repos.
