https://github.com/chavinlo/riffusion-manipulation

tools to manipulate audio with riffusion
ai diffusers diffusion generative-music music riffusion stable-diffusion

Host: GitHub
URL: https://github.com/chavinlo/riffusion-manipulation
Owner: chavinlo
Created: 2022-12-16T17:27:11.000Z (almost 3 years ago)
Default Branch: master
Last Pushed: 2023-11-13T05:14:59.000Z (almost 2 years ago)
Last Synced: 2024-08-08T00:43:26.642Z (about 1 year ago)
Topics: ai, diffusers, diffusion, generative-music, music, riffusion, stable-diffusion
Language: Python
Homepage:
Size: 22.8 MB
Stars: 85
Watchers: 3
Forks: 13
Open Issues: 5
Metadata Files:
- Readme: README.md

# Riffusion Manipulation Tools

# Usage

## Flags
The following arguments/flags are available on all converters:

`-i / --input INPUTFILE.ext`

---

On img2audio.py:
`-o / --output OUTPUTFILE.ext`

On file2img.py:
`-o / --output OUTPUT_FOLDER`

---

`-m / --maxvol [integer]` : Maximum volume. 50+ gives okay quality, 100+ good quality, 255+ maximum quality.

`-p / --powerforimage [float]` : Amount of power to use. 0.25-0.35 is recommended. Too low a value produces loud noise; too high a value produces silent static.

`-n / --nmels [integer]` : n_mels to use. Must match the value used for the image; it is basically the image HEIGHT. 512 is the default used by the webUI. The higher the value, the less compression is applied and the higher the quality. The maximum is somewhere around 1280.
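
As a rough illustration of how `--maxvol` and `--powerforimage` interact, the sketch below maps one spectrogram amplitude to a pixel value and back. It assumes the scripts follow the mapping used in Riffusion's reference code (raise to a power, rescale by the maximum volume, invert); the function names and the clamping step are hypothetical and not taken from this repository.

```python
def amplitude_to_pixel(amplitude: float, max_volume: float, power: float) -> int:
    """Map one spectrogram amplitude to an 8-bit pixel (assumed mapping)."""
    scaled = (amplitude ** power) * 255 / max_volume
    # Clamping is an assumption added here to keep the value in the 8-bit range.
    scaled = min(max(scaled, 0.0), 255.0)
    # Inverted: loud bins become dark pixels, silence becomes white.
    return 255 - round(scaled)


def pixel_to_amplitude(pixel: int, max_volume: float, power: float) -> float:
    """Inverse of the mapping above, up to rounding error."""
    scaled = (255 - pixel) * max_volume / 255
    return scaled ** (1 / power)


pixel = amplitude_to_pixel(10000.0, max_volume=50, power=0.25)
restored = pixel_to_amplitude(pixel, max_volume=50, power=0.25)
print(pixel, restored)
```

Under this assumed mapping, the same `--maxvol` and `--powerforimage` values have to be used in both directions, otherwise the restored amplitudes are scaled incorrectly.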

---

On file2img.py:
`-d / --duration` : Duration of each chunk in ms (1 second = 1000 ms). Determines how wide the image will be. Use 5119 to get a 512-pixel-wide image.
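
The relation between duration and image width can be estimated with a small sketch. It assumes one spectrogram column per 10 ms hop plus one extra column from a centered STFT, which matches the 5119 ms to 512 pixels figure above; the hop size and the function name are assumptions, not values read from the scripts.

```python
def image_width_for_duration(duration_ms: int, hop_ms: int = 10) -> int:
    """Estimate spectrogram width in pixels for a chunk of duration_ms.

    Assumes one column per hop plus one extra column (centered STFT).
    """
    return duration_ms // hop_ms + 1


for duration in (2559, 5119, 10239):
    print(duration, "ms ->", image_width_for_duration(duration), "px")
```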

## Convert Audio to Image
To convert an audio file into an image, use file2img.py:

`python3 file2img.py -i INPUT_AUDIO.wav -o OUTPUT_FOLDER`

Note that this will generate a folder containing all the output spectrogram images, each covering 5119 ms (about 5.12 seconds) of audio.

For example, to convert charmpoint.mp3 [(Credits: Snail's House - Charm Point)](https://www.youtube.com/watch?v=NNvptCE6_Ds):

`python3 file2img.py -i charmpoint.wav -o charmpoint_images`

This will generate a folder with the spectrograms of the entire song. Each spectrogram corresponds to a 5119 ms chunk, unless you set the duration flag to a value of your choice. Here's one of them:

Spectrogram of Charm Point

This image can be used as a seed on the riffusion webUI.

Additionally, if the audio length is not a multiple of the duration, the final chunk takes the remaining milliseconds and is padded with silence:

Spectrogram of Charm Point
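
The chunking and padding behaviour described above can be sketched as follows. This is only an illustration of the arithmetic, not the code used by file2img.py; `plan_chunks` is a hypothetical helper.

```python
def plan_chunks(total_ms: int, chunk_ms: int = 5119) -> list[tuple[int, int, int]]:
    """Return (start_ms, end_ms, silence_padding_ms) for each chunk."""
    chunks = []
    start = 0
    while start < total_ms:
        end = min(start + chunk_ms, total_ms)
        # Only the final chunk can be short; it is padded up to chunk_ms.
        padding = chunk_ms - (end - start)
        chunks.append((start, end, padding))
        start += chunk_ms
    return chunks


# A 12-second clip yields two full chunks and one padded chunk.
for start, end, padding in plan_chunks(12000):
    print(f"{start:>6} - {end:>6} ms, padded with {padding} ms of silence")
```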

## Verify / Convert Image to Audio
It is highly recommended to verify that the audio has been converted correctly. You can do so by using img2audio.py:

`python3 img2audio.py -i INPUT_IMAGE.ext -o OUTPUT_AUDIO.ext`

`python3 img2audio.py -i charmpoint_chunks/charmpoint_43.png -o charmpoint_chunk_43.mp3`

This audio is also included in the repository.

# More info
The resulting images have a single channel (black and white). For Stable Diffusion tools to accept them, you need to convert them to RGB. The Riffusion inference server also does this, and most webUIs do it by default.
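
If a tool does not do the conversion for you, a minimal sketch with Pillow could look like this. It assumes Pillow is installed and uses an in-memory grayscale image as a stand-in for a generated spectrogram; `to_rgb` is a hypothetical helper name.

```python
from PIL import Image


def to_rgb(image: Image.Image) -> Image.Image:
    """Return an RGB copy of a single-channel image (gray value repeated in R, G, B)."""
    return image.convert("RGB")


# Stand-in for a generated spectrogram: a 512x512 single-channel ("L") image.
gray = Image.new("L", (512, 512), color=128)
rgb = to_rgb(gray)
print(gray.mode, "->", rgb.mode)
```

For a real spectrogram, replace the stand-in with `Image.open(...)` on one of the generated images and save the result with `rgb.save(...)`.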

# Experiments

Experiments with different parameter values are available in tests/.

Currently, the only experiment available uses [Planet Girl from ALIEN POP](https://youtu.be/EzSC4PFnYLY?t=19), starting at 0:19. clip.wav is the original audio, and the configuration used for each run is given in its folder name.

# Support
You can join the riffuser discord: https://discord.gg/HWdanyzvRt
