Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dmarx/video-killed-the-radio-star
Notebook and tools for end-to-end automation of music video production with generative AI
- Host: GitHub
- URL: https://github.com/dmarx/video-killed-the-radio-star
- Owner: dmarx
- License: mit
- Created: 2022-09-23T15:57:47.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-06-23T01:01:52.000Z (over 1 year ago)
- Last Synced: 2024-07-26T10:23:20.989Z (4 months ago)
- Language: Jupyter Notebook
- Homepage: https://colab.research.google.com/github/dmarx/video-killed-the-radio-star/blob/main/Video_Killed_The_Radio_Star_Defusion.ipynb#scrollTo=oPbeyWtesAoh
- Size: 147 MB
- Stars: 191
- Watchers: 9
- Forks: 34
- Open Issues: 28
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-generative-ai - 🔥 Notebook and tools for end-to-end automation of music video production with generative AI (Image Segmentation / Creative Uses of Generative AI Image Synthesis Tools)
README
# Video Killed The Radio Star [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dmarx/video-killed-the-radio-star/blob/main/Video_Killed_The_Radio_Star_Defusion.ipynb)
## Requirements
* ffmpeg - https://ffmpeg.org/
* pytorch - https://pytorch.org/get-started/locally/
* vktrs - (this repo) - `pip install vktrs[api]`
* stability_sdk api token - https://beta.dreamstudio.ai/ > circular icon in top right > membership > API Key (see the sketch below)
* whisper - `pip install git+https://github.com/openai/whisper`
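Once you have a DreamStudio key, here's a minimal sketch of one way to wire it up from inside the notebook. This assumes the stability_sdk convention of reading `STABILITY_HOST` and `STABILITY_KEY` from the environment; adapt it to however your setup actually supplies credentials:

```python
# Hedged setup sketch: assumes the stability_sdk convention of reading
# STABILITY_HOST / STABILITY_KEY from the environment. Not the notebook's
# exact setup cell.
import os
from getpass import getpass

os.environ["STABILITY_HOST"] = "grpc.stability.ai:443"
os.environ["STABILITY_KEY"] = getpass("Paste your DreamStudio API key: ")
```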
## FAQ

**What is this?**
TL;DR: An automated music video maker, given an mp3 or a YouTube URL.
**How does this animation technique work?**
For each text prompt you provide, the notebook will...
1. Generate an image based on that text prompt (using Stable Diffusion).
2. Use the generated image as the `init_image`, recombining it with the same text prompt to generate variations similar to the first image. This produces a sequence of extremely similar images based on the original text prompt.
3. Intelligently reorder those images to find the smoothest animation sequence of the frames (see the sketch below).
4. Repeat the image sequence to pad out the animation duration as needed.

The technique demonstrated in this notebook was inspired by a [video](https://www.youtube.com/watch?v=WJaxFbdjm8c) created by Ben Gillin.
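If you want to play with this idea outside the notebook, here's a minimal sketch of steps 2 and 3, using Hugging Face diffusers' img2img pipeline as a stand-in for the notebook's actual backend. The model choice, `strength` value, and pixel-distance reordering heuristic are all illustrative assumptions, not the notebook's exact implementation:

```python
# Sketch of steps 2-3: generate near-duplicate variations via img2img,
# then greedily reorder them into the smoothest frame sequence.
import numpy as np
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def make_variations(prompt: str, init_image: Image.Image, n_frames: int = 24):
    """Step 2: re-diffuse the init image with the same prompt at low strength."""
    frames = []
    for _ in range(n_frames):
        out = pipe(prompt=prompt, image=init_image, strength=0.4)  # low strength = small changes
        frames.append(out.images[0])
    return frames

def smooth_order(frames):
    """Step 3: greedy nearest-neighbor ordering by raw pixel distance.
    (A perceptual embedding would likely work better; this is just illustrative.)"""
    arrs = [np.asarray(f, dtype=np.float32).ravel() for f in frames]
    order, remaining = [0], set(range(1, len(frames)))
    while remaining:
        last = arrs[order[-1]]
        nxt = min(remaining, key=lambda i: np.linalg.norm(arrs[i] - last))
        order.append(nxt)
        remaining.remove(nxt)
    return [frames[i] for i in order]
```

Lower `strength` keeps each variation closer to the init image, which is what makes the reordered frames read as animation rather than a slideshow.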
**How are lyrics transcribed?**

This notebook uses OpenAI's recently released Whisper model for automatic speech recognition.
OpenAI was kind enough to offer several different sizes of this model, each of which has its own pros and cons.
This notebook uses the largest Whisper model for transcribing the actual lyrics, and the
smallest model for performing the lyric segmentation. Neither model is perfect, but the results
so far seem pretty decent.
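Here's an illustrative sketch of that two-model approach using the openai-whisper package directly (the notebook itself wraps this machinery via the `vktrs` package; the file name is a placeholder):

```python
import whisper

# Largest model: most accurate transcription of the actual lyrics (slow).
lyrics_model = whisper.load_model("large")
lyrics = lyrics_model.transcribe("song.mp3")
print(lyrics["text"])

# Smallest model: a fast pass whose segment timestamps drive lyric segmentation.
segment_model = whisper.load_model("tiny")
for seg in segment_model.transcribe("song.mp3")["segments"]:
    print(f"{seg['start']:7.2f}s - {seg['end']:7.2f}s {seg['text']}")
```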
The first draft of this notebook relied on subtitles from youtube videos to determine timing, which were
then aligned with user-provided lyrics. Youtube's automated captions are powerful and I'll update the
notebook shortly to leverage those again, but for the time being we're just using whisper for everything
and not referencing user-provided captions at all.

**Something didn't work quite right in the transcription process. How do I fix the timing or the actual lyrics?**
The notebook is divided into several steps. Between each step, a "storyboard" file is updated. If you want to
make modifications, you can edit this file directly and those edits should be reflected when you next load the
file. Depending on what you changed and what step you run next, your changes may be ignored or even overwritten.
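For concreteness, here's a purely hypothetical illustration of that edit-the-storyboard workflow. The field names (and even the file format) below are invented, so check the actual storyboard file your run produces:

```python
# Hypothetical illustration only: this is NOT the notebook's actual
# storyboard schema. The point is simply that the storyboard is a plain
# file you can edit between pipeline steps.
import yaml

with open("storyboard.yaml") as f:
    storyboard = yaml.safe_load(f)

# e.g. nudge a segment's timing, or swap in your own init image
storyboard["prompts"][3]["start_time"] = 42.5
storyboard["prompts"][3]["init_image"] = "my_custom_frame.png"

with open("storyboard.yaml", "w") as f:
    yaml.safe_dump(storyboard, f)
```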
Still playing with different solutions here.

**Can I provide my own images to 'bring to life' and associate with certain lyrics/sequences?**
Yes, you can! As described above, you just need to modify the storyboard. I'll describe this functionality in
greater detail after the implementation stabilizes a bit more.

**This gave me an idea and I'd like to use just a part of your process here. What's the best way to reuse just some of the machinery you've developed here?**
Most of the functionality in this notebook has been offloaded to a library I published to PyPI called `vktrs`. I strongly encourage you to import anything you need
from there rather than cutting and pasting functions into a notebook. Similarly, if you have ideas for improvements, please don't hesitate to submit a PR!

## Dev notes
```
!pip install --upgrade setuptools build  # build tooling
!git clone https://github.com/dmarx/video-killed-the-radio-star/
!cd video-killed-the-radio-star; python -m build; python -m pip install -e .[api,hf]  # build, then install editable with the api+hf extras
!pip install ipykernel ipywidgets panel prefetch_generator  # notebook runtime dependencies
```