# ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
### [[Project Website :dart:]](https://soczech.github.io/showhowto/) [[Paper :page_with_curl:]](https://arxiv.org/abs/2412.01987) [Code :octocat:]
This repository contains code for the CVPR'25 paper [ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions](https://arxiv.org/abs/2412.01987).
## Run the model on your images and prompts
1. **Environment setup**
- Use the provided `Dockerfile` to build the environment or install the [packages](https://github.com/soCzech/ShowHowTo/blob/main/Dockerfile) manually.
```
docker build -t showhowto .
docker run -it --rm -v $(pwd):$(pwd) -w $(pwd) --gpus=1 showhowto:latest bash
```
- The code, as written, requires a GPU.
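- You can quickly verify that the container sees the GPU with a minimal check; this assumes PyTorch is part of the environment built by the `Dockerfile` (the model weights are PyTorch checkpoints):
```python
import torch

# Fails fast if no CUDA device is visible inside the container.
assert torch.cuda.is_available(), "No CUDA device visible"
print(torch.cuda.get_device_name(0))
```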
2. **Download ShowHowTo model weights**
- Use the `download_weights.sh` script or download the [ShowHowTo weights](https://data.ciirc.cvut.cz/public/projects/2024ShowHowTo/weights/) manually.
3. **Get predictions**
- Run the following command to get example predictions.
```
python predict.py --ckpt_path ./weights/showhowto_2to8steps.pt \
                  --prompt_file ./test_data/prompt_file.txt \
                  --unconditional_guidance_scale 7.5
```
- To run the model on your own images and prompts, replace `./test_data/prompt_file.txt` with your prompt file.
## Training
1. **Environment setup**
- Use the same environment as for the prediction (see above).
2. **Download DynamiCrafter model weights**
- Use the `download_weights.sh` script or download the [DynamiCrafter weights](https://huggingface.co/Doubiiu/DynamiCrafter/blob/main/model.ckpt) manually.
3. **Get the dataset**
- To replicate our experiments on the ShowHowTo dataset, see below, or use your own dataset.
- The dataset must have the following directory structure.
```
dataset_root
├── prompts.json
└── imgseqs
    ├── <video_id>.jpg
    │   ...
    └── ...
```
There can be multiple directories with names starting with `imgseqs`.
- The `prompts.json` file must have the following structure.
```
{
"": ["prompt for the 1st frame", "prompt for the 2nd frame", ...],
...
}
```
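- For example, a `prompts.json` with a single entry could be written like this (the video id and prompt texts below are placeholders):
```python
import json

# Illustrative only: replace the id and texts with your real YouTube video ids
# and one prompt per frame of the corresponding image sequence.
prompts = {
    "VIDEO_ID": [
        "prompt for the 1st frame",
        "prompt for the 2nd frame",
        "prompt for the 3rd frame",
    ],
}
with open("dataset_root/prompts.json", "w") as f:
    json.dump(prompts, f, indent=2)
```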
- The sequence image `<video_id>.jpg` must be of width `N*W` (`W` is the width of each image in the sequence) and arbitrary height `H`.
The number of images in the sequence `N` must match the length of the prompt list in the `prompts.json` file.
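- A minimal sanity check for this layout (a sketch; `dataset_root` below is an illustrative path):
```python
import json
from pathlib import Path

from PIL import Image

root = Path("dataset_root")  # point this at your dataset
with open(root / "prompts.json") as f:
    prompts = json.load(f)

# Each sequence image has width N * W, so its width must be divisible by the
# number of prompts N listed for that video id.
for seq_dir in root.glob("imgseqs*"):
    for img_path in seq_dir.glob("*.jpg"):
        n = len(prompts[img_path.stem])
        with Image.open(img_path) as img:
            assert img.width % n == 0, f"{img_path.name}: width {img.width} not divisible by N={n}"
```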
4. **Train**
- Run the training code.
```
python train.py --local_batch_size 2 \
                --dataset_root /path/to/ShowHowToTrain \
                --ckpt_path weights/dynamicrafter_256_v1.ckpt
```
- We trained on a single node with 8 GPUs with a batch size of 2 videos per GPU (an effective batch size of 16). Be advised that more than 40 GB of VRAM per GPU may be required to train with a batch size larger than 1.
## Dataset
You can download the ShowHowTo dataset using the `download_dataset.sh` script. To also download the image sequences from our servers, you need a username and password.
You can obtain them by sending an email to *tomas.soucek at cvut dot cz*, specifying your name and affiliation. Please use your institutional email (i.e., not gmail, etc.).

You can also extract the dataset from the raw original videos with the following steps.
1. **Download the HowTo100M videos and the ShowHowTo prompts**
- The list of all video ids for both the train set and test set can be found [here](https://data.ciirc.cvut.cz/public/projects/2024ShowHowTo/dataset/).
- For each video, the `keyframes.json` file contains information on which video frames are part of the dataset.
- The prompts for each video can also be found there, in the `prompts.json` file.
2. **Extract the video frames of the ShowHowTo dataset**
- To extract the frames from the videos, we used ffmpeg v7.0.1 with the following function.
```python
import subprocess

import numpy as np

def extract_frame(video, start_sec, frame_idx, width, height):
    # Resample to 5 fps, drop frames before start_sec, then select frame number
    # frame_idx from the remainder; emit one raw RGB frame of the given size.
    ffmpeg_args = ['ffmpeg', '-i', video, '-f', 'rawvideo', '-pix_fmt', 'rgb24',
                   '-vf', f'fps=5,select=gte(t\\,{start_sec}),select=eq(n\\,{frame_idx})',
                   '-s', f'{width}x{height}', '-vframes', '1', 'pipe:']
    video_stream = subprocess.Popen(ffmpeg_args, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    in_bytes = video_stream.stdout.read(width * height * 3)  # one RGB frame
    return np.frombuffer(in_bytes, np.uint8).reshape([height, width, 3])
```
The function arguments are: `video` is the path to the video, `start_sec` and `frame_idx` are the values from the `keyframes.json`, and `width` and `height` specify the output image size (we used the native video resolution here).
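- For example, extracting a single frame (the values below are only illustrative; take `start_sec` and `frame_idx` from `keyframes.json` for the given video):
```python
# Uses extract_frame() defined above; path and numbers are placeholders.
frame = extract_frame("VIDEO_ID.mp4", start_sec=12.4, frame_idx=3,
                      width=1280, height=720)
print(frame.shape)  # (720, 1280, 3)
```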
3. **Prepare the image sequences**
- Concatenate all frames from a video in the horizontal dimension and place the resulting concatenated image into `dataset_root/imgseqs/<video_id>.jpg`. The `<video_id>` is the YouTube video id.
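- A minimal sketch of this step (the dummy frames below stand in for real frames obtained with `extract_frame` above):
```python
from pathlib import Path

import numpy as np
from PIL import Image

# Placeholders for real frames of shape (H, W, 3); all frames of one video
# must share the same height H and width W.
frames = [np.zeros((360, 640, 3), np.uint8) for _ in range(4)]

# Concatenate along the width axis -> an image of width N * W.
sequence = np.concatenate(frames, axis=1)

out_dir = Path("dataset_root/imgseqs")
out_dir.mkdir(parents=True, exist_ok=True)
Image.fromarray(sequence).save(out_dir / "VIDEO_ID.jpg")
```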
## Citation
```bibtex
@inproceedings{soucek2025showhowto,
    title = {ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions},
    author = {Sou\v{c}ek, Tom\'{a}\v{s} and Gatti, Prajwal and Wray, Michael and Laptev, Ivan and Damen, Dima and Sivic, Josef},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2025}
}
```
## Acknowledgements
The code has been adapted from the ECCV 2024 paper [DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors](https://arxiv.org/abs/2310.12190) available on [GitHub](https://github.com/Doubiiu/DynamiCrafter). Please refer to its license before use.