## SF-V
Single Forward Video Generation Model

[![arXiv](https://img.shields.io/badge/arXiv-2406.04324-b31b1b)](https://arxiv.org/abs/2406.04324)
[![Project Page](https://img.shields.io/badge/Project-Website-orange)](https://snap-research.github.io/SF-V/)

This repository contains the code for the NeurIPS 2024 paper [SF-V: Single Forward Video Generation Model](https://arxiv.org/abs/2406.04324).
For more visualization results, please check our [project page](https://snap-research.github.io/SF-V/).

> **[SF-V: Single Forward Video Generation Model](https://arxiv.org/abs/2406.04324)** \
> [Zhixing Zhang](https://zhang-zx.github.io/)<sup>1,2</sup>,
> [Yanyu Li](https://scholar.google.com/citations?user=XUj8koUAAAAJ)<sup>1</sup>,
> [Yushu Wu](https://scholar.google.com/citations?user=3hEDsFYAAAAJ)<sup>1</sup>,
> [Yanwu Xu](https://xuyanwu.github.io/)<sup>1</sup>,
> [Anil Kag](https://anilkagak2.github.io/)<sup>1</sup>,
> [Ivan Skorokhodov](https://universome.github.io/)<sup>1</sup>,
> [Willi Menapace](https://scholar.google.com/citations?user=31ha1LgAAAAJ)<sup>1</sup>,
> [Aliaksandr Siarohin](https://aliaksandrsiarohin.github.io/aliaksandr-siarohin-website/)<sup>1</sup>,
> [Junli Cao](https://scholar.google.com/citations?user=BV98MGAAAAAJ)<sup>1</sup>,
> [Dimitris Metaxas](https://people.cs.rutgers.edu/~dnm/)<sup>2</sup>,
> [Sergey Tulyakov](http://www.stulyakov.com/)<sup>1</sup>,
> and [Jian Ren](https://alanspike.github.io/)<sup>1</sup> \
> <sup>1</sup>Snap Inc., <sup>2</sup>Rutgers University

**TL;DR:** **SF-V** is a video generation method that produces high-quality, motion-consistent videos with a single sampling step at inference.
> Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain _single_-step video generation models by leveraging adversarial training to fine-tune pre-trained video diffusion models. We show that, through adversarial training, a multi-step video diffusion model, _i.e._, Stable Video Diffusion (SVD), can be trained to perform a _single_ forward pass to synthesize high-quality videos, capturing both temporal and spatial dependencies in the video data. Extensive experiments demonstrate that our method achieves competitive generation quality with significantly reduced computational overhead for the denoising process (_i.e._, around 23x speedup compared with SVD and 6x speedup compared with existing works, with even better generation quality), paving the way for real-time video synthesis and editing.
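
The adversarial fine-tuning described above can be pictured as a standard GAN loop in which the pre-trained video denoiser becomes a one-pass generator and a discriminator scores spatio-temporal latents. The sketch below is a minimal conceptual illustration of that idea, not the authors' implementation: the tiny modules, latent shapes, and hinge loss are all stand-in assumptions.

```python
# Conceptual sketch of adversarial one-step distillation (hinge GAN).
# All modules and shapes are illustrative stand-ins, not SF-V's networks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OneStepStudent(nn.Module):
    """Stand-in for the fine-tuned video UNet: noisy latents -> clean latents in one pass."""
    def __init__(self, channels=4):
        super().__init__()
        self.net = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, noisy_latents):
        return self.net(noisy_latents)

class LatentDiscriminator(nn.Module):
    """Scores spatio-temporal latent clips as real (encoded video) or fake (student output)."""
    def __init__(self, channels=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(channels, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv3d(64, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, latents):
        return self.net(latents).mean(dim=(1, 2, 3, 4))  # one logit per clip

student, disc = OneStepStudent(), LatentDiscriminator()
opt_g = torch.optim.Adam(student.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

real = torch.randn(2, 4, 8, 32, 32)  # (batch, channels, frames, height, width) latents
noise = torch.randn_like(real)

# Discriminator step: push real clips up and single-pass generations down (hinge loss).
fake = student(noise).detach()
d_loss = F.relu(1.0 - disc(real)).mean() + F.relu(1.0 + disc(fake)).mean()
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator step: make the single forward pass output look real to the discriminator.
g_loss = -disc(student(noise)).mean()
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```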

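At inference time, the practical difference from standard SVD is just the number of denoiser evaluations. The snippet below sketches this with the public `StableVideoDiffusionPipeline` from `diffusers`; the checkpoint name and conditioning image path are placeholder assumptions, and the base SVD weights are not expected to produce good one-step results. The point is only to show where the roughly 23x speedup comes from.

```python
# Sketch: single-step sampling with the public SVD pipeline from diffusers.
# Checkpoint and image path are placeholders, not SF-V's released artifacts.
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",  # stand-in for SF-V weights
    torch_dtype=torch.float16,
).to("cuda")

image = load_image("conditioning_frame.png")  # placeholder conditioning frame

# Standard SVD: num_inference_steps=25 (iterative denoising).
# SF-V regime: a single forward pass through the denoiser.
frames = pipe(image, num_inference_steps=1, num_frames=14).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```
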
## Reference

If our work helps you, please consider citing our paper. Thanks!

```BibTeX
@article{zhang2024sfv,
  title={SF-V: Single Forward Video Generation Model},
  author={Zhang, Zhixing and Li, Yanyu and Wu, Yushu and Xu, Yanwu and Kag, Anil and Skorokhodov, Ivan and Menapace, Willi and Siarohin, Aliaksandr and Cao, Junli and Metaxas, Dimitris and Tulyakov, Sergey and Ren, Jian},
  journal={arXiv preprint arXiv:2406.04324},
  year={2024}
}
```