Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/TonyLianLong/LLM-groundedVideoDiffusion
LLM-grounded Video Diffusion Models (LVD): official implementation for the LVD paper
diffusion diffusion-models large-language-models llm text-to-image text-to-video text-to-video-generation video-generation
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/TonyLianLong/LLM-groundedVideoDiffusion
- Owner: TonyLianLong
- Created: 2023-10-02T17:15:55.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2023-10-02T17:30:34.000Z (9 months ago)
- Last Synced: 2023-10-02T23:32:56.056Z (9 months ago)
- Topics: diffusion, diffusion-models, large-language-models, llm, text-to-image, text-to-video, text-to-video-generation, video-generation
- Homepage: https://llm-grounded-video-diffusion.github.io/
- Size: 1000 Bytes
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Lists
- awesome-diffusion-categorized - [Code
README
# LLM-grounded Video Diffusion Models
[Long Lian](https://tonylian.com/), [Baifeng Shi](https://bfshi.github.io/), [Adam Yala](https://www.adamyala.org/), [Trevor Darrell](https://people.eecs.berkeley.edu/~trevor/), [Boyi Li](https://sites.google.com/site/boyilics/home) at UC Berkeley/UCSF.
[Paper](https://arxiv.org/abs/2309.17444) | [Project Page](https://llm-grounded-video-diffusion.github.io/) | [HuggingFace Demo (coming soon)](#) | [Related Project: LMD](https://llm-grounded-diffusion.github.io/) | [Citation](#citation)
![Comparisons with our baseline](https://llm-grounded-video-diffusion.github.io/teaser.jpg)
![Method Figure](https://llm-grounded-video-diffusion.github.io/overall_method.jpg)
Our DSL-grounded video generator (DSL: dynamic scene layout):
![DSL-grounded Video Generator](https://llm-grounded-video-diffusion.github.io/dsl_to_video.jpg)
LLM generates dynamic scene layouts, taking the world properties (e.g., gravity, elasticity, air friction) into account:
![](https://llm-grounded-video-diffusion.github.io/world_properties.jpg)
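To make the idea of a dynamic scene layout that respects gravity and elasticity concrete, here is a minimal sketch. It is not the code from this repo (which is not yet released) and not the LLM's actual output format. It assumes a layout is one normalized bounding box per frame, with the image y-axis pointing down; `Box`, `falling_box_layout`, and all parameter values are hypothetical names chosen for illustration, and air friction is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical per-frame box in normalized image coordinates."""
    frame: int
    x: float  # left edge
    y: float  # top edge (y grows downward)
    w: float
    h: float


def falling_box_layout(num_frames=6, y0=0.1, size=0.2,
                       gravity=0.08, elasticity=0.5):
    """Per-frame boxes for an object dropped from rest that bounces once it
    reaches the bottom of the frame. All constants are illustrative."""
    floor = 1.0 - size  # largest top-edge value that keeps the box in frame
    y, vy = y0, 0.0
    boxes = []
    for t in range(num_frames):
        boxes.append(Box(t, 0.4, round(y, 4), size, size))
        vy += gravity            # constant downward acceleration per frame
        y += vy
        if y > floor:            # hit the ground: clamp and reverse velocity
            y = floor
            vy = -vy * elasticity
    return boxes


if __name__ == "__main__":
    for box in falling_box_layout():
        print(box)
```

The box drops faster each frame, touches the bottom edge, then rebounds with reduced speed, which is the kind of frame-to-frame consistency the figure above illustrates.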
LLM generates dynamic scene layouts, taking the camera properties (e.g., perspective projection) into account:
![](https://llm-grounded-video-diffusion.github.io/camera_properties.jpg)
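As a rough illustration of what accounting for perspective projection means for a layout, the sketch below grows a centered box as an object approaches the camera. It assumes a simple pinhole model in which apparent size is inversely proportional to depth. The function name, the `(frame, x, y, w, h)` tuple format, and the parameter values are hypothetical and are not taken from this repo.

```python
def approaching_box_layout(num_frames=5, start_depth=5.0, end_depth=1.0,
                           size_at_unit_depth=0.5):
    """Centered per-frame boxes for an object moving toward the camera at
    constant speed. Under a pinhole model the box side scales as 1 / depth.
    Requires num_frames >= 2."""
    boxes = []
    for t in range(num_frames):
        # Depth changes linearly with time (constant speed toward the camera).
        depth = start_depth + (end_depth - start_depth) * t / (num_frames - 1)
        side = size_at_unit_depth / depth
        left = top = 0.5 - side / 2  # keep the box centered in the frame
        boxes.append((t, left, top, side, side))
    return boxes


if __name__ == "__main__":
    for box in approaching_box_layout():
        print(box)
```

Although the depth decreases linearly, the box grows slowly at first and quickly near the camera, which is the non-linear size change expected from perspective.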
We propose a benchmark of five tasks. Our method improves on all five tasks without specifically aiming for each one:
![](https://llm-grounded-video-diffusion.github.io/visualizations_small.jpg)
## Code
The code is coming soon! Meanwhile, give this repo a star to support us!
## Contact us
Please contact Long (Tony) Lian if you have any questions: `[email protected]`.
## Citation
If you use our work or the implementation in this repo, or find them helpful, please consider citing:
```
@article{lian2023llmgroundedvideo,
title={LLM-grounded Video Diffusion Models},
author={Lian, Long and Shi, Baifeng and Yala, Adam and Darrell, Trevor and Li, Boyi},
journal={arXiv preprint arXiv:2309.17444},
year={2023},
}
@article{lian2023llmgrounded,
title={LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models},
author={Lian, Long and Li, Boyi and Yala, Adam and Darrell, Trevor},
journal={arXiv preprint arXiv:2305.13655},
year={2023}
}
```