https://github.com/scofield7419/Video-of-Thought

Code for the ICML 2024 paper "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"

Host: GitHub
URL: https://github.com/scofield7419/Video-of-Thought
Owner: scofield7419
License: apache-2.0
Created: 2024-05-06T03:40:03.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-06-24T05:04:13.000Z (over 1 year ago)
Last Synced: 2024-11-11T06:49:57.610Z (about 1 year ago)
Topics: chain-of-thought, chain-of-thought-reasoning, multimodal-large-language-models, video, video-model, video-reasoning
Homepage: http://haofei.vip/VoT/
Size: 1.05 MB
Stars: 40
Watchers: 4
Forks: 2
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

ai-game-devtools - Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition. (Category: Video)

README

## 🤔🎞️ Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition

**The implementation of the ICML 2024 paper [Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition](https://is.gd/fcfZeO)**

----------
### 🎉 Visit the project page: [VoT](http://haofei.vip/VoT/)

----------

## Overview

> VoT is the first video Chain-of-Thought reasoning framework. It decomposes a raw, complex problem into a chain of sub-problems and reasons through
> multiple steps from low to high levels, enabling not only pixel-level perceptive recognition but also semantic-level
> cognitive understanding of videos.

> We also introduce a novel video MLLM, MotionEpic, which supports not only video input but also the encoding, understanding, and generation of spatial-temporal scene graphs (STSGs).
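
The idea of reasoning through a chain of sub-problems can be illustrated with a minimal sketch. It is not the VoT implementation (the code for this repository is still to be released); the function names, the sub-questions, and the stand-in model below are all hypothetical and only show the general pattern of answering sub-questions in order while passing earlier answers forward as context.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: (video reference, prompt text) -> model answer.
AskFn = Callable[[str, str], str]


def run_chain(video: str, sub_questions: List[str], ask: AskFn) -> List[Tuple[str, str]]:
    """Answer sub-questions in order, feeding earlier answers forward as context."""
    history: List[Tuple[str, str]] = []
    for question in sub_questions:
        # Build the context from every (question, answer) pair produced so far.
        context = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
        prompt = f"{context}\nQ: {question}" if context else f"Q: {question}"
        history.append((question, ask(video, prompt)))
    return history


def echo_model(video: str, prompt: str) -> str:
    """Stand-in for a video MLLM: reports how many earlier answers it was shown."""
    return f"answer using {prompt.count('A: ')} earlier step(s)"


# Illustrative sub-questions only, ordered from low-level to high-level.
steps = [
    "Which objects are involved in the question?",
    "What do those objects do over time?",
    "Given the above, what is the final answer?",
]
trace = run_chain("clip.mp4", steps, echo_model)
```

To try the pattern with a real model, `echo_model` would be replaced by a function that sends the video and prompt to that model; nothing else in the loop needs to change.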

----------

## Code

(TBD)

----------

## Citation

If you use this work, please cite:

```
@inproceedings{VoT24Hao,
author = {Hao Fei and Shengqiong Wu and Wei Ji and Hanwang Zhang and Meishan Zhang and Mong-Li Lee and Wynne Hsu},
title = {Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition},
booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
year = {2024},
}
```

----------
### License

The code is released under the Apache License 2.0, for noncommercial use only.

----------

### Contact

For any questions, feel free to contact [Hao Fei](mailto:haofei37@nus.edu.sg).
