https://github.com/scofield7419/Video-of-Thought
Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
- Host: GitHub
- URL: https://github.com/scofield7419/Video-of-Thought
- Owner: scofield7419
- License: apache-2.0
- Created: 2024-05-06T03:40:03.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-06-24T05:04:13.000Z (6 months ago)
- Last Synced: 2024-11-11T06:49:57.610Z (about 1 month ago)
- Topics: chain-of-thought, chain-of-thought-reasoning, multimodal-large-language-models, video, video-model, video-reasoning
- Homepage: http://haofei.vip/VoT/
- Size: 1.05 MB
- Stars: 40
- Watchers: 4
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- ai-game-devtools - Video-of-Thought - Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition. (Video / Tool (AI LLM))
README
## 🤔🎞️ Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
**The implementation of the ICML 2024 paper [Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition](https://is.gd/fcfZeO)**
----------
### 🎉 Visit the project page: [VoT](http://haofei.vip/VoT/)
----------
> VoT is the first video Chain-of-Thought reasoning framework. It decomposes a raw, complex problem into a chain of sub-problems and reasons through multiple steps from low to high levels, enabling not only pixel-level perceptive recognition but also semantic-level cognitive understanding of videos.
>
> We also introduce a novel video MLLM, MotionEpic, which supports not only video input but also the encoding, understanding, and generation of spatial-temporal scene graphs (STSGs).
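
Since the official code is not yet released, the following is only a minimal sketch of the general idea of answering a chain of sub-problems in order, with each earlier answer passed along as context for the next step. Everything in it is an assumption made for illustration: `ReasoningChain`, `run_chain`, the `answer_fn` callable (a stand-in for a video MLLM such as MotionEpic, whose real interface is not described here), and the example sub-questions are all hypothetical and are not taken from the paper or the repository.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for a video MLLM call: takes a text prompt, returns text.
# The real MotionEpic interface is not documented in this README.
AnswerFn = Callable[[str], str]


@dataclass
class ReasoningChain:
    """Answers sub-questions in order, feeding earlier answers into later prompts."""

    answer_fn: AnswerFn
    history: List[str] = field(default_factory=list)

    def step(self, sub_question: str) -> str:
        # Build the prompt from all previous question/answer pairs plus the new question.
        context = "\n".join(self.history)
        prompt = f"{context}\nQ: {sub_question}" if context else f"Q: {sub_question}"
        answer = self.answer_fn(prompt)
        # Record this pair so that later steps can see it.
        self.history.append(f"Q: {sub_question}\nA: {answer}")
        return answer


def run_chain(answer_fn: AnswerFn, sub_questions: List[str]) -> List[str]:
    """Run every sub-question through one chain and return the answers in order."""
    chain = ReasoningChain(answer_fn)
    return [chain.step(q) for q in sub_questions]


if __name__ == "__main__":
    # Illustrative sub-questions ordered from low-level perception to high-level
    # cognition; these are NOT the prompts used in the paper.
    questions = [
        "Which objects in the video are relevant to the question?",
        "How do those objects move and interact over time?",
        "Given the above, what is the answer to the original question?",
    ]

    # Dummy model that only reports how much context it received.
    def dummy_model(prompt: str) -> str:
        return f"answer based on {len(prompt.splitlines())} prompt line(s)"

    for question, answer in zip(questions, run_chain(dummy_model, questions)):
        print(question, "->", answer)
```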
----------
## Code
(TBD)
----------
## Citation
If you use this work, please kindly cite:
```
@inproceedings{VoT24Hao,
  author = {Hao Fei and Shengqiong Wu and Wei Ji and Hanwang Zhang and Meishan Zhang and Mong-Li Lee and Wynne Hsu},
  title = {Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition},
  booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
  year = {2024},
}
```
----------
### License
The code is released under Apache License 2.0 for Noncommercial use only.
----------
### Contact
For any questions, feel free to contact [Hao Fei](mailto:[email protected]).