# ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding

## Contributions

- The training-free ReTaKe is the first method to jointly model temporal and knowledge redundancy for long video understanding, enabling 4x longer video sequences with less than 1% performance drop.

- We propose DPSelect, a novel keyframe selection method that reduces low-level temporal redundancy, and PivotKV, a novel KV cache compression method that reduces high-level knowledge redundancy in long videos. A toy illustration of both ideas is sketched below.
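
The following is only a rough, illustrative sketch of the two ideas on random features, not the authors' actual DPSelect/PivotKV algorithms; all shapes, thresholds, and selection rules here are invented for exposition.

```python
# Toy sketch (NOT the ReTaKe implementation):
#  - DPSelect-style keyframe picking keeps frames that differ most from their
#    neighbours (low-level temporal redundancy).
#  - PivotKV-style compression keeps only cache positions that received the
#    most attention (high-level knowledge redundancy).
import numpy as np

rng = np.random.default_rng(0)

# --- keyframe selection on frame features [T, D] ---
frames = rng.normal(size=(64, 256))
frames /= np.linalg.norm(frames, axis=1, keepdims=True)
sim = np.sum(frames[1:] * frames[:-1], axis=1)   # cosine similarity of adjacent frames
novelty = 1.0 - sim                              # high value = new visual content
# keep the first frame plus frames whose novelty is a local peak
keyframes = [0] + [t + 1 for t in range(1, len(novelty) - 1)
                   if novelty[t] > novelty[t - 1] and novelty[t] > novelty[t + 1]]
print(f"kept {len(keyframes)}/{frames.shape[0]} frames as keyframes")

# --- KV cache compression from attention scores [heads, cache_len] ---
attn = rng.random(size=(8, 1024))                # attention mass per head and position
importance = attn.mean(axis=0)                   # average importance over heads
keep = np.sort(np.argsort(importance)[-256:])    # keep top-25% "pivot" positions
print(f"compressed KV cache from {attn.shape[1]} to {keep.size} positions")
```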



## Prepare environment
```bash
# For GPU users
conda env create -f environment.yaml

# For NPU users
# conda env create -f environment_npu.yaml

apt-get install ffmpeg # NOTE: Quick demo does not require ffmpeg
```
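
After the environment is created, a quick sanity check can confirm the deep-learning stack is visible. This assumes the conda environment installs PyTorch and the Hugging Face `transformers` package (required for Qwen2-VL inference); adjust if your environment differs.

```python
# Environment sanity check (assumes torch and transformers come from the conda env).
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())  # GPU users
# NPU users with torch_npu installed would instead check torch.npu.is_available().
```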

## Quick demo
- Step 1:
Set `hf_qwen2vl7b_path` in `./demo.py` to your local path to `Qwen2-VL-7B-Instruct` (an optional download sketch is given after this list). Note that NPU users also need to change `config_path` to `'configs/retake_demo_npu.yaml'`.

- Step 2:
Run demo
```bash
python demo.py
```
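
If you do not yet have the model weights locally, one way to fetch them is with `huggingface_hub` (assumed to be installed; the local directory name below is just an example you can point `hf_qwen2vl7b_path` at).

```python
# Optional: download Qwen2-VL-7B-Instruct weights to a local folder.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="Qwen/Qwen2-VL-7B-Instruct",
    local_dir="./Qwen2-VL-7B-Instruct",  # use this path for hf_qwen2vl7b_path in demo.py
)
print("model downloaded to:", local_path)
```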

## Reproduce ReTaKe
- Prepare the datasets following the docs:
  - Prepare [VideoMME](docs/prepare_videomme.md)
  - Prepare [MLVU](docs/prepare_mlvu.md)
  - Prepare [LVBench](docs/prepare_lvbench.md)
- Run script
```bash
bash scripts/infer_eval_retake.sh ${YOUR_PATH_TO_Qwen2-VL-7B-Instruct} configs/retake_videomme.yaml 8
bash scripts/infer_eval_retake.sh ${YOUR_PATH_TO_Qwen2-VL-7B-Instruct} configs/retake_mlvu.yaml 8
bash scripts/infer_eval_retake.sh ${YOUR_PATH_TO_Qwen2-VL-7B-Instruct} configs/retake_lvbench.yaml 8
```
The above scripts perform inference and evaluation in one pass. Results can be found in `./results`.
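
As a convenience, a small sketch for browsing the output directory afterwards. It assumes the evaluation writes JSON files somewhere under `./results`; the exact filenames and schema depend on `scripts/infer_eval_retake.sh`, so treat this only as a starting point.

```python
# List result files and summarize their top-level structure (assumes JSON output).
import json
from pathlib import Path

for path in sorted(Path("./results").rglob("*.json")):
    with path.open() as f:
        data = json.load(f)
    summary = list(data)[:5] if isinstance(data, dict) else type(data).__name__
    print(path, "->", summary)
```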