https://github.com/OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
- Host: GitHub
- URL: https://github.com/OpenGVLab/InternVideo
- Owner: OpenGVLab
- License: apache-2.0
- Created: 2022-11-23T12:57:00.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-12-07T16:20:41.000Z (5 months ago)
- Last Synced: 2024-12-09T08:52:15.241Z (4 months ago)
- Topics: action-recognition, benchmark, contrastive-learning, foundation-models, instruction-tuning, masked-autoencoder, multimodal, open-set-recognition, self-supervised, spatio-temporal-action-localization, temporal-action-localization, video-clip, video-data, video-dataset, video-question-answering, video-retrieval, video-understanding, vision-transformer, zero-shot-classification, zero-shot-retrieval
- Language: Python
- Homepage:
- Size: 53.2 MB
- Stars: 1,452
- Watchers: 27
- Forks: 90
- Open Issues: 92
Metadata Files:
- Readme: README.md
- License: LICENSE