Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with vision-language-pretraining
A curated list of projects in awesome lists tagged with vision-language-pretraining.
https://github.com/salesforce/lavis
LAVIS - A One-stop Library for Language-Vision Intelligence
deep-learning deep-learning-library image-captioning multimodal-datasets multimodal-deep-learning salesforce vision-and-language vision-framework vision-language-pretraining vision-language-transformer visual-question-anwsering
Last synced: 20 Dec 2024
https://github.com/salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
deep-learning deep-learning-library image-captioning multimodal-datasets multimodal-deep-learning salesforce vision-and-language vision-framework vision-language-pretraining vision-language-transformer visual-question-anwsering
Last synced: 25 Oct 2024
https://github.com/damo-nlp-sg/video-llama
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
blip2 cross-modal-pretraining large-language-models llama minigpt4 multi-modal-chatgpt video-language-pretraining vision-language-pretraining
Last synced: 22 Dec 2024
https://github.com/DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
blip2 cross-modal-pretraining large-language-models llama minigpt4 multi-modal-chatgpt video-language-pretraining vision-language-pretraining
Last synced: 29 Oct 2024
https://github.com/deepseek-ai/deepseek-vl
DeepSeek-VL: Towards Real-World Vision-Language Understanding
foundation-models vision-language-model vision-language-pretraining
Last synced: 21 Dec 2024
https://github.com/deepseek-ai/DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
foundation-models vision-language-model vision-language-pretraining
Last synced: 05 Nov 2024
https://github.com/mbzuai-oryx/video-chatgpt
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
chatbot clip gpt-4 llama llava mulit-modal vicuna video-chatboat video-conversation vision-language vision-language-pretraining
Last synced: 19 Dec 2024
https://github.com/deepseek-ai/janus
Janus-Series: Unified Multimodal Understanding and Generation Models
any-to-any foundation-models llm multimodal unified-model vision-language-pretraining
Last synced: 20 Dec 2024
https://github.com/deepseek-ai/Janus
Janus-Series: Unified Multimodal Understanding and Generation Models
any-to-any foundation-models llm multimodal unified-model vision-language-pretraining
Last synced: 06 Dec 2024
https://github.com/mbzuai-oryx/Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
chatbot clip gpt-4 llama llava mulit-modal vicuna video-chatboat video-conversation vision-language vision-language-pretraining
Last synced: 24 Oct 2024
https://github.com/Sense-GVT/DeCLIP
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
big-model clip image-text multi-model self-supervised vision-language-pretraining zero-shot
Last synced: 04 Nov 2024
https://github.com/mbzuai-oryx/videogpt-plus
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
chatbot clip dual-encoder gpt4 gpt4o image-encoder llama3 llava multimodal phi-3-mini vicuna video-chatbot video-conversation video-encoder vision-language vision-language-pretraining
Last synced: 20 Dec 2024
https://github.com/mbzuai-oryx/VideoGPT-plus
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
chatbot clip dual-encoder gpt4 gpt4o image-encoder llama3 llava multimodal phi-3-mini vicuna video-chatbot video-conversation video-encoder vision-language vision-language-pretraining
Last synced: 12 Dec 2024
https://github.com/tencentarc/flm
Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
language-modeling vision-language-pretraining
Last synced: 05 Nov 2024
https://github.com/ttengwang/vlmixer
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix (ICML 2022)
vision-language vision-language-pretraining
Last synced: 16 Nov 2024
https://github.com/buaadreamer/ccrk
[KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning
cross-lingual cross-lingual-retrieval cross-modal cross-modal-retrieval iglue image-text-retrieval image-text-search kdd2024 mscoco multi30k retrieval swin-transformer vision-language-pretraining wit xflickrco xlm-roberta
Last synced: 06 Dec 2024
https://github.com/ahmdtaha/distributed_sigmoid_loss
Unofficial implementation for Sigmoid Loss for Language Image Pre-Training
contrastive-learning distributed-data-parallel multimodal-deep-learning python3 pytorch self-supervised-learning unsupervised-learning vision-and-language vision-language vision-language-pretraining vision-transformer
Last synced: 06 Nov 2024