Projects in Awesome Lists by OpenGVLab
A curated list of projects in awesome lists by OpenGVLab.
https://github.com/OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o's performance.
gpt gpt-4o gpt-4v image-classification image-text-retrieval llm multi-modal semantic-segmentation video-classification vision-language-model vit-22b vit-6b
Last synced: 16 Mar 2025
https://github.com/OpenGVLab/LLaMA-Adapter
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
Last synced: 14 Mar 2025
https://github.com/OpenGVLab/DragGAN
Unofficial implementation of DragGAN - "Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold" (full-featured DragGAN implementation with an online demo and local deployment; code and models fully open-sourced; supports Windows, macOS, and Linux)
draggan gradio-interface image-editing image-generation interngpt
Last synced: 02 Apr 2025
https://github.com/OpenGVLab/InternGPT
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It currently supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
chatgpt click draggan foundation-model gpt gpt-4 gradio husky image-captioning imagebind internimage langchain llama llm multimodal sam segment-anything vicuna video-generation vqa
Last synced: 27 Mar 2025
https://github.com/OpenGVLab/Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs such as MiniGPT-4, StableLM, and MOSS.
big-model captioning-videos chat chatgpt foundation-models gradio langchain large-language-models large-model stablelm video video-question-answering video-understanding
Last synced: 24 Mar 2025
https://github.com/OpenGVLab/InternImage
[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
backbone deformable-convolution foundation-model object-detection semantic-segmentation
Last synced: 20 Mar 2025
https://github.com/OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
action-recognition benchmark contrastive-learning foundation-models instruction-tuning masked-autoencoder multimodal open-set-recognition self-supervised spatio-temporal-action-localization temporal-action-localization video-clip video-data video-dataset video-question-answering video-retrieval video-understanding vision-transformer zero-shot-classification zero-shot-retrieval
Last synced: 20 Mar 2025
https://github.com/OpenGVLab/SAM-Med2D
Official implementation of SAM-Med2D
Last synced: 04 Apr 2025
https://github.com/OpenGVLab/VideoMamba
[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
Last synced: 07 May 2025
https://github.com/OpenGVLab/VisionLLM
VisionLLM Series
generalist-model large-language-models object-detection
Last synced: 07 Apr 2025
https://github.com/OpenGVLab/OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
large-language-models llm quantization
Last synced: 07 May 2025
https://github.com/OpenGVLab/VideoMAEv2
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
action-detection action-recognition cvpr2023 foundation-model self-supervised-learning temporal-action-detection video-understanding
Last synced: 16 Mar 2025
https://github.com/OpenGVLab/Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
chat chatbot chatgpt gradio large-language-models llms multi-modality vision-language-model vqa
Last synced: 03 Apr 2025
https://github.com/OpenGVLab/All-Seeing
[ICLR 2024] This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"
all-seeing dataset region-text
Last synced: 24 Jul 2025
https://github.com/opengvlab/scalecua
ScaleCUA is an open-source family of computer-use agents that operate in cross-platform environments (Windows, macOS, Ubuntu, Android).
computer-use-agents data gui-agents models online-evaluation-suite scalecua
Last synced: 28 Oct 2025
https://github.com/OpenGVLab/Instruct2Act
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model
chatgpt clip llm robotics segment-anything
Last synced: 06 May 2025
https://github.com/OpenGVLab/Vision-RWKV
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Last synced: 22 Jul 2025
https://github.com/OpenGVLab/CaFo
[CVPR 2023] Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners
Last synced: 20 Mar 2025
https://github.com/opengvlab/unmasked_teacher
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Last synced: 25 Oct 2025
https://github.com/OpenGVLab/LAMM
[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
Last synced: 27 Jul 2025
https://github.com/OpenGVLab/UniFormerV2
[ICCV2023] UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Last synced: 20 Mar 2025
https://github.com/opengvlab/humanbench
Official implementation of HumanBench (CVPR 2023)
Last synced: 30 Jun 2025
https://github.com/opengvlab/mm-interleaved
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Last synced: 30 Jun 2025
https://github.com/opengvlab/gv-benchmark
General Vision Benchmark (GV-B), a project from OpenGVLab
Last synced: 30 Jun 2025
https://github.com/opengvlab/unihcp
Official PyTorch implementation of UniHCP
Last synced: 30 Jun 2025
https://github.com/OpenGVLab/Diffree
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
Last synced: 28 Mar 2025
https://github.com/opengvlab/hulk
An official implementation of "Hulk: A Universal Knowledge Translator for Human-Centric Tasks"
Last synced: 25 Aug 2025
https://github.com/OpenGVLab/ChartAst
[ACL 2024] ChartAssistant is a chart-based vision-language model for universal chart comprehension and reasoning.
Last synced: 14 Oct 2025
https://github.com/OpenGVLab/MM-NIAH
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.
benchmark long-context multimodal-large-language-models vision-language-model
Last synced: 17 Apr 2025
https://github.com/opengvlab/piip
[NeurIPS 2024 Spotlight ⭐️ & TPAMI 2025] Parameter-Inverted Image Pyramid Networks (PIIP)
computer-vision image-classification instance-segmentation multimodal-large-language-models object-detection semantic-segmentation vision-language-models vision-transformer
Last synced: 10 Oct 2025
https://github.com/OpenGVLab/MMT-Bench
[ICML 2024] MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Last synced: 17 Apr 2025
https://github.com/opengvlab/internvl-mmdetseg
Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed
object-detection semantic-segmentation vision-foundation
Last synced: 17 Jun 2025
https://github.com/OpenGVLab/M3I-Pretraining
[CVPR 2023] implementation of Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information.
Last synced: 19 Apr 2025
https://github.com/opengvlab/awesome-draggan
Awesome-DragGAN: A curated list of papers, tutorials, repositories related to DragGAN
Last synced: 30 Jun 2025
https://github.com/opengvlab/mutr
[AAAI 2024] Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
Last synced: 30 Jun 2025
https://github.com/OpenGVLab/MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Last synced: 30 Jun 2025
https://github.com/opengvlab/zerogui
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
Last synced: 28 Oct 2025
https://github.com/OpenGVLab/PVC
[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Last synced: 20 Aug 2025
https://github.com/OpenGVLab/De-focus-Attention-Networks
Learning 1D Causal Visual Representation with De-focus Attention Networks
Last synced: 17 Sep 2025
https://github.com/opengvlab/diffagent
[CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
Last synced: 30 Jun 2025
https://github.com/opengvlab/perception_test_iccv2023
Champion solutions for the Perception Test challenges at the ICCV 2023 workshop.
audio-visual deep-learning iccv2023
Last synced: 30 Jun 2025
https://github.com/opengvlab/sid-vln
Official implementation of "Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale"
Last synced: 28 Oct 2025