An open API service indexing awesome lists of open source software.

Projects in Awesome Lists by OpenGVLab

A curated list of projects in awesome lists by OpenGVLab.

https://github.com/OpenGVLab/InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.

gpt gpt-4o gpt-4v image-classification image-text-retrieval llm multi-modal semantic-segmentation video-classification vision-language-model vit-22b vit-6b

Last synced: 12 May 2025

https://github.com/opengvlab/llama-adapter

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Last synced: 07 Oct 2025
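
The "1.2M parameters" figure reflects adapter-style tuning: the base LLaMA weights stay frozen and only small inserted modules are trained. A back-of-the-envelope sketch (hypothetical shapes and function names, not the repo's API) of how prompt-style adapter parameters can land near 1.2M on a 7B-scale model:

```python
# Sketch of adapter-style tuning arithmetic (hypothetical, not the repo's
# API): the base model is frozen; only small per-layer prompt vectors plus
# one gating scalar per adapted layer are trained.

def adapter_param_count(n_layers: int, n_prompts: int, dim: int) -> int:
    """Trainable parameters: prompt vectors + one gate per adapted layer."""
    return n_layers * (n_prompts * dim + 1)

def frozen_param_count(n_layers: int, dim: int, vocab: int) -> int:
    """Very rough transformer parameter count (attention + MLP + embeddings)."""
    per_layer = 4 * dim * dim + 8 * dim * dim  # attn projections + 4x-wide MLP
    return n_layers * per_layer + vocab * dim

# LLaMA-7B-like shapes (assumed): 32 layers, dim 4096, vocab 32000;
# 10 prompt tokens inserted on 30 layers.
trainable = adapter_param_count(n_layers=30, n_prompts=10, dim=4096)
frozen = frozen_param_count(n_layers=32, dim=4096, vocab=32000)
print(f"trainable: {trainable:,}")            # ~1.2M, matching the entry
print(f"fraction: {trainable / frozen:.5%}")  # a tiny slice of the model
```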

https://github.com/OpenGVLab/DragGAN

Unofficial Implementation of DragGAN - "Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold" (full-featured DragGAN implementation with an online demo and local deployment; code and models fully open-sourced; supports Windows, macOS, and Linux)

draggan gradio-interface image-editing image-generation interngpt

Last synced: 02 Apr 2025

https://github.com/OpenGVLab/InternGPT

InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, and more. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).

chatgpt click draggan foundation-model gpt gpt-4 gradio husky image-captioning imagebind internimage langchain llama llm multimodal sam segment-anything vicuna video-generation vqa

Last synced: 27 Mar 2025

https://github.com/OpenGVLab/Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs such as MiniGPT-4, StableLM, and MOSS.

big-model captioning-videos chat chatgpt foundation-models gradio langchain large-language-models large-model stablelm video video-question-answering video-understanding

Last synced: 14 May 2025

https://github.com/OpenGVLab/InternImage

[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

backbone deformable-convolution foundation-model object-detection semantic-segmentation

Last synced: 10 Apr 2025
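
Deformable convolution, the building block named above, lets each kernel tap sample the feature map at a learned fractional offset rather than on the fixed grid. A minimal pure-Python sketch of that sampling step (illustrative only, not InternImage's implementation):

```python
# Sketch of deformable sampling (illustrative, not InternImage's code):
# each kernel tap reads the feature map at a fractional offset from the
# output position, using bilinear interpolation.

def bilinear(feat, y, x):
    """Bilinearly interpolate a 2D feature map at fractional (y, x)."""
    h, w = len(feat), len(feat[0])
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0][x0] * (1 - dy) * (1 - dx) + feat[y0][x1] * (1 - dy) * dx
            + feat[y1][x0] * dy * (1 - dx) + feat[y1][x1] * dy * dx)

def deformable_tap_sum(feat, cy, cx, offsets, weights):
    """One output value: weighted sum over kernel taps at offset positions."""
    return sum(w * bilinear(feat, cy + oy, cx + ox)
               for (oy, ox), w in zip(offsets, weights))

feat = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
# A single-tap "kernel" with a half-pixel offset lands between grid points.
val = deformable_tap_sum(feat, 1, 1, offsets=[(0.5, 0.5)], weights=[1.0])
print(val)  # average of feat[1][1], feat[1][2], feat[2][1], feat[2][2] -> 7.5
```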

https://github.com/OpenGVLab/SAM-Med2D

Official implementation of SAM-Med2D

Last synced: 04 Apr 2025

https://github.com/OpenGVLab/VideoMamba

[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding

Last synced: 07 May 2025

https://github.com/OpenGVLab/OmniQuant

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

large-language-models llm quantization

Last synced: 07 May 2025
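
For context, the basic operation underneath LLM weight quantization is uniform affine quantization; OmniQuant's contribution is learning the clipping and equivalent-transformation parameters, not the round-trip itself. A minimal sketch of that round-trip (illustrative, not the repo's API):

```python
# Sketch of uniform affine weight quantization, the primitive that methods
# like OmniQuant build on (illustrative; OmniQuant additionally *learns*
# clipping and transformation parameters).

def quantize(weights, n_bits=4):
    """Map floats to integers in [0, 2^n_bits - 1] with a scale + zero-point."""
    qmax = (1 << n_bits) - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0            # avoid zero scale
    zero_point = round(-lo / scale)
    q = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

w = [-0.9, -0.3, 0.0, 0.2, 0.7, 1.1]
q, s, z = quantize(w, n_bits=4)
w_hat = dequantize(q, s, z)
err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q)                        # integers in [0, 15]
print(f"max error: {err:.4f}")  # roughly bounded by scale / 2
```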

https://github.com/OpenGVLab/Multi-Modality-Arena

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

chat chatbot chatgpt gradio large-language-models llms multi-modality vision-language-model vqa

Last synced: 20 Apr 2025

https://github.com/OpenGVLab/DCNv4

[CVPR 2024] Deformable Convolution v4

Last synced: 20 Mar 2025

https://github.com/OpenGVLab/All-Seeing

[ICLR 2024] This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"

all-seeing dataset region-text

Last synced: 24 Jul 2025

https://github.com/opengvlab/scalecua

ScaleCUA is an open-source family of computer-use agents that operate in cross-platform environments (Windows, macOS, Ubuntu, and Android).

computer-use-agents data gui-agents models online-evaluation-suite scalecua

Last synced: 28 Oct 2025

https://github.com/OpenGVLab/Instruct2Act

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model

chatgpt clip llm robotics segment-anything

Last synced: 06 May 2025

https://github.com/OpenGVLab/Vision-RWKV

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

Last synced: 22 Jul 2025

https://github.com/OpenGVLab/CaFo

[CVPR 2023] Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners

Last synced: 20 Mar 2025

https://github.com/opengvlab/unmasked_teacher

[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models

Last synced: 25 Oct 2025

https://github.com/OpenGVLab/LAMM

[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents

Last synced: 27 Jul 2025

https://github.com/OpenGVLab/UniFormerV2

[ICCV2023] UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer

Last synced: 20 Mar 2025

https://github.com/opengvlab/humanbench

This repo is the official implementation of HumanBench (CVPR 2023).

Last synced: 30 Jun 2025

https://github.com/OpenGVLab/Diffree

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

Last synced: 30 Jun 2025

https://github.com/opengvlab/mm-interleaved

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

Last synced: 30 Jun 2025

https://github.com/opengvlab/gv-benchmark

General Vision Benchmark (GV-B), a project from OpenGVLab.

Last synced: 30 Jun 2025

https://github.com/opengvlab/unihcp

Official PyTorch implementation of UniHCP

Last synced: 30 Jun 2025

https://github.com/OpenGVLab/DriveMLM

Last synced: 16 Mar 2025

https://github.com/opengvlab/hulk

An official implementation of "Hulk: A Universal Knowledge Translator for Human-Centric Tasks"

Last synced: 25 Aug 2025

https://github.com/OpenGVLab/ChartAst

[ACL 2024] ChartAssistant is a chart-based vision-language model for universal chart comprehension and reasoning.

Last synced: 14 Oct 2025

https://github.com/OpenGVLab/MMT-Bench

ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Last synced: 30 Jun 2025

https://github.com/OpenGVLab/MM-NIAH

[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.

benchmark long-context multimodal-large-language-models vision-language-model

Last synced: 17 Apr 2025

https://github.com/opengvlab/internvl-mmdetseg

Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed

object-detection semantic-segmentation vision-foundation

Last synced: 17 Jun 2025

https://github.com/OpenGVLab/M3I-Pretraining

[CVPR 2023] Implementation of "Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information".

Last synced: 19 Apr 2025

https://github.com/opengvlab/awesome-draggan

Awesome-DragGAN: A curated list of papers, tutorials, repositories related to DragGAN

awesome-list draggan gan

Last synced: 30 Jun 2025

https://github.com/opengvlab/mutr

[AAAI 2024] Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

Last synced: 30 Jun 2025

https://github.com/OpenGVLab/MMIU

[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Last synced: 30 Jun 2025

https://github.com/opengvlab/zerogui

ZeroGUI: Automating Online GUI Learning at Zero Human Cost

Last synced: 28 Oct 2025

https://github.com/OpenGVLab/PVC

[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

Last synced: 20 Aug 2025
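
Visual token compression in general trades vision-token count for LLM context budget. A toy sketch of the idea, average-pooling fixed-size groups of tokens (illustrative only, not PVC's progressive scheme):

```python
# Sketch of the general idea behind visual token compression (illustrative;
# not PVC's actual algorithm): shrink a sequence of vision tokens by
# average-pooling consecutive groups, reducing the LLM's input length.

def compress_tokens(tokens, group=4):
    """Average-pool consecutive groups of token vectors."""
    out = []
    for i in range(0, len(tokens), group):
        chunk = tokens[i:i + group]
        dim = len(chunk[0])
        out.append([sum(t[d] for t in chunk) / len(chunk) for d in range(dim)])
    return out

# 16 fake 2-D tokens -> 4 compressed tokens (a 4x shorter visual prefix).
tokens = [[float(i), float(i % 3)] for i in range(16)]
compressed = compress_tokens(tokens, group=4)
print(len(tokens), "->", len(compressed))  # 16 -> 4
```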

https://github.com/OpenGVLab/De-focus-Attention-Networks

Learning 1D Causal Visual Representation with De-focus Attention Networks

Last synced: 17 Sep 2025

https://github.com/opengvlab/diffagent

[CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

Last synced: 30 Jun 2025

https://github.com/opengvlab/perception_test_iccv2023

Champion solutions repository for the Perception Test challenges at the ICCV 2023 workshop.

audio-visual deep-learning iccv2023

Last synced: 30 Jun 2025

https://github.com/opengvlab/sid-vln

Official implementation of: Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale

Last synced: 28 Oct 2025
