Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/opengvlab/interngpt

InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).

chatgpt click draggan foundation-model gpt gpt-4 gradio husky image-captioning imagebind internimage langchain llama llm multimodal sam segment-anything vicuna video-generation vqa

Last synced: 29 days ago

README


[[Chinese Documentation]](README_CN.md)

**The project is still under construction; we will continue to update it and welcome contributions/pull requests from the community.**



# 🤖💬 InternGPT [[Paper](https://arxiv.org/pdf/2305.05662.pdf)]

**InternGPT** (short for **iGPT**) / **InternChat** (short for **iChat**) is a pointing-language-driven visual interactive system that allows you to interact with ChatGPT by clicking, dragging, and drawing with a pointing device. The name InternGPT stands for **inter**action, **n**onverbal, and Chat**GPT**. Unlike existing interactive systems that rely on pure language, iGPT incorporates pointing instructions, which significantly improves the efficiency of communication between users and chatbots, as well as the accuracy of chatbots on vision-centric tasks, especially in complicated visual scenarios. Additionally, iGPT uses an auxiliary control mechanism to improve the controllability of the LLM, and a large vision-language model termed **Husky** is fine-tuned for high-quality multi-modal dialogue (impressing ChatGPT-3.5-turbo with **93.89% GPT-4 Quality**).

## 🤖💬 Online Demo
**InternGPT** is online (see [https://igpt.opengvlab.com](https://igpt.opengvlab.com/)). Let's try it!

[**NOTE**] You may have to wait in a lengthy queue. You can clone our repo and run it with your private GPU.
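
For reference, a minimal local setup might look like this (a sketch assuming a CUDA GPU and the environment from [INSTALL.md](INSTALL.md)):
```shell
# Clone the repository and start the basic Gradio service
git clone https://github.com/OpenGVLab/InternGPT.git
cd InternGPT
# set up dependencies as described in INSTALL.md, then:
python -u app.py --load "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0" --port 3456 -e
```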

**Video Demo with DragGAN:**

https://github.com/OpenGVLab/InternGPT/assets/13723743/529abde4-5dce-48de-bb38-0a0c199bb980

**Video Demo with ImageBind:**

https://github.com/OpenGVLab/InternGPT/assets/13723743/bacf3e58-6c24-4c0f-8cf7-e0c4b8b3d2af

**iGPT Video Demo:**

https://github.com/OpenGVLab/InternGPT/assets/13723743/8fd9112f-57d9-4871-a369-4e1929aa2593

## 🥳 🚀 What's New
- (2023.06.19) We optimized GPU memory usage when executing the tools. Please refer to [Get Started](#get_started).

- (2023.06.19) We updated [INSTALL.md](https://github.com/OpenGVLab/InternGPT/blob/main/INSTALL.md), which now provides more detailed instructions for setting up the environment.

- (2023.05.31) We regret that, for emergency reasons, we have had to suspend the online demo. If you want to experience all the features, please try them after deploying locally.

- (2023.05.24) 🎉🎉🎉 We now support [DragGAN](https://github.com/Zeqiang-Lai/DragGAN)! Please see the [video demo](#draggan_demo) for usage. Let's try this awesome feature: [Demo](https://igpt.opengvlab.com/). (We now support a fully functional [DragGAN](https://github.com/Zeqiang-Lai/DragGAN): you can drag points and use custom images; see the [video demo](#draggan_demo) for usage. Our reproduced DragGAN code is [here](https://github.com/Zeqiang-Lai/DragGAN), and the online demo is [here](https://igpt.opengvlab.com/).)

- (2023.05.18) We now support [ImageBind](https://github.com/facebookresearch/ImageBind). Please see the [video demo](#imagebind_demo) for usage.

- (2023.05.15) The [model_zoo](https://huggingface.co/spaces/OpenGVLab/InternGPT/tree/main/model_zoo) including HuskyVQA has been released! Try it on your local machine!

- (2023.05.15) Our code is also publicly available on [Hugging Face](https://huggingface.co/spaces/OpenGVLab/InternGPT)! You can duplicate the repository and run it on your own GPUs.

### 🧭 User Manual

Update:

(2023.05.24) We now support [DragGAN](https://arxiv.org/abs/2305.10973). You can try it as follows:
- Click the button `New Image`;
- Click on the image to place points, where blue denotes a start point and red denotes an end point;
- Make sure the number of blue points equals the number of red points, then click the button `Drag It`;
- After processing, you will receive an edited image and a video that visualizes the editing process.

(2023.05.18) We now support [ImageBind](https://github.com/facebookresearch/ImageBind). If you want to generate a new image conditioned on audio, you can upload an audio file in advance:
- To **generate a new image from a single audio file**, you can send a message like: `"generate a real image from this audio"`;
- To **generate a new image from audio and text**, you can send a message like: `"generate a real image from this audio and {your prompt}"`;
- To **generate a new image from audio and image**, you need to upload an image and then send a message like: `"generate a new image from above image and audio"`.


**Main features:**

After uploading an image, you can have a **multi-modal dialogue** by sending messages like: `"what is it in the image?"` or `"what is the background color of image?"`.
You can also interactively operate on, edit, or generate the image as follows:
- You can click on the image and press the button **`Pick`** to **visualize the segmented region**, or press the button **`OCR`** to **recognize the words** at the chosen position;
- To **remove the masked region** in the image, you can send a message like: `"remove the masked region"`;
- To **replace the masked region** in the image, you can send a message like: `"replace the masked region with {your prompt}"`;
- To **generate a new image**, you can send a message like: `"generate a new image based on its segmentation describing {your prompt}"`;
- To **create a new image from your scribble**, press the button **`Whiteboard`** and draw on the board. After drawing, press the button **`Save`** and send a message like: `"generate a new image based on this scribble describing {your prompt}"`.

## 🗓️ Schedule
- [ ] Support [VisionLLM](https://github.com/OpenGVLab/VisionLLM)
- [ ] Support Chinese
- [ ] Support MOSS
- [ ] More powerful foundation models based on [InternImage](https://github.com/OpenGVLab/InternImage) and [InternVideo](https://github.com/OpenGVLab/InternVideo)
- [ ] More accurate interactive experience
- [ ] OpenMMLab toolkit
- [ ] Web page & code generation
- [ ] Support search engine
- [ ] Low-cost deployment
- [x] Support [DragGAN](https://arxiv.org/abs/2305.10973)
- [x] Support [ImageBind](https://github.com/facebookresearch/ImageBind)
- [x] Response verification for agent
- [x] Prompt optimization
- [x] User manual and video demo
- [x] Support voice assistant
- [x] Support click interaction
- [x] Interactive image editing
- [x] Interactive image generation
- [x] Interactive visual question answering
- [x] Segment anything
- [x] Image inpainting
- [x] Image caption
- [x] Image matting
- [x] Optical character recognition
- [x] Action recognition
- [x] Video caption
- [x] Video dense caption
- [x] Video highlight interpretation

## 🏠 System Overview

*(Figure: overview of the iGPT system architecture.)*

## 🎁 Major Features

- Remove the masked object
- Interactive image editing
- Image generation
- Interactive visual question answering
- Interactive image generation
- Video highlight interpretation

## 🛠️ Installation

See [INSTALL.md](INSTALL.md)

## 👨‍🏫 Get Started

Running the following command starts a Gradio service for our basic features:
```shell
python -u app.py --load "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0" --port 3456 -e
```
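
This serves the web UI on the port given by `--port`; with the command above you would open `http://localhost:3456` in a browser (assuming Gradio's usual host binding).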

If you want to enable the voice assistant, use `openssl` to generate a certificate:
```shell
mkdir certificate
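# Create a self-signed certificate (-x509) valid for 365 days; -nodes leaves the private key unencrypted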
openssl req -x509 -newkey rsa:4096 -keyout certificate/key.pem -out certificate/cert.pem -sha256 -days 365 -nodes
```

and then run:
```shell
python -u app.py --load "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0" \
--port 3456 --https -e
```
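
With `--https` and the certificate above, the demo is served over TLS (e.g. `https://localhost:3456` for this command); your browser will warn about the self-signed certificate, which is expected since it is not CA-signed.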

For all features of our iGPT, you need to run:
```shell
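# Each --load entry has the form <ToolName>_<device>; here every tool is placed on cuda:0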
python -u app.py \
--load "ImageOCRRecognition_cuda:0,Text2Image_cuda:0,SegmentAnything_cuda:0,ActionRecognition_cuda:0,VideoCaption_cuda:0,DenseCaption_cuda:0,ReplaceMaskedAnything_cuda:0,LDMInpainting_cuda:0,SegText2Image_cuda:0,ScribbleText2Image_cuda:0,Image2Scribble_cuda:0,Image2Canny_cuda:0,CannyText2Image_cuda:0,StyleGAN_cuda:0,Anything2Image_cuda:0,HuskyVQA_cuda:0" \
-p 3456 --https -e
```

Note that the `-e` flag can save a lot of memory.
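
If a single GPU cannot hold all the tools, the `_cuda:N` suffix in `--load` suggests each tool can be pinned to its own device. A minimal sketch, assuming a machine with a second GPU (`cuda:1`):
```shell
# Spread tools across two GPUs to reduce per-device memory pressure
python -u app.py \
--load "HuskyVQA_cuda:0,SegmentAnything_cuda:1,ImageOCRRecognition_cuda:1" \
--port 3456 -e
```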

### Selectively Loading Features
If you only want to try DragGAN, you only need to load StyleGAN and open the "DragGAN" tab:
```shell
python -u app.py --load "StyleGAN_cuda:0" --tab "DragGAN" --port 3456 --https -e
```

In this mode, only DragGAN's functions are available, which frees you from installing dependencies you are not interested in.

## 🎫 License

This project is released under the [Apache 2.0 license](LICENSE).

## 🖊️ Citation

If you find this project useful in your research, please consider citing:

```BibTeX
@article{2023interngpt,
  title={InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language},
  author={Liu, Zhaoyang and He, Yinan and Wang, Wenhai and Wang, Weiyun and Wang, Yi and Chen, Shoufa and Zhang, Qinglong and Lai, Zeqiang and Yang, Yang and Li, Qingyun and Yu, Jiashuo and others},
  journal={arXiv preprint arXiv:2305.05662},
  year={2023}
}
```

## 🤝 Acknowledgement
Thanks to the following open-source projects:

[Hugging Face](https://github.com/huggingface)  
[LangChain](https://github.com/hwchase17/langchain)  
[TaskMatrix](https://github.com/microsoft/TaskMatrix)  
[SAM](https://github.com/facebookresearch/segment-anything)  
[Stable Diffusion](https://github.com/CompVis/stable-diffusion)  
[ControlNet](https://github.com/lllyasviel/ControlNet)  
[InstructPix2Pix](https://github.com/timothybrooks/instruct-pix2pix)  
[BLIP](https://github.com/salesforce/BLIP)  
[Latent Diffusion Models](https://github.com/CompVis/latent-diffusion)  
[EasyOCR](https://github.com/JaidedAI/EasyOCR)  
[ImageBind](https://github.com/facebookresearch/ImageBind)  
[DragGAN](https://github.com/XingangPan/DragGAN)  

Welcome to discuss with us and continuously improve the user experience of InternGPT.

If you want to join our WeChat group, please scan the following QR code to add our assistant as a WeChat friend:

*(WeChat QR code image.)*