Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/showlab/vlog

Transform Video as a Document with ChatGPT, CLIP, BLIP2, GRIT, Whisper, LangChain.
https://github.com/showlab/vlog

chatgpt langchain large-language-model video-language whisper

Last synced: about 2 months ago
JSON representation

Transform Video as a Document with ChatGPT, CLIP, BLIP2, GRIT, Whisper, LangChain.

Awesome Lists containing this project

README

        

# 🎞 VLog: Video as a Long Document


Open in Spaces


Tweet

Given a long video, we turn it into a doc containing visual + audio info. By sending this doc to ChatGPT, we can chat over the video!

![vlog](figures/vlog.jpg)

### **News**

- 23/April/2023: We release [Huggingface gradio demo](https://huggingface.co/spaces/TencentARC/VLog)!
- 20/April/2023: We release our project on github and local gradio demo!

### To Do List

**Done**

- [x] LLM Reasoner: ChatGPT (multilingual) + LangChain
- [x] Vision Captioner: BLIP2 + GRIT
- [x] ASR Translator: Whisper (multilingual)
- [x] Video Segmenter: KTS
- [x] Huggingface Space

**Doing**

- [ ] Optimize the codebase efficiency
- [ ] Improve Vision Models: MiniGPT-4 / LLaVA, Family of Segment-anything
- [ ] Improve ASR Translator for better alignment
- [ ] Introduce Temporal dependency
- [ ] Replace ChatGPT with own trained LLM

## 🧸 Examples

[ News - GPT4 launch event ]GPT4 launch event

[ TV series - εΎζœδΉ‹εŽεΌΊδΉ°η“œ ]εŽεΌΊδΉ°η“œ

[ TV series - The Big Bang Theory ]The Big Bang Theory

[ Travel video - Travel in Rome ]Travel in Rome

[ Vlog - Basketball training ]Basketball training

## πŸ”¨ Preparation

Please find installation instructions in [install.md](install.md).

## 🌟 Start here

### Run in cmd

```
python main.py --video_path examples/buy_watermelon.mp4 --openai_api_key xxxxx
```

The generated video document will be generated and saved in `examples/buy_watermelon.log`

### Run in Gradio

```
python main_gradio.py --openai_api_key xxxxx
```

## πŸ™‹ Suggestion

Stay tuned for our project πŸ”₯

If you have more suggestions or functions need to be implemented in this codebase, feel free to drop us an email `[email protected]`, `[email protected]` or open an issue.

## 😊 Acknowledgment

This work is based on [ChatGPT](http://chat.openai.com), [BLIP2](https://huggingface.co/spaces/Salesforce/BLIP2), [GRIT](https://github.com/JialianW/GRiT), [KTS](https://inria.hal.science/hal-01022967/PDF/video_summarization.pdf), [Whisper](https://github.com/openai/whisper), [LangChain](https://python.langchain.com/en/latest/), [Image2Paragraph](https://github.com/showlab/Image2Paragraph).