Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/showlab/vlog
Transform Video as a Document with ChatGPT, CLIP, BLIP2, GRIT, Whisper, LangChain.
- Host: GitHub
- URL: https://github.com/showlab/vlog
- Owner: showlab
- License: MIT
- Created: 2023-04-20T13:43:25.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-07-25T05:14:55.000Z (about 1 year ago)
- Last Synced: 2024-08-01T15:49:40.136Z (about 2 months ago)
- Topics: chatgpt, langchain, large-language-model, video-language, whisper
- Language: Python
- Homepage:
- Size: 15.8 MB
- Stars: 512
- Watchers: 6
- Forks: 23
- Open Issues: 7
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-ChatGPT-repositories - VLog - Transform Video as a Document with ChatGPT, CLIP, BLIP2, GRIT, Whisper, LangChain. (Langchain)
README
# VLog: Video as a Long Document
Given a long video, we turn it into a doc containing visual + audio info. By sending this doc to ChatGPT, we can chat over the video!
![vlog](figures/vlog.jpg)
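As a rough illustration of the idea above, the sketch below merges per-segment visual captions and speech transcripts into one text document and wraps it in a chat prompt. The `Segment` schema, the function names, and the line format are all assumptions made for this example; they are not taken from VLog's code or its actual output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One video segment with its descriptions (hypothetical schema)."""
    start_s: float   # segment start, in seconds
    end_s: float     # segment end, in seconds
    caption: str     # visual description, e.g. from a captioning model
    transcript: str  # speech recognized in this segment; may be empty


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def build_video_document(segments: List[Segment]) -> str:
    """Join per-segment visual and audio text into one plain-text document."""
    lines = []
    for seg in segments:
        span = f"[{format_timestamp(seg.start_s)} - {format_timestamp(seg.end_s)}]"
        lines.append(f"{span} Visual: {seg.caption}")
        if seg.transcript:  # skip the audio line for silent segments
            lines.append(f"{span} Audio: {seg.transcript}")
    return "\n".join(lines)


def build_chat_prompt(document: str, question: str) -> str:
    """Wrap the document and a user question into a single prompt string."""
    return (
        "Answer the question using only the video document below.\n\n"
        f"{document}\n\n"
        f"Question: {question}"
    )
```

The resulting prompt string could then be sent to any chat model; the call itself is left out here because it depends on the client library in use.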
### **News**
- 23/April/2023: We released a [Hugging Face Gradio demo](https://huggingface.co/spaces/TencentARC/VLog)!
- 20/April/2023: We released our project on GitHub, along with a local Gradio demo!

### To Do List
**Done**
- [x] LLM Reasoner: ChatGPT (multilingual) + LangChain
- [x] Vision Captioner: BLIP2 + GRIT
- [x] ASR Translator: Whisper (multilingual)
- [x] Video Segmenter: KTS
- [x] Huggingface Space

**Doing**
- [ ] Optimize the codebase efficiency
- [ ] Improve Vision Models: MiniGPT-4 / LLaVA, Family of Segment-anything
- [ ] Improve ASR Translator for better alignment
- [ ] Introduce Temporal dependency
- [ ] Replace ChatGPT with our own trained LLM

## 🧸 Examples
[ News - GPT4 launch event ]
[ TV series - 征服之刘华强买瓜 (The Conquest: Liu Huaqiang buys a watermelon) ]
[ TV series - The Big Bang Theory ]
[ Travel video - Travel in Rome ]
[ Vlog - Basketball training ]
## 🔨 Preparation
Please find installation instructions in [install.md](install.md).
## Start here
### Run in cmd
```
python main.py --video_path examples/buy_watermelon.mp4 --openai_api_key xxxxx
```
The video document is generated and saved to `examples/buy_watermelon.log`.
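To script this step, for example to process several videos, a small wrapper like the sketch below may help. It assumes only the two flags shown in the command above. The helper names are hypothetical, reading the key from an `OPENAI_API_KEY` environment variable is a choice made for this example, and the rule that the log sits next to the video with a `.log` extension is inferred from the single example path, not confirmed by the codebase.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import List


def expected_log_path(video_path: str) -> Path:
    """Assumed rule: same directory and stem as the video, with a .log suffix."""
    return Path(video_path).with_suffix(".log")


def build_command(video_path: str, api_key: str) -> List[str]:
    """Build the argument list for the command-line invocation of main.py."""
    return [
        sys.executable, "main.py",
        "--video_path", video_path,
        "--openai_api_key", api_key,
    ]


def run_vlog(video_path: str) -> str:
    """Run main.py on one video and return the generated document text.

    Must be called from the repository root, where main.py lives.
    """
    api_key = os.environ["OPENAI_API_KEY"]  # raises KeyError if the variable is unset
    subprocess.run(build_command(video_path, api_key), check=True)
    return expected_log_path(video_path).read_text(encoding="utf-8")
```

Calling `run_vlog("examples/buy_watermelon.mp4")` would then return the contents of the log file, provided the path assumption holds.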
### Run in Gradio
```
python main_gradio.py --openai_api_key xxxxx
```
## Suggestion
Stay tuned for our project 🔥
If you have suggestions, or would like a feature implemented in this codebase, feel free to email us at `[email protected]` or `[email protected]`, or open an issue.
## Acknowledgment
This work is based on [ChatGPT](http://chat.openai.com), [BLIP2](https://huggingface.co/spaces/Salesforce/BLIP2), [GRIT](https://github.com/JialianW/GRiT), [KTS](https://inria.hal.science/hal-01022967/PDF/video_summarization.pdf), [Whisper](https://github.com/openai/whisper), [LangChain](https://python.langchain.com/en/latest/), [Image2Paragraph](https://github.com/showlab/Image2Paragraph).