Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Classifying emergency reports from texts, images, audios, videos with BLIP + ChatGPT
- Host: GitHub
- URL: https://github.com/farishijazi/multimodal-emergency-classification
- Owner: FarisHijazi
- Created: 2023-04-27T20:35:09.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2023-07-23T08:33:21.000Z (over 1 year ago)
- Last Synced: 2024-10-15T01:23:26.095Z (3 months ago)
- Language: Python
- Size: 5.99 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# AI Emergency classification
⚠ Not all of the code was uploaded; this repo is not fully functional. ⚠
This is the 2nd-place project (out of 4,000 competing teams) in the 3rd Absher hackathon. The code was put together in a few days, so code quality may not be the best.
See the winner announcements [here](https://twitter.com/Absher/status/1660583471155228673) and [here](https://www.linkedin.com/feed/update/urn:li:activity:7066683163514683392/).
The idea is to classify emergencies from any medium (video, image, audio, or text) and then redirect the report to the correct emergency service using a multimodal Large Language Model (LLM) pipeline.
The user can upload any media; several deep learning models extract information from it, and an LLM agent classifies the emergency and immediately redirects it to the correct emergency service.
The LLM takes the unstructured output of the upstream models, structures it, and then classifies it according to the rules in `config.json` (a minimal sketch of this pipeline appears after the diagram below). The upstream models are:
- [BLIP-2](https://huggingface.co/docs/transformers/main/model_doc/blip-2) image captioning model
- [OpenAI Whisper](https://github.com/openai/whisper) speech-to-text model

![diagram](assets/flow-diagram.png)
Video here: https://www.youtube.com/watch?v=VMoDf4XyOq4
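
For concreteness, here is a minimal sketch of the caption-then-classify pipeline described above. It is not the repository's actual code: the model checkpoints (`Salesforce/blip2-opt-2.7b`, Whisper `base`), the prompt, the helper names, and the shape of `config.json` are all illustrative assumptions.

```python
# Sketch only: BLIP-2 caption + Whisper transcript -> LLM classification.
# Checkpoints, prompt wording, and config.json structure are assumptions, not the repo's code.
import json

import torch
import whisper  # pip install openai-whisper
from openai import OpenAI
from PIL import Image
from transformers import Blip2ForConditionalGeneration, Blip2Processor

device = "cuda" if torch.cuda.is_available() else "cpu"

# BLIP-2 image captioning (checkpoint chosen for illustration)
blip_processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
blip_model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b").to(device)

def caption_image(path: str) -> str:
    image = Image.open(path).convert("RGB")
    inputs = blip_processor(images=image, return_tensors="pt").to(device)
    out = blip_model.generate(**inputs, max_new_tokens=30)
    return blip_processor.batch_decode(out, skip_special_tokens=True)[0].strip()

# Whisper speech-to-text
whisper_model = whisper.load_model("base")

def transcribe_audio(path: str) -> str:
    return whisper_model.transcribe(path)["text"].strip()

# Hypothetical config.json, e.g.:
# {"services": {"police": "911", "fire": "998", "ambulance": "997"},
#  "rules": "Fires/smoke -> fire; injuries -> ambulance; crimes -> police."}
config = json.load(open("config.json"))

def classify_emergency(caption: str, transcript: str, user_text: str) -> str:
    """Ask the LLM to structure the upstream outputs and pick a service per the rules."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    prompt = (
        "You are an emergency dispatcher. Using these routing rules:\n"
        f"{json.dumps(config)}\n\n"
        f"Image caption: {caption}\nAudio transcript: {transcript}\nUser text: {user_text}\n\n"
        "Answer with the single emergency service that should handle this report."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()
```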
## Future work
Currently we use a feature-engineering approach: speech-to-text and image-captioning models extract information as text, which is then fed into the downstream model (the LLM). This could be improved by fusing the image encoder directly with the LLM, as in MiniGPT-4. I tried this, but the results were worse and slower, so until I find a better multimodal LLM I will stick with the current approach. A rough sketch of what a fused call could look like is shown below.
- [MiniGPT-4](https://huggingface.co/Vision-CAIR/MiniGPT-4)
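
As one illustration of the fused direction, the image could be sent straight to a multimodal LLM, skipping the captioning step. This sketch uses an OpenAI vision-capable chat model purely as a stand-in (the future-work idea above targets MiniGPT-4); the model name and function are assumptions, not the repository's code.

```python
# Sketch of a "fused" alternative: pass the raw image to a multimodal LLM directly
# instead of captioning it first. Uses an OpenAI vision-capable model as an example only.
import base64
from openai import OpenAI

def classify_image_directly(image_path: str, rules: str) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    b64 = base64.b64encode(open(image_path, "rb").read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any vision-capable chat model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Using these routing rules: {rules}\n"
                         "Which emergency service should handle the scene in this image?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()
```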
## setup

```
conda create -n absherthon python=3.10 -y
conda activate absherthon
pip install -r requirements.txt
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia -y
pip install --upgrade accelerate
```

## run

```
export OPENAI_API_KEY=sk-...  # you must get this from openai.com
python app.py
python demo.py
```