https://github.com/kind-unes/multimodal-model
This project is a multi-modal model that combines several models. It accepts audio, images, and text as inputs and generates corresponding audio, image, and text outputs.
- Host: GitHub
- URL: https://github.com/kind-unes/multimodal-model
- Owner: Kind-Unes
- Created: 2023-12-31T10:36:30.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2024-02-26T23:28:43.000Z (almost 2 years ago)
- Last Synced: 2025-04-09T20:13:51.231Z (9 months ago)
- Topics: gemini, multimodel, opendalle-v1, python, transformers
- Language: Python
- Homepage:
- Size: 10.4 MB
- Stars: 8
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Multi-Modal Model Python Project
## Overview
This project is a multi-modal model that accepts audio, images, and text as inputs, generating corresponding audio, images, and text outputs.
## Features
- **Streamlit Interface:** Coming soon
- **Input Modalities:** Audio, images, text, videos, emojis, and multiple inputs combined
- **Output Modalities:** Audio, images, text, videos, emojis, segmented images, object-detection coordinates, and multiple outputs combined
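
One way to read the modality lists above is that each (input, output) pair maps to a chain of models. The sketch below is a minimal illustration of that idea, not the repository's actual code: the `CONVERTERS` table and the `plan_route` helper are hypothetical names, and only the Hugging Face model IDs are taken from the Credits section of this README. Gemini text generation is left out because it is not a modality conversion.

```python
from collections import deque

# Direct conversions between modalities, keyed by (input, output).
# The model IDs come from the Credits section; the table itself is an
# assumption for illustration and may not match how model.py wires things.
CONVERTERS = {
    ("audio", "text"): "openai/whisper-large-v2",
    ("text", "audio"): "facebook/mms-tts-eng",
    ("image", "text"): "Salesforce/blip-image-captioning-large",
    ("text", "image"): "dataautogpt3/OpenDalleV1.1",
}


def plan_route(source: str, target: str) -> list[str]:
    """Return the shortest chain of model IDs that turns `source` into `target`.

    Uses a breadth-first search over the modality graph defined by CONVERTERS.
    Returns an empty list when no conversion is needed.
    """
    if source == target:
        return []
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        modality, chain = queue.popleft()
        for (src, dst), model_id in CONVERTERS.items():
            if src != modality or dst in seen:
                continue
            if dst == target:
                return chain + [model_id]
            seen.add(dst)
            queue.append((dst, chain + [model_id]))
    raise ValueError(f"no route from {source!r} to {target!r}")


if __name__ == "__main__":
    # Speech in, picture out: transcribe first, then generate an image.
    print(plan_route("audio", "image"))
```

With this table, text acts as the hub modality: an audio-to-image request is planned as speech-to-text followed by text-to-image. Modalities without a registered converter, such as video, raise a `ValueError` rather than guessing.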
## Getting Started
### Prerequisites
- Python 3.x
- Dependencies listed in `requirements.txt`
### Installation
```bash
git clone https://github.com/Kind-Unes/Multi-Model-V1.git
cd 'MultiMODEL Template'
pip install -r requirements.txt
```
## Usage
Run `python model.py`.
## Credits
### TXT2IMG Models
- [OpenDalleV1.1](https://huggingface.co/dataautogpt3/OpenDalleV1.1)
- [SSD-1B](https://huggingface.co/segmind/SSD-1B)
- [SSD-1B-Anime](https://huggingface.co/furusu/SSD-1B-anime)
### Text Generation Model
- [GOOGLE-GEMINI](https://deepmind.google/technologies/gemini/#introduction)
### IMG2TXT Model
- [BLIP_IMAGE_CAPTIONING_LARGE](https://huggingface.co/Salesforce/blip-image-captioning-large)
### TTS Model
- [FACEBOOK_MMS_TTS_ENG](https://huggingface.co/facebook/mms-tts-eng)
### STT Model
- [OPENAI_WHISPER_LARGE_V2](https://huggingface.co/openai/whisper-large-v2)
### Others
### Websites
- [Base64-Image-Viewer](https://base64-viewer.onrender.com)
- [Base64-Audio-Reader](https://base64.guru/converter/decode/audio)
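
The two Base64 tools listed above suggest that generated images and audio may be exchanged as Base64 text. The README does not say so explicitly, so treat the following as a sketch under that assumption. It uses only the Python standard library; the helper names `to_base64` and `from_base64` are hypothetical and do not come from the repository.

```python
import base64

# The 8-byte signature that starts every PNG file, used here as sample data.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def to_base64(data: bytes) -> str:
    """Encode raw bytes (image or audio) as ASCII Base64 text."""
    return base64.b64encode(data).decode("ascii")


def from_base64(text: str) -> bytes:
    """Decode Base64 text back to bytes, rejecting non-Base64 characters."""
    return base64.b64decode(text, validate=True)


if __name__ == "__main__":
    encoded = to_base64(PNG_MAGIC)
    print(encoded)  # iVBORw0KGgo=
    assert from_base64(encoded) == PNG_MAGIC
```

A string produced this way can be pasted into either viewer to inspect the result, and `from_base64` recovers the original bytes for writing to a file.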