https://github.com/kind-unes/multimodal-model

This project is a multi-modal model that combines several models. It accepts audio, images, and text as inputs and generates corresponding audio, image, and text outputs.

gemini multimodel opendalle-v1 python transformers

# Multi-Modal Model Python Project

## Overview

This project is a multi-modal model that accepts audio, images, and text as inputs, generating corresponding audio, images, and text outputs.

## Features
- **Streamlit Interface:** Coming soon
- **Input Modalities:** Audio, images, text, videos, emojis, and multiple combined inputs
- **Output Modalities:** Audio, images, text, videos, emojis, segmented images, object-detection coordinates, and multiple combined outputs
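
One way to picture how a single entry point could serve several input modalities is a small router that guesses the modality from a file's MIME type and looks up a model for it. The sketch below is hypothetical and is not taken from this project's code: `MODEL_FOR_MODALITY`, `detect_modality`, and `pick_model` are illustrative names, the model IDs are the ones listed under Credits, and `"google-gemini"` is a placeholder label rather than a real model ID.

```python
import mimetypes

# Hypothetical lookup table: input modality -> model that could handle it.
MODEL_FOR_MODALITY = {
    "audio": "openai/whisper-large-v2",                 # speech-to-text
    "image": "Salesforce/blip-image-captioning-large",  # image-to-text
    "text": "google-gemini",                            # placeholder label, not a real model ID
}


def detect_modality(path_or_text: str) -> str:
    """Guess the modality from a file extension; fall back to plain text."""
    mime, _ = mimetypes.guess_type(path_or_text)
    if mime:
        kind = mime.split("/")[0]
        if kind in ("audio", "image"):
            return kind
    return "text"


def pick_model(path_or_text: str) -> str:
    """Return the model label registered for the detected modality."""
    return MODEL_FOR_MODALITY[detect_modality(path_or_text)]
```

Routing on the file extension keeps the sketch dependency-free. A real implementation would more likely inspect the file contents or take the modality as an explicit argument.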

## Getting Started

### Prerequisites

- Python 3.x
- Dependencies listed in `requirements.txt`

### Installation

```bash
git clone https://github.com/Kind-Unes/Multi-Model-V1.git
cd 'MultiMODEL Template'
pip install -r requirements.txt
```
## Usage

Run the entry-point script from the project directory: `python model.py`

## Credits

### TXT2IMG Models

- [OpenDalleV1.1](https://huggingface.co/dataautogpt3/OpenDalleV1.1)
- [SSD-1B](https://huggingface.co/segmind/SSD-1B)
- [SSD-1B-Anime](https://huggingface.co/furusu/SSD-1B-anime)

### Text Generation Model

- [GOOGLE-GEMINI](https://deepmind.google/technologies/gemini/#introduction)

### IMG2TXT Model

- [BLIP_IMAGE_CAPTIONING_LARGE](https://huggingface.co/Salesforce/blip-image-captioning-large)

### TTS Model

- [FACEBOOK_MMS_TTS_ENG](https://huggingface.co/facebook/mms-tts-eng)

### STT Model

- [OPENAI_WHISPER_LARGE_V2](https://huggingface.co/openai/whisper-large-v2)
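
As an illustration of how the speech-to-text model above can be called, here is a minimal sketch using the Hugging Face `transformers` `pipeline` API. It is not this project's code: the `transcribe` function and its `asr` parameter are assumptions made for the example, and `asr` exists only so the function can be tried with a stand-in callable instead of the full checkpoint.

```python
from typing import Callable, Optional

WHISPER_MODEL_ID = "openai/whisper-large-v2"


def transcribe(audio_path: str, asr: Optional[Callable] = None) -> str:
    """Return the transcript of an audio file.

    `asr` is a hypothetical injection point: pass any callable that maps a
    path to {"text": ...} to avoid loading the real model.
    """
    if asr is None:
        # Imported lazily because building the pipeline downloads the model weights.
        from transformers import pipeline

        asr = pipeline("automatic-speech-recognition", model=WHISPER_MODEL_ID)
    result = asr(audio_path)
    # The ASR pipeline returns a dict whose "text" key holds the transcript.
    return result["text"].strip()
```

Calling `transcribe("clip.wav")` with no second argument loads the Whisper checkpoint on first use. Decoding audio files by path typically also requires `ffmpeg` to be installed.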

### Others **. . . . .**

### Websites
- [Base64-Image-Viewer](https://base64-viewer.onrender.com)
- [Base64-Audio-Reader](https://base64.guru/converter/decode/audio)
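
The two tools above decode base64 strings into images and audio, which suggests that the project may return media as base64 text. This is an assumption, since the README does not say so. If that is the case, the same decoding can be done locally with the Python standard library. The helper name `save_base64` below is hypothetical.

```python
import base64
from pathlib import Path


def save_base64(data: str, out_path: str) -> Path:
    """Decode a base64 string (optionally a data URI) and write the bytes to out_path."""
    # Drop a leading "data:image/png;base64," style prefix if present.
    if data.startswith("data:") and "," in data:
        data = data.split(",", 1)[1]
    raw = base64.b64decode(data)
    path = Path(out_path)
    path.write_bytes(raw)
    return path
```

For example, `save_base64(encoded_image, "output.png")` writes the decoded bytes to `output.png`, which can then be opened in any image viewer.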