https://github.com/kind-unes/multimodal-model

This project is a multi-modal model that combines multiple models to accept audio, images, and text as inputs and generate corresponding audio, image, and text outputs.

Host: GitHub
URL: https://github.com/kind-unes/multimodal-model
Owner: Kind-Unes
Created: 2023-12-31T10:36:30.000Z (about 2 years ago)
Default Branch: master
Last Pushed: 2024-02-26T23:28:43.000Z (almost 2 years ago)
Last Synced: 2025-04-09T20:13:51.231Z (9 months ago)
Topics: gemini, multimodel, opendalle-v1, python, transformers
Language: Python
Homepage:
Size: 10.4 MB
Stars: 8
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Multi-Modal Model Python Project

## Overview

This project is a multi-modal model that accepts audio, images, and text as inputs, generating corresponding audio, images, and text outputs.

## Features
- **Streamlit Interface:** Coming soon
- **Input Modalities:** Audio, images, text, videos, emojis, multiple inputs
- **Output Modalities:** Audio, images, text, videos, emojis, segmented images, object-detection coordinates for images, multiple outputs
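
One way to picture how several single-purpose models can cover these modality pairs is a small routing table that chains models through an intermediate modality. The sketch below is illustrative only and is not taken from this repository: `ROUTES` and `plan_route` are hypothetical names, and the model IDs are simply the ones listed under Credits. It plans a chain of conversions without loading any model, and it leaves out text-to-text generation.

```python
from collections import deque

# Hypothetical routing table (an assumption, not the project's code):
# (input modality, output modality) -> model ID from the Credits section.
ROUTES = {
    ("text", "image"): "dataautogpt3/OpenDalleV1.1",
    ("image", "text"): "Salesforce/blip-image-captioning-large",
    ("text", "audio"): "facebook/mms-tts-eng",
    ("audio", "text"): "openai/whisper-large-v2",
}


def plan_route(source, target):
    """Return the shortest chain of (src, dst, model_id) hops, or None.

    Uses breadth-first search over ROUTES, so a request such as
    audio -> image is resolved as audio -> text -> image.
    """
    if source == target:
        return []  # no conversion needed; text generation is not modeled here
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        modality, path = queue.popleft()
        for (src, dst), model_id in ROUTES.items():
            if src != modality or dst in seen:
                continue
            hop_path = path + [(src, dst, model_id)]
            if dst == target:
                return hop_path
            seen.add(dst)
            queue.append((dst, hop_path))
    return None  # no chain of models connects the two modalities


if __name__ == "__main__":
    for hop in plan_route("audio", "image"):
        print(hop)
```

Keeping the table separate from the search means a new modality pair only needs a new entry in `ROUTES`.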

## Getting Started

### Prerequisites

- Python 3.x
- Dependencies listed in `requirements.txt`

### Installation

```bash
git clone https://github.com/Kind-Unes/Multi-Model-V1.git
cd 'MultiMODEL Template'
pip install -r requirements.txt
```
## Usage

Run `python model.py`.

## Credits

### TXT2IMG Models

- [OpenDalleV1.1](https://huggingface.co/dataautogpt3/OpenDalleV1.1)
- [SSD-1B](https://huggingface.co/segmind/SSD-1B)
- [SSD-1B-Anime](https://huggingface.co/furusu/SSD-1B-anime)

### Text Generation Model

- [GOOGLE-GEMINI](https://deepmind.google/technologies/gemini/#introduction)

### IMG2TXT Model

- [BLIP_IMAGE_CAPTIONING_LARGE](https://huggingface.co/Salesforce/blip-image-captioning-large)

### TTS Model

- [FACEBOOK_MMS_TTS_ENG](https://huggingface.co/facebook/mms-tts-eng)

### STT Model

- [OPENAI_WHISPER_LARGE_V2](https://huggingface.co/openai/whisper-large-v2)

### Others **. . . . .**

### Websites
- [Base64-Image-Viewer](https://base64-viewer.onrender.com)
- [Base64-Audio-Reader](https://base64.guru/converter/decode/audio)
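
The Base64 viewers above suggest that generated images and audio may be returned as Base64 strings, although the README does not say so explicitly. Assuming that is the case, a minimal sketch for decoding such a string to a local file with the Python standard library could look like this; `save_base64` is a hypothetical helper, not part of the project.

```python
import base64
from pathlib import Path


def save_base64(payload, out_path):
    """Decode a Base64 string and write the raw bytes to out_path.

    Hypothetical helper: assumes the payload is plain Base64, optionally
    prefixed with a data URI header such as "data:image/png;base64,".
    """
    # Drop a data URI header if one is present.
    if payload.startswith("data:") and "," in payload:
        payload = payload.split(",", 1)[1]
    data = base64.b64decode(payload)
    path = Path(out_path)
    path.write_bytes(data)
    return path
```

The file extension passed in `out_path` should match the actual media type (for example `.png` or `.wav`), which this helper does not detect.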
