Multimodal Generative AI Labs
https://github.com/hi-space/multimodal-gen-ai-labs
- Host: GitHub
- URL: https://github.com/hi-space/multimodal-gen-ai-labs
- Owner: hi-space
- License: mit
- Created: 2024-08-30T06:04:31.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-03-03T10:04:25.000Z (3 months ago)
- Last Synced: 2025-03-03T11:22:40.412Z (3 months ago)
- Topics: bedrock, image-generation, llm, lmm, multimodal
- Language: Jupyter Notebook
- Homepage:
- Size: 41 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Multimodal Generative AI
This repository focuses on developing multimodal generative AI applications by leveraging AWS services. It integrates cutting-edge models to process and generate text, images, and other data types, and offers the following functionalities (short usage sketches follow the list).
- **Multimodal LLM**: Enhances language understanding and generation by combining text and image data. It uses [Amazon Bedrock's Claude](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-claude.html).
- **Multimodal Embedding**: Represents text and images in a unified vector space, enabling similarity comparison and retrieval across modalities. It uses [Amazon Titan Text Embeddings v2](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html) and the [Amazon Titan Multimodal Embeddings G1 model](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-multiemb-models.html).
- **Multimodal RAG (Retrieval-Augmented Generation)**: Takes text or images as input, retrieves relevant data, and generates answers. It uses [Amazon Bedrock's Claude](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-claude.html), [Amazon Titan Text Embeddings v2](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-embedding-models.html), the [Amazon Titan Multimodal Embeddings G1 model](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-multiemb-models.html), and [Amazon OpenSearch Service](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html) as a vector database.
- **Image Generation with Multimodal LLM**: Generates high-quality images from textual or multimodal inputs. It uses the [Amazon Titan Image Generator G1 model](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-image-models.html) and [Amazon Bedrock's Claude](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-claude.html).
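As a minimal sketch of the first functionality, the snippet below sends a multimodal (image + text) request to Claude through the Bedrock Runtime API with `boto3`. The region, model ID, and image path are assumptions and may differ from what the notebooks in this repository actually use.

```python
import base64
import json

import boto3

# Bedrock Runtime client; the region is an assumption -- use one where the model is enabled.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Read and base64-encode a local image (placeholder path).
with open("sample.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Anthropic Messages API payload with one image block and one text block.
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {"type": "base64", "media_type": "image/jpeg", "data": image_b64},
                },
                {"type": "text", "text": "Describe this image in one sentence."},
            ],
        }
    ],
}

# Model ID is an assumption; any multimodal Claude 3 model available in your account works.
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(body),
)
result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```

A similarly hedged sketch for the multimodal embedding functionality: it requests a joint text-and-image vector from Titan Multimodal Embeddings G1, the kind of vector the RAG labs would index in OpenSearch. The same caveats about region and file path apply.

```python
import base64
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")  # region is an assumption

with open("sample.jpg", "rb") as f:  # placeholder image path
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Titan Multimodal Embeddings G1 accepts text, an image, or both in a single request.
response = bedrock.invoke_model(
    modelId="amazon.titan-embed-image-v1",
    body=json.dumps({
        "inputText": "a red bicycle leaning against a brick wall",
        "inputImage": image_b64,
    }),
)
embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # 1024-dimensional vector by default
```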
This project demonstrates how AWS services can be used to create, manage, and scale these multimodal AI capabilities. Whether you’re building research tools, creative applications, or advanced AI solutions, this repository serves as a comprehensive starting point.