Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with multi-modal

A curated list of projects in awesome lists tagged with multi-modal.

https://github.com/modelscope/modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

cv deep-learning machine-learning multi-modal nlp python science speech

Last synced: 29 Sep 2024

https://github.com/OpenBMB/MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

minicpm minicpm-v multi-modal

Last synced: 01 Aug 2024

https://github.com/openbmb/minicpm-v

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

minicpm minicpm-v multi-modal

Last synced: 30 Sep 2024

https://github.com/thudm/cogvlm

a state-of-the-art-level open visual language model | multimodal pretrained model

cross-modality language-model multi-modal pretrained-models visual-language-models

Last synced: 01 Oct 2024

https://github.com/opengvlab/internvl

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance

gpt gpt-4o gpt-4v image-classification image-text-retrieval llm multi-modal semantic-segmentation video-classification vision-language-model vit-22b vit-6b

Last synced: 01 Oct 2024

https://github.com/lucidrains/dalle-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

artificial-intelligence attention-mechanism deep-learning multi-modal text-to-image transformers

Last synced: 03 Oct 2024

https://github.com/lucidrains/DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

artificial-intelligence attention-mechanism deep-learning multi-modal text-to-image transformers

Last synced: 30 Jul 2024

https://github.com/THUDM/CogVLM

a state-of-the-art-level open visual language model | multimodal pretrained model

cross-modality language-model multi-modal pretrained-models visual-language-models

Last synced: 31 Jul 2024

https://github.com/OpenGVLab/InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal dialogue model approaching GPT-4o performance

gpt gpt-4o gpt-4v image-classification image-text-retrieval llm multi-modal semantic-segmentation video-classification vision-language-model vit-22b vit-6b

Last synced: 31 Jul 2024

https://github.com/THUDM/VisualGLM-6B

Chinese and English multimodal conversational language model

chatglm-6b gpt multi-modal

Last synced: 30 Jul 2024

https://github.com/pku-yuangroup/video-llava

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

instruction-tuning large-vision-language-model multi-modal

Last synced: 30 Sep 2024

https://github.com/PKU-YuanGroup/Video-LLaVA

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

instruction-tuning large-vision-language-model multi-modal

Last synced: 31 Jul 2024

https://github.com/scisharp/llamasharp

A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.

chatbot gpt llama llama-cpp llama2 llama3 llamacpp llava llm multi-modal semantic-kernel

Last synced: 27 Sep 2024

https://github.com/modelscope/data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷 Providing higher-quality, richer, and more "digestible" data for large models!

chinese data-analysis data-science data-visualization dataset gpt gpt-4 instruction-tuning large-language-models llama llava llm llms multi-modal nlp opendata pre-training pytorch sora streamlit

Last synced: 29 Sep 2024

https://github.com/SciSharp/LLamaSharp

A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.

chatbot gpt llama llama-cpp llama2 llama3 llamacpp llava llm multi-modal semantic-kernel

Last synced: 31 Jul 2024

https://github.com/kav-k/gptdiscord

A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, youtube summarizer, and more!

artificial-intelligence asyncio chatbot code-interpreter collaborate dalle2 digitalocean discord embeddings extractive-question-answering github gpt3 hacktoberfest help-wanted moderator-bot multi-modal openai openai-api pinecone python

Last synced: 25 Sep 2024

https://github.com/Kav-K/GPTDiscord

A robust, all-in-one GPT interface for Discord. ChatGPT-style conversations, image generation, AI-moderation, custom indexes/knowledgebase, youtube summarizer, and more!

artificial-intelligence asyncio chatbot code-interpreter collaborate dalle2 digitalocean discord embeddings extractive-question-answering github gpt3 hacktoberfest help-wanted moderator-bot multi-modal openai openai-api pinecone python

Last synced: 31 Jul 2024

https://github.com/PKU-YuanGroup/MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

large-vision-language-model mixture-of-experts moe multi-modal

Last synced: 31 Jul 2024

https://github.com/dvlab-research/lisa

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

large-language-model llm multi-modal segmentation

Last synced: 01 Oct 2024

https://github.com/dvlab-research/LISA

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

large-language-model llm multi-modal segmentation

Last synced: 01 Aug 2024

https://github.com/openmotionlab/motiongpt

[NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs

3d-generation chatgpt gpt language-model motion motion-generation motiongpt multi-modal text-driven text-to-motion

Last synced: 29 Sep 2024

https://github.com/OpenMotionLab/MotionGPT

[NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs

3d-generation chatgpt gpt language-model motion motion-generation motiongpt multi-modal text-driven text-to-motion

Last synced: 29 Jul 2024

https://github.com/microsoft/farmvibes-ai

FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability

agriculture ai geospatial geospatial-analytics multi-modal remote-sensing stac sustainability weather

Last synced: 01 Aug 2024

https://github.com/salesforce/UniControl

Unified Controllable Visual Generation Model

aigc generation multi-modal

Last synced: 31 Jul 2024

https://github.com/lucidrains/transfusion-pytorch

Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI

artificial-intelligence attention deep-learning flow-matching multi-modal transformers

Last synced: 03 Oct 2024

https://github.com/modelscope/agentscope

Start building LLM-empowered multi-agent applications in an easier way.

agent chatbot distributed-agents gpt-4 large-language-models llm llm-agent multi-agent multi-modal

Last synced: 01 Aug 2024

https://github.com/open-compass/VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4v, Gemini, QwenVLPlus, 40+ HF models, 20+ benchmarks

chatgpt claude clip computer-vision evaluation gemini gpt gpt-4v gpt4 large-language-models llava llm multi-modal openai openai-api pytorch qwen vit vqa

Last synced: 08 Aug 2024

https://github.com/open-compass/vlmevalkit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4v, Gemini, QwenVLPlus, 40+ HF models, 20+ benchmarks

chatgpt claude clip computer-vision evaluation gemini gpt gpt-4v gpt4 large-language-models llava llm multi-modal openai openai-api pytorch qwen vit vqa

Last synced: 02 Aug 2024

https://github.com/v-iashin/SpecVQGAN

Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)

audio audio-generation bmvc evaluation-metrics gan melgan multi-modal pytorch transformer vas vggsound video video-features video-understanding vqvae

Last synced: 01 Aug 2024

https://github.com/THUDM/CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

cogvlm language-model multi-modal pretrained-models

Last synced: 03 Aug 2024

https://github.com/wangsuzhen/Audio2Head

Code for the paper "Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion" (IJCAI 2021)

codes ijcai2021 multi-modal paper talking-face talking-head

Last synced: 31 Jul 2024

https://github.com/wisconsinaivision/vip-llava

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

chatbot clip cvpr2024 foundation-models gpt-4 gpt-4-vision llama llama2 llava multi-modal vision-language visual-prompting

Last synced: 27 Sep 2024

https://github.com/Haiyang-W/UniTR

[ICCV2023] Official Implementation of "UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation"

3d 3d-object-detection 3d-segmentation backbone bev camera computer-vision iccv2023 lidar multi-modal multi-view point-cloud transformer unified

Last synced: 31 Jul 2024

https://github.com/Open3DA/LL3DA

[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.

3d 3d-models 3d-to-text cvpr2024 gpt instruction-tuning language-model llm multi-modal scene-understanding

Last synced: 26 Sep 2024

https://github.com/OpenShapeLab/ShapeGPT

ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model, a unified and user-friendly shape-language model

3d-generation caption-generation chatgpt gpt language-model multi-modal shape unified

Last synced: 31 Jul 2024

https://github.com/qcraftai/distill-bev

DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation (ICCV 2023)

3d-object-detection autonomous-driving bev cross-modal distillation knowledge-distillation lidar multi-camera multi-modal nuscenes point-cloud self-driving

Last synced: 31 Jul 2024

https://github.com/thu-ml/MMTrustEval

A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust)

benchmark claude fairness gpt-4 mllm multi-modal privacy robustness safety toolbox trustworthy-ai truthfulness

Last synced: 12 Aug 2024

https://github.com/guyyariv/AudioToken

This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation

ai-art audio-to-image audio2image deep-learning diffusion-models image-generation multi-modal stable-diffusion text2image

Last synced: 05 Aug 2024

https://github.com/howard-hou/VisualRWKV

VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle various visual tasks.

large-language-models multi-modal rwkv

Last synced: 03 Aug 2024

https://github.com/Eaphan/UPIDet

Unleash the Potential of Image Branch for Cross-modal 3D Object Detection [NeurIPS2023]

3d-object-detection cross-modal multi-modal

Last synced: 31 Jul 2024

https://github.com/ThuCCSLab/FigStep

Jailbreaking Large Vision-language Models via Typographic Visual Prompts

gpt-4 jailbreak llm multi-modal safety security vlm

Last synced: 12 Aug 2024

https://github.com/kyegomez/switchtransformers

Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"

ai gpt4 llama mixture-model mixture-of-experts mixture-of-models ml moe multi-modal

Last synced: 27 Sep 2024

https://github.com/icon-lab/I2I-Mamba

Official implementation of I2I-Mamba, an image-to-image translation model based on selective state spaces

artificial-intelligence deeplearning image-synthesis image-to-image-translation mamba mamba-state-space-models medical multi-modal neural-networks pytorch ssm

Last synced: 31 Jul 2024

https://github.com/Toytiny/RadarNet-pytorch

PyTorch code reproduction of RadarNet (ECCV'20) for radar-based 3D object detection

3d-detection 3d-vision automotive-radar lidar-point-cloud multi-modal object-detection point-cloud

Last synced: 31 Jul 2024

https://github.com/kyegomez/m2pt

Implementation of M2PT in PyTorch from the paper: "Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities"

ai attention attention-is-all-you-need gpt4 gpt5 llama ml models mulit-modality multi-modal

Last synced: 27 Sep 2024

https://github.com/kyegomez/qwen-vl

My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't released model code yet sooo...

ai artificial-intelligence attention attention-is-all-you-need gemini gpt-4 gpt4 llama ml multi-modal open-source-ai

Last synced: 27 Sep 2024

https://github.com/kyegomez/gats

Implementation of GATS from the paper: "GATS: Gather-Attend-Scatter" in pytorch and zeta

ai attention attention-is-all-you-need attention-mechanism gpt4 llama ml multi-modal multi-modality multimodal open-source

Last synced: 27 Sep 2024

https://github.com/liu42/contrastive

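Project based on Problem B of the 2024 "Teddy Cup" Data Mining Challenge: a cross-modal image-text mutual retrieval model using contrastive learning in a shared feature space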

bert cnn computer-vision contrastive-learning deep-learning image-text-retrieval image-text-search multi-modal multi-modal-learning nlp pytorch roberta transformers

Last synced: 01 Oct 2024