Projects in Awesome Lists tagged with multi-modal-learning

https://github.com/mlfoundations/open_clip

An open source implementation of CLIP.

computer-vision contrastive-loss deep-learning language-model multi-modal-learning pretrained-models pytorch zero-shot-classification

Last synced: 16 Dec 2024

https://github.com/ofa-sys/chinese-clip

A Chinese version of CLIP for Chinese cross-modal retrieval and representation generation.

chinese clip computer-vision contrastive-loss coreml-models deep-learning image-text-retrieval multi-modal multi-modal-learning nlp pretrained-models pytorch transformers vision-and-language-pre-training vision-language

Last synced: 16 Dec 2024

https://github.com/lyuchenyang/macaw-llm

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

deep-learning language-model machine-learning multi-modal-learning natural-language-processing neural-networks

Last synced: 19 Dec 2024

https://github.com/QIN2DIM/hcaptcha-challenger

🥂 Gracefully face hCaptcha challenges with an embedded MoE (ONNX) solution.

clip computer-vision hcaptcha hcaptcha-solver image-segmentation multi-modal multi-modal-learning object-detection onnx onnx-models onnxruntime opencv-python playwright solver yolo yolov5 zero-shot-classification

Last synced: 31 Oct 2024

https://github.com/nvlabs/prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

image-captioning language-model multi-modal-learning multi-task-learning vision-and-language vision-language-model vqa

Last synced: 15 Dec 2024

https://github.com/lucidrains/x-clip

A concise but complete implementation of CLIP with various experimental improvements from recent papers

artificial-intelligence contrastive-learning deep-learning multi-modal-learning zero-shot-learning

Last synced: 21 Dec 2024

https://github.com/kyegomez/zeta

Build high-performance AI models with modular building blocks

artificial-intelligence deep-learning gpt4 llama2 longnet multi-agent-systems multi-modal multi-modal-learning multi-platform pytorch speech-recognition transformer transformers

Last synced: 19 Dec 2024

https://github.com/dmitryryumin/cvpr-2023-24-papers

CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!

action-recognition autonomous-driving biometrics computer-vision cvpr cvpr2023 cvpr2024 datasets deep-learning face-recognition gesture-recognition image-synthesis medical-image-processing multi-modal-learning pattern-recognition scene-analysis segmentation self-supervised-learning shape-analysis video-synthesis

Last synced: 15 Dec 2024

https://github.com/qizekun/ReCon

[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

3d-point-clouds multi-modal-learning representation-learning self-supervised-learning

Last synced: 28 Oct 2024

https://github.com/rentainhe/trar-vqa

[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"

attention clevr dynamic-network iccv2021 local-and-global multi-modal multi-modal-learning multi-modality multi-scale-features official pytorch transformer vision-and-language visual-question-answering visualization vqav2

Last synced: 07 Nov 2024

https://github.com/ttgeng233/UnAV

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)

audio-visual-events audio-visual-learning multi-modal-learning

Last synced: 16 Nov 2024

https://github.com/kyegomez/neva

The open source implementation of "NeVA: NeMo Vision and Language Assistant"

artificial-intelligence cuda gpt4 multi-modal multi-modal-learning multithreading neva nvidia robotics

Last synced: 09 Nov 2024

https://github.com/kyegomez/megavit

The open source implementation of the model from "Scaling Vision Transformers to 22 Billion Parameters"

artificial-intelligence computer-vision gpt4 multi-modal multi-modal-fusion multi-modal-learning vision-and-language vision-transformer

Last synced: 09 Nov 2024

https://github.com/sayakpaul/multimodal-entailment-baseline

This repository shows how to implement a basic model for multimodal entailment.

entailment keras multi-modal-learning tensorflow

Last synced: 23 Oct 2024

https://github.com/agora-lab-ai/ekr

Elysium Knowledge Repository is an open-source initiative to embed all of humanity's multi-modal knowledge and wisdom.

artificial-intelligence chroma embeddings multi-modal-learning multimodal pinecone vectordatabase

Last synced: 10 Nov 2024

https://github.com/chenxi52/FrozenSeg

Open-Vocabulary Panoptic Segmentation

clip instance-segmentation multi-modal-learning open-vocabulary open-vocabulary-segmentation open-vocabulary-semantic-segmentation panoptic-segmentation segment-anything segmentation vision-and-language zero-shot

Last synced: 30 Nov 2024

https://github.com/jianzhnie/multimodaltransformers

lmmtoolkit is a toolkit for multi-modal learning.

image-text multi-modal-learning text-image text-to-video

Last synced: 06 Dec 2024

https://github.com/lyuchenyang/semantic-aware-videoqa

Code for the ACL SRW 2023 paper "Semantic-aware Dynamic Retrospective-Prospective Reasoning for Event-level Video Question Answering"

artificial-intelligence deep-learning machine-learning multi-modal-learning natural-language-processing video-question-answering

Last synced: 21 Nov 2024

https://github.com/lyuchenyang/efficient-videoqa

Code for the ACL SustaiNLP 2023 paper "Is a Video worth n × n Images? A Highly Efficient Approach to Transformer-based Video Question Answering"

artificial-intelligence deep-learning machine-learning multi-modal-learning natural-language-processing video-question-answering

Last synced: 21 Nov 2024

https://github.com/liu42/contrastive

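A bidirectional image-text retrieval model based on contrastive learning in a shared feature space, built for Problem B of the 2024 "Teddy Cup" (泰迪杯) Data Mining Challenge.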

bert cnn computer-vision contrastive-learning deep-learning image-text-retrieval image-text-search multi-modal multi-modal-learning nlp pytorch roberta transformers

Last synced: 13 Dec 2024

https://github.com/ammarlodhi255/metadata-augmented-neural-networks-for-wild-animal-classification

This repository contains the implementation code for the paper "Metadata Augmented Neural Networks For Wild Animal Classification".

deep-learning fusion-techniques metadata metadata-fusion multi-modal multi-modal-learning wild-animal-classification wild-life-monitoring

Last synced: 17 Nov 2024

https://github.com/stifler7/multi-modal-learning-for-image-and-text-analysis

Develops approaches for jointly analyzing images and text using deep learning. Covers applications like image-text matching, visual question answering, image captioning, and sentiment analysis with visual context.

machine-learning multi-modal-learning

Last synced: 01 Dec 2024