Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with multimodal-deep-learning
A curated list of projects in awesome lists tagged with multimodal-deep-learning.
https://github.com/salesforce/lavis
LAVIS - A One-stop Library for Language-Vision Intelligence
deep-learning deep-learning-library image-captioning multimodal-datasets multimodal-deep-learning salesforce vision-and-language vision-framework vision-language-pretraining vision-language-transformer visual-question-anwsering
Last synced: 02 Aug 2024
https://github.com/kimmeen/time-llm
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
cross-modal-learning cross-modality deep-learning language-model large-language-models machine-learning multimodal-deep-learning multimodal-time-series prompt-tuning time-series time-series-analysis time-series-forecast time-series-forecasting
Last synced: 30 Sep 2024
https://github.com/dwctod/cvpr2024-papers-with-code-demo
Collects the latest CVPR (Conference on Computer Vision and Pattern Recognition) results, including papers, code, and demo videos. Recommendations are welcome!
computer-vision cvpr cvpr2021 cvpr2022 cvpr2023 cvpr2024 llm multimodal-deep-learning object-detection segment-anything segmentation
Last synced: 30 Sep 2024
https://github.com/kyegomez/BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
artificial-intelligence deep-neural-networks deeplearning gpt4 machine-learning multimodal multimodal-deep-learning
Last synced: 01 Aug 2024
https://github.com/jrzaurin/pytorch-widedeep
A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in PyTorch
deep-learning images model-hub multimodal-deep-learning python pytorch pytorch-cv pytorch-nlp pytorch-tabular-data pytorch-transformers tabular-data text
Last synced: 30 Sep 2024
https://github.com/KimMeen/Time-LLM
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
cross-modal-learning cross-modality deep-learning language-model large-language-models machine-learning multimodal-deep-learning multimodal-time-series prompt-tuning time-series time-series-analysis time-series-forecast time-series-forecasting
Last synced: 01 Aug 2024
https://github.com/AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
artificial-intelligence computer-vision document document-analysis document-intelligence document-recognition document-understanding documentai end-to-end-ocr multimodal multimodal-deep-learning ocr scene-text-detection scene-text-detection-recognition scene-text-recognition text-detection text-recognition vision-language vision-language-model vision-language-transformer
Last synced: 01 Aug 2024
https://github.com/alibabaresearch/advancedliteratemachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
artificial-intelligence computer-vision document document-analysis document-intelligence document-recognition document-understanding documentai end-to-end-ocr multimodal multimodal-deep-learning ocr scene-text-detection scene-text-detection-recognition scene-text-recognition text-detection text-recognition vision-language vision-language-model vision-language-transformer
Last synced: 31 Jul 2024
https://github.com/declare-lab/multimodal-deep-learning
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
multimodal-deep-learning multimodal-interactions multimodal-learning multimodal-sentiment-analysis
Last synced: 02 Aug 2024
https://github.com/omriav/blended-latent-diffusion
Official implementation for "Blended Latent Diffusion" [SIGGRAPH 2023]
computer-vision deep-learning diffusion diffusion-models generative-model image-generation multimodal multimodal-deep-learning pytorch text-driven-editing text-guided-manipulation text-to-image text-to-image-synthesis
Last synced: 31 Jul 2024
https://github.com/theislab/scarches
Reference mapping for single-cell genomics
batch-correction data-integration deep-learning human-cell-atlas multimodal-deep-learning multiomics rna-seq-analysis scrna-seq single-cell single-cell-genomics
Last synced: 08 Aug 2024
https://github.com/fcakyon/content-moderation-deep-learning
Deep learning based content moderation from text, audio, video & image input modalities.
content-moderation content-ratings genre-classification movie-content-filter movie-trailer multimodal-deep-learning nsfw-recognition nudity-detection profanity-detection violence-detection
Last synced: 03 Oct 2024
https://github.com/westlake-repl/recommendation-systems-without-explicit-id-features-a-literature-review
Paper List of Pre-trained Foundation Recommender Models
chatgpt chatgpt3 chatgpt4rec cross-domain-recommendation cross-domainrecommendation foundation-model gpt4rec language-model large-language-model llm llm-recommendation llm4rec multimodal multimodal-deep-learning multimodalrecommendation pre-training recommendation-system recommender-system transfer-learning transferable
Last synced: 02 Aug 2024
https://github.com/kyegomez/Med-PaLM
Towards Generalist Biomedical AI
biomedical deep-learning gpt4 multimodal multimodal-deep-learning multimodality opensource
Last synced: 31 Jul 2024
https://github.com/sail-sg/CLoT
Official Codebase of our Paper: "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation" (CVPR 2024)
association humor-generation large-language-models leap-of-thought multimodal-deep-learning
Last synced: 02 Aug 2024
https://github.com/mahmoodlab/MCAT
Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images - ICCV 2021
early-fusion genomics mahmoodlab mcat multimodal multimodal-deep-learning multimodal-fusion pathology
Last synced: 02 Aug 2024
https://github.com/kyegomez/the-compiler
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
agora artficial-intelligence autogpt chain-of-thought chatgpt deep-learning deep-learning-algorithms multi-modal-fusion multi-modality multimodal-deep-learning prompt-engineering reinforcement-learning tree-of-thoughts
Last synced: 09 Aug 2024
https://github.com/LeapLabTHU/Pseudo-Q
[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
computer-vision cvpr2022 deep-learning multimodal-deep-learning pytorch vision-and-language visual-grounding
Last synced: 01 Aug 2024
https://github.com/cambridgeltl/visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
computer-vision multimodal-deep-learning nlp vision-and-language
Last synced: 01 Aug 2024
https://github.com/aimotive/aimotive_dataset
aiMotive public dataset
3d-object-detection autonomous-driving dataset multimodal-deep-learning object-tracking representation-learning
Last synced: 31 Jul 2024
https://github.com/sutdcv/SUTD-TrafficQA
[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
annotations cvpr cvpr2021 dataset multimodal multimodal-deep-learning paper traffic-events video-qa video-reasoning vqa vqa-dataset
Last synced: 31 Jul 2024
https://github.com/yuanze-lin/Learnable_Regions
Official implementation of the work "Text-Driven Image Editing via Learnable Regions" (CVPR 2024)
aigc diffusion-model diffusion-models generative-model multimodal-deep-learning text-driven-editing text-driven-image-editing text-driven-image-manipulation text-driven-manipulation text-image
Last synced: 31 Jul 2024
https://github.com/visinf/lnfmm
Latent Normalizing Flows for Many-to-Many Cross Domain Mappings (ICLR 2020)
conditional-vae generative-models image-to-text latent-variable-models multimodal-deep-learning normalizing-flows text-to-image vision-and-language
Last synced: 31 Jul 2024
https://github.com/42jaylonw/shifu
Lightweight Isaac Gym Environment Builder
isaacgym multimodal-deep-learning reinforcement-learning robot-learning robotics
Last synced: 01 Aug 2024
https://ai4ce.github.io/MARS/
[CVPR2024] Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset
3dgs collaborative-perception coperception cvpr2024 dataset multiagent multimodal-deep-learning nerf self-driving
Last synced: 01 Aug 2024
https://github.com/asnelt/mmae
Package for Multimodal Autoencoders in TensorFlow / Keras
autoencoder autoencoders bregman-distance deep-learning keras keras-models keras-tensorflow multimodal-deep-learning multimodal-learning tensorflow
Last synced: 27 Sep 2024
https://github.com/ai4ce/MARS
[CVPR2024] Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset
3dgs collaborative-perception coperception cvpr2024 dataset multiagent multimodal-deep-learning nerf self-driving
Last synced: 31 Jul 2024
https://github.com/macabdul9/torchmm
PyTorch Data loaders and abstraction for multi-modal data.
computer-vision multimodal-deep-learning natural-language-processing python pytorch speech-processing
Last synced: 01 Oct 2024