Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
awesome-multi-modal
A list of papers in the multimodal domain.
https://github.com/zipzou/awesome-multi-modal
Last synced: 4 days ago
Textual Large Language Model Backbone
- https://github.com/TsinghuaAI/CPM-1-Generate
- https://arxiv.org/abs/2106.10715
- https://github.com/TsinghuaAI/CPM
- https://github.com/OpenBMB/MiniCPM
- https://arxiv.org/abs/2404.06395
- https://arxiv.org/abs/2403.04652
- https://github.com/THUDM/GLM
- https://arxiv.org/abs/2210.02414
- https://arxiv.org/abs/2210.11416
- https://arxiv.org/abs/2103.10360
- https://github.com/google-research/FLAN
- https://arxiv.org/abs/2302.13971
- https://github.com/meta-llama/llama/tree/llama_v1
- https://arxiv.org/abs/2307.09288
- https://github.com/meta-llama/llama
- https://lmsys.org/blog/2023-03-30-vicuna/
- https://github.com/lm-sys/FastChat
- https://crfm.stanford.edu/2023/03/13/alpaca.html
- https://github.com/tatsu-lab/stanford_alpaca
- https://arxiv.org/abs/2306.11644
- https://arxiv.org/abs/2309.16609
- https://github.com/QwenLM/Qwen
- https://arxiv.org/abs/2012.00413
- https://github.com/01-ai/Yi
Vision Model Backbone
- https://arxiv.org/abs/2010.11929
- https://github.com/google-research/vision_transformer
- https://arxiv.org/abs/2103.14030
- https://github.com/microsoft/Swin-Transformer
- https://github.com/OpenAI/CLIP
- https://arxiv.org/abs/2103.00020
- https://arxiv.org/abs/2304.07193
- https://arxiv.org/abs/2303.15343
- https://github.com/google-research/big_vision
- https://arxiv.org/abs/2303.15389
- https://github.com/baaivision/EVA
- https://github.com/baaivision/EVA/tree/master/EVA-CLIP
- https://arxiv.org/abs/2303.11331
- https://github.com/baaivision/EVA/tree/master/EVA-02
- https://arxiv.org/abs/2104.14294
- https://arxiv.org/abs/2303.05499
- https://github.com/IDEA-Research/GroundingDINO
- https://arxiv.org/abs/2211.07636
- https://github.com/facebookresearch/dino
- https://arxiv.org/abs/2304.02643
- https://github.com/facebookresearch/segment-anything
- https://arxiv.org/abs/2103.15691
- https://arxiv.org/abs/2307.06304
- https://arxiv.org/abs/2111.06377
- https://github.com/facebookresearch/mae
- https://arxiv.org/abs/2212.00794
Vision LLM for Generation
- https://arxiv.org/abs/2201.12086
- https://github.com/salesforce/BLIP
- https://arxiv.org/abs/2304.10592
- https://arxiv.org/abs/2301.12597
- https://github.com/salesforce/LAVIS
- https://github.com/Vision-CAIR/MiniGPT-4
- https://arxiv.org/abs/2304.08485
- https://github.com/haotian-liu/LLaVA
- https://arxiv.org/abs/2305.06500
- https://arxiv.org/abs/2304.14178
- https://github.com/X-PLUG/mPLUG-Owl
- https://arxiv.org/abs/2307.02469
- https://arxiv.org/abs/2311.04257
- https://github.com/bytedance/lynx-llm
- https://github.com/Yuliang-Liu/Monkey
- https://arxiv.org/abs/2308.12966
- https://github.com/QwenLM/Qwen-VL
- https://github.com/thunlp/LLaVA-UHD
- https://arxiv.org/abs/2403.11703
- https://llava-vl.github.io/blog/2024-01-30-llava-next/
- https://arxiv.org/abs/2401.02330
- https://github.com/zhuyiche/llava-phi
- https://arxiv.org/abs/2403.09611
- https://github.com/AILab-CVC/SEED-X
- https://arxiv.org/abs/2312.14238
- https://github.com/OpenGVLab/InternVL
- https://arxiv.org/abs/2204.14198
- https://arxiv.org/abs/2308.01390
- https://github.com/mlfoundations/open_flamingo
- https://arxiv.org/abs/2403.06199
- https://arxiv.org/abs/2311.03079
- https://github.com/THUDM/CogVLM
- https://arxiv.org/abs/2404.14396
- https://arxiv.org/abs/2310.01218
Image Generation
GAN Paradigm
Autoregressive or MLM Paradigm in Discrete Space
- https://arxiv.org/abs/2406.06525
- https://github.com/FoundationVision/LlamaGen
- https://arxiv.org/abs/2206.10789
- https://arxiv.org/abs/2105.13290
- https://github.com/THUDM/CogView
- https://arxiv.org/abs/2204.14217
- https://github.com/google-research/maskgit
- https://arxiv.org/abs/2404.02905
- https://github.com/FoundationVision/VAR
- https://arxiv.org/abs/2406.11838
- https://arxiv.org/abs/2202.04200
Diffusion Paradigm
- https://arxiv.org/abs/2006.11239
- https://github.com/hojonathanho/diffusion
- https://arxiv.org/abs/2212.09748
- https://github.com/facebookresearch/DiT
- https://arxiv.org/abs/2112.10752
- https://github.com/CompVis/latent-diffusion
Keywords
- large-language-models: 10
- deep-learning: 8
- pytorch: 6
- vision-language-transformer: 6
- pretrained-models: 6
- gpt: 6
- vision-language-model: 6
- llm: 6
- llama: 6
- language-model: 6
- multimodal: 4
- instruction-tuning: 4
- image-text-retrieval: 4
- chatgpt: 4
- image-captioning: 4
- vision-language: 4
- chatbot: 4
- vision-transformer: 4
- foundation-models: 4
- multi-modal: 4
- semantic-segmentation: 4
- object-detection: 4
- diffusion-models: 4
- image-generation: 4
- transformers: 4
- auto-regressive-model: 4
- image-classification: 4
- deep-learning-library: 2
- multimodal-datasets: 2
- multimodal-deep-learning: 2
- salesforce: 2
- vision-and-language: 2
- vision-framework: 2
- flash-attention: 2
- vision-language-pretraining: 2
- chinese: 2
- instruction-following: 2
- visual-question-anwsering: 2
- visual-reasoning: 2
- visual-question-answering: 2
- natural-language-processing: 2
- vision-and-language-pre-training: 2
- ade20k: 2
- imagenet: 2
- open-world-detection: 2
- mask-rcnn: 2
- open-world: 2
- mscoco: 2
- representation-learning: 2
- machine-learning: 2