Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

awesome-vision-language-pretraining

Awesome Vision-Language Pretraining Papers
https://github.com/fawazsammani/awesome-vision-language-pretraining

Last synced: 1 day ago

  • Papers

    • VLMo
    • METER
    • WenLan
    • InterBERT
    • SemVLP
    • E2E-VLP
    • VinVL
    • UFO
    • Florence
    • VILLA
    • TDEN
    • ERNIE-ViL
    • Vokenization
    • 12-in-1
    • KVL-BERT
    • Oscar
    • VIVO
    • SOHO
    • Pixel-BERT
    • VLKD
    • LightningDOT
    • VirTex
    • Uni-Perceiver
    • Uni-Perceiver v2
    • DiHT
    • VL-Match
    • Prismer
    • PaLM-E [[blog]](https://ai.googleblog.com/2023/03/palm-e-embodied-multimodal-language.html)
    • M3AE
    • MaskVLM
    • DALL-E-2 [[website]](https://openai.com/dall-e-2/) [[blog]](http://adityaramesh.com/posts/dalle2/dalle2.html) [[blog]](https://www.assemblyai.com/blog/how-dall-e-2-actually-works/) [[blog]](https://medium.com/augmented-startups/how-does-dall-e-2-work-e6d492a2667f)
    • KOSMOS-1
    • VILA [[hf page]](https://huggingface.co/Efficient-Large-Model)
    • Plug and Pray
    • LEMON
    • IC3
    • TAP
    • PICa
    • CVLP
    • ViLBERT [[code]](https://github.com/jiasenlu/vilbert_beta)
    • Unified-VLP
    • ImageBERT
    • SimVLM
    • ALBEF
    • LXMERT
    • X-LXMERT
    • VisualBERT
    • UNIMO
    • UNIMO-2
    • BLIP
    • Uni-EDEN
    • VisualGPT
    • MiniVLM
    • XGPT
    • ViTCAP
    • UniT
    • VL-BERT
    • Unicoder-VL
    • UNITER
    • ViLT
    • GLIPv2
    • CoCa [[code]](https://github.com/mlfoundations/open_clip) [[colab]](https://colab.research.google.com/github/mlfoundations/open_clip/blob/master/docs/Interacting_with_open_coca.ipynb)
    • Flamingo [[code]](https://github.com/mlfoundations/open_flamingo) [[code]](https://github.com/dhansmair/flamingo-mini) [[website]](https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model) [[blog]](https://wandb.ai/gladiator/Flamingo%20VLM/reports/DeepMind-Flamingo-A-Visual-Language-Model-for-Few-Shot-Learning--VmlldzoyOTgzMDI2) [[blog]](https://laion.ai/blog/open-flamingo/) [[blog]](https://laion.ai/blog/open-flamingo-v2/)
    • BEiT-3
    • UniCL
    • UVLP
    • OFA [[models and demos]](https://huggingface.co/OFA-Sys)
    • GPV-1 [[website]](https://prior.allenai.org/projects/gpv)
    • GPV-2
    • TCL
    • L-Verse
    • FLAVA [[tutorial]](https://pytorch.org/tutorials/beginner/flava_finetuning_tutorial.html)
    • COTS
    • VL-ADAPTER
    • Unified-IO
    • ViLTA
    • CapDet
    • PTP
    • X-VLM
    • FewVLM
    • CFM-ViT
    • mPLUG
    • PaLI
    • GIT
    • DALL-E [[code]](https://github.com/borisdayma/dalle-mini) [[code]](https://github.com/lucidrains/DALLE-pytorch) [[code]](https://github.com/kuprel/min-dalle) [[code]](https://github.com/robvanvolt/DALLE-models) [[code]](https://github.com/kakaobrain/minDALL-E) [[website]](https://openai.com/blog/dall-e/) [[video]](https://www.youtube.com/watch?v=j4xgkjWlfL4&t=1432s&ab_channel=YannicKilcher) [[video]](https://www.youtube.com/watch?v=jMqLTPcA9CQ&t=1034s&ab_channel=TheAIEpiphany) [[video]](https://www.youtube.com/watch?v=x_8uHX5KngE&ab_channel=TheAIEpiphany) [[blog]](https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini--Vmlldzo4NjIxODA) [[blog]](https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini-Generate-images-from-any-text-prompt--VmlldzoyMDE4NDAy) [[blog]](https://wandb.ai/dalle-mini/dalle-mini/reports/Building-efficient-image-input-pipelines--VmlldzoyMjMxOTQw) [[blog]](https://ml.berkeley.edu/blog/posts/vq-vae/) [[blog]](https://ml.berkeley.edu/blog/posts/dalle2/) [[blog]](https://towardsdatascience.com/understanding-how-dall-e-mini-works-114048912b3b)
    • DALL-E 3
    • GigaGAN [[code]](https://github.com/jianzhnie/GigaGAN) [[website]](https://mingukkang.github.io/GigaGAN/)
    • Parti [[code]](https://github.com/lucidrains/parti-pytorch) [[video]](https://www.youtube.com/watch?v=qS-iYnp00uc&ab_channel=YannicKilcher) [[blog]](https://parti.research.google/)
    • Paella
    • Make-A-Scene
    • Make-A-Video [[blog]](https://makeavideo.studio/) [[blog]](https://ai.facebook.com/blog/generative-ai-text-to-video/) [[video]](https://www.youtube.com/watch?v=AcvmyqGgMh8&ab_channel=AICoffeeBreakwithLetitia) [[video]](https://www.youtube.com/watch?v=MmAJk2BD6WA)
    • FIBER
    • VL-BEiT
    • MetaLM
    • VL-T5
    • UNICORN
    • MI2P
    • MDETR
    • VLMixer
    • ViCHA
    • StoryDALL-E
    • VLMAE
    • MLIM
    • MOFI
    • Multimodal-CoT
    • GILL
    • Language Pivoting
    • Graph-Align
    • PL-UIC
    • SCL
    • TaskRes
    • EPIC
    • HAAV
    • FLM
    • X-Decoder [[xgpt code]](https://github.com/microsoft/X-Decoder/tree/xgpt) [[website]](https://x-decoder-vl.github.io/) [[demo]](https://huggingface.co/spaces/xdecoder/Demo) [[demo]](https://huggingface.co/spaces/xdecoder/Instruct-X-Decoder)
    • PerVL
    • TextManiA [[website]](https://moon-yb.github.io/TextManiA.github.io/) [[GAN Inversion]](https://arxiv.org/pdf/2004.00049.pdf)
    • Cola
    • K-LITE
    • SINC
    • Visual ChatGPT
    • CM3Leon
    • MultiModal-GPT
    • LLaVA+
    • LLaVA-Interactive [[website]](https://llava-vl.github.io/llava-interactive/) [[demo]](https://llavainteractive.ngrok.io/)
    • NExT-Chat [[demo]](https://516398b33beb3e8b9f.gradio.live/) [[website]](https://next-chatv.github.io/)
    • MiniGPT-4 [[website]](https://minigpt-4.github.io/) [[demo]](https://huggingface.co/spaces/Vision-CAIR/minigpt4)
    • MiniGPT-v2 [[website]](https://minigpt-v2.github.io/) [[demo]](https://876a8d3e814b8c3a8b.gradio.live/) [[demo]](https://huggingface.co/spaces/Vision-CAIR/MiniGPT-v2)
    • LLaMA-Adapter
    • LLaMA-Adapter V2 [[demo]](http://llama-adapter.opengvlab.com/)
    • LaVIN
    • InstructBLIP
    • Otter/MIMIC-IT
    • CogVLM
    • ImageBind
    • TextBind
    • MetaVL
    • Instruction-ViT
    • MultiInstruct
    • VisIT-Bench [[website]](https://visit-bench.github.io/) [[blog]](https://laion.ai/blog/visit_bench/) [[dataset]](https://huggingface.co/datasets/mlfoundations/VisIT-Bench) [[leaderboard]](https://huggingface.co/spaces/mlfoundations/VisIT-Bench-Leaderboard)
    • GPT4RoI
    • PandaGPT
    • ChatBridge
    • Video-LLaMA [[demo]](https://huggingface.co/spaces/DAMO-NLP-SG/Video-LLaMA)
    • VideoChat
    • InternGPT
    • mPLUG-Owl [[demo]](https://huggingface.co/spaces/MAGAer13/mPLUG-Owl)
    • VisionLLM
    • X-LLM [[website]](https://x-llm.github.io/)
    • OBELICS [[instruct model 9b]](https://huggingface.co/HuggingFaceM4/idefics-9b-instruct) [[instruct model 80b]](https://huggingface.co/HuggingFaceM4/idefics-80b-instruct) [[demo]](https://huggingface.co/spaces/HuggingFaceM4/idefics_playground) [[dataset]](https://huggingface.co/datasets/HuggingFaceM4/OBELICS)
    • EvALign-ICL [[website]](https://evalign-icl.github.io/)
    • VL-PET
    • ICIS
    • NExT-GPT [[website]](https://next-gpt.github.io/) [[demo]](https://4271670c463565f1a4.gradio.live/)
    • UnIVAL
    • BUS
    • VPGTrans
    • PromptCap [[website]](https://yushi-hu.github.io/promptcap_demo/) [[hf checkpoint]](https://huggingface.co/tifa-benchmark/promptcap-coco-vqa)
    • P-Former
    • TL;DR
    • PMA-Net
    • Encyclopedic VQA
    • CMOTA
    • CPT
    • TeS
    • MP
    • LLaVA [[website]](https://llava-vl.github.io/) [[hf notebook]](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LLaVa/Inference_with_LLaVa_for_multimodal_generation.ipynb) [[hf docs]](https://huggingface.co/docs/transformers/model_doc/llava)
    • M³IT
    • GLIP
    • SmolVLM [[finetune demo]](https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb) [[demo]](https://huggingface.co/spaces/HuggingFaceTB/SmolVLM)
    • AIMV2 [[hf collection]](https://huggingface.co/collections/apple/aimv2-6720fe1558d94c7805f7688c)
    • LLaVA Series [[LLaVA-NeXT]](https://github.com/LLaVA-VL/LLaVA-NeXT/) [[hf docs]](https://huggingface.co/docs/transformers/en/model_doc/llava_next) [[hf docs]](https://huggingface.co/docs/transformers/main/en/model_doc/llava_onevision) [[demo]](https://huggingface.co/spaces/merve/llava-next) [[hf card]](https://huggingface.co/llava-hf) [[LLaVA-CoT]](https://github.com/PKU-YuanGroup/LLaVA-CoT)
    • Qwen2-VL [[hf card]](https://huggingface.co/collections/Qwen/qwen2-vl-66cee7455501d7126940800d) [[demo]](https://huggingface.co/spaces/Qwen/Qwen2-VL)
    • BLIP-2 [[hf docs]](https://huggingface.co/docs/transformers/model_doc/blip-2) [[finetuning colab]](https://colab.research.google.com/drive/16XbIysCzgpAld7Kd9-xz-23VPWmqdWmW?usp=sharing) [[blog]](https://huggingface.co/blog/blip-2)
    • KOSMOS-2 [[code]](https://huggingface.co/docs/transformers/model_doc/kosmos-2) [[hf notebook]](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/KOSMOS-2/Inference_with_KOSMOS_2_for_multimodal_grounding.ipynb) [[demo]](https://huggingface.co/spaces/ydshieh/Kosmos-2)
    • ViP-LLaVA [[demo]](https://pages.cs.wisc.edu/~mucai/vip-llava.html) [[website]](https://vip-llava.github.io/) [[hf notebook]](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/ViP-LLaVa/Inference_with_ViP_LLaVa_for_fine_grained_VQA.ipynb) [[hf docs]](https://huggingface.co/docs/transformers/model_doc/vipllava)
    • PaLI [[code]](https://github.com/kyegomez/PALI3)
    • Img2LLM [[colab]](https://colab.research.google.com/github/salesforce/LAVIS/blob/main/projects/img2llm-vqa/img2llm_vqa.ipynb)
    • PNP-VQA [[colab]](https://colab.research.google.com/github/salesforce/LAVIS/blob/main/projects/pnp-vqa/pnp_vqa.ipynb)
    • Prophet
    • GRiT
    • KOSMOS-2 [[hf docs]](https://huggingface.co/docs/transformers/model_doc/kosmos-2) [[hf notebook]](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/KOSMOS-2/Inference_with_KOSMOS_2_for_multimodal_grounding.ipynb) [[demo]](https://huggingface.co/spaces/ydshieh/Kosmos-2)
    • InternVL
    • MiniCPM-V
    • LLaVA-MORE
    • Qwen-VL [[tutorial]](https://github.com/QwenLM/Qwen-VL/blob/master/TUTORIAL.md) [[blog]](https://qwenlm.github.io/blog/qwen-vl/) [[blog]](https://qwenlm.github.io/blog/qwen2-vl/)
    • DeepSeek-VL [[hf page]](https://huggingface.co/deepseek-ai)
    • Cambrian-1 [[hf models]](https://huggingface.co/collections/nyu-visionx/cambrian-1-models-666fa7116d5420e514b0f23c) [[website]](https://cambrian-mllm.github.io/)
    • PaliGemma 2 [[hf docs]](https://huggingface.co/docs/transformers/main/en/model_doc/paligemma) [[blog]](https://huggingface.co/blog/paligemma2)
  • Miscellaneous

    • [pdf - instruct)
    • [github - os) [[twitter post]](https://twitter.com/danielhanchen/status/1769550950270910630)
    • x-transformers
    • [pdf - playground-tgi) [[blog]](https://huggingface.co/blog/4bit-transformers-bitsandbytes)
    • [pdf - 120b) [[official demo]](https://www.galactica.org/) [[demo]](https://huggingface.co/spaces/lewtun/galactica-demo)
    • [blog - 4) [[technical report]](https://arxiv.org/pdf/2303.08774.pdf)
    • [website - lab/stanford_alpaca) [[model]](https://huggingface.co/chavinlo/alpaca-13b) [[demo]](https://alpaca-ai.ngrok.io/) [[gpt4-x-alpaca]](https://huggingface.co/chavinlo/gpt4-x-alpaca)
    • [website - sys/FastChat) [[demo]](https://chat.lmsys.org/)
    • [pdf - teacher)
    • [pdf - transformers)
    • [pdf - soups)
    • [pdf - priming)
    • Falcon LLM
    • [pdf - research/prompt-tuning) [[code]](https://huggingface.co/docs/peft/task_guides/clm-prompt-tuning) [[blog]](https://ai.googleblog.com/2022/02/guiding-frozen-language-models-with.html?m=1) [[blog]](https://heidloff.net/article/introduction-to-prompt-tuning/)
    • [pdf - tuning)
    • [pdf - nlp/LM-BFF)
    • [pdf - models-perform-reasoning-via.html?m=1)
    • [pdf - Edgerunners/Plan-and-Solve-Prompting)
    • [pdf - AI/visual_prompt_retrieval)
    • [pdf - Group/ILM-VP)
    • [pdf - icl)
    • [pdf - and-Respond) [[website]](https://uclaml.github.io/Rephrase-and-Respond/)
    • [blog - following/) [[rlhf hf]](https://huggingface.co/blog/rlhf) [[rlhf wandb]](https://wandb.ai/ayush-thakur/RLHF/reports/Understanding-Reinforcement-Learning-from-Human-Feedback-RLHF-Part-1--VmlldzoyODk5MTIx) [[code]](https://github.com/lucidrains/PaLM-rlhf-pytorch)
    • [pdf - o?usp=sharing) [[video]](https://www.youtube.com/watch?v=KEv-F5UkhxU&ab_channel=AICoffeeBreakwithLetitia) [[blog]](https://huggingface.co/blog/peft) [[blog]](https://www.ml6.eu/blogpost/low-rank-adaptation-a-technical-deep-dive) [[blog]](https://medium.com/@abdullahsamilguser/lora-low-rank-adaptation-of-large-language-models-7af929391fee) [[blog]](https://towardsdatascience.com/understanding-lora-low-rank-adaptation-for-finetuning-large-models-936bce1a07c6) [[hf docs]](https://huggingface.co/docs/diffusers/training/lora) [[library]](https://huggingface.co/docs/peft/index) [[library tutorial]](https://huggingface.co/learn/cookbook/prompt_tuning_peft)
    • [pdf - python]](https://github.com/abetlen/llama-cpp-python) [[blog]](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/) [[video]](https://www.youtube.com/watch?v=E5OnoYF2oAk&t=1915s)
    • [pdf - llama) [[demo]](https://labs.perplexity.ai/) [[demo]](https://huggingface.co/chat) [[blog]](https://ai.meta.com/llama/) [[blog]](https://ai.meta.com/resources/models-and-libraries/llama/) [[blog]](https://huggingface.co/blog/llama2) [[blog]](https://www.philschmid.de/llama-2) [[blog]](https://medium.com/towards-generative-ai/understanding-llama-2-architecture-its-ginormous-impact-on-genai-e278cb81bd5c) [[llama2.c]](https://github.com/karpathy/llama2.c) [[finetune script]](https://gist.github.com/younesbelkada/9f7f75c94bdc1981c8ca5cc937d4a4da) [[finetune script]](https://www.philschmid.de/sagemaker-llama2-qlora) [[finetune script]](https://www.philschmid.de/instruction-tune-llama-2) [[tutorials]](https://github.com/amitsangani/Llama-2) [[yarn]](https://github.com/jquesnelle/yarn)
    • [blog - release-65d5efbccdbb8c4202ec078b) [[bugs]](https://unsloth.ai/blog/gemma-bugs)
    • [pdf - inversion.github.io/) [[hf docs]](https://huggingface.co/docs/diffusers/training/text_inversion) [[blog]](https://medium.com/@onkarmishra/how-textual-inversion-works-and-its-applications-5e3fda4aa0bc)
    • [pdf - Stable-Diffusion) [[code]](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth) [[website]](https://dreambooth.github.io/) [[hf docs]](https://huggingface.co/docs/diffusers/training/dreambooth)
    • [hf card - llama) [[hf docs]](https://huggingface.co/docs/transformers/en/model_doc/llama) [[hf docs]](https://huggingface.co/docs/transformers/en/model_doc/llama2) [[hf docs]](https://huggingface.co/docs/transformers/en/model_doc/llama3) [[llama report]](https://arxiv.org/pdf/2302.13971v1) [[llama2 report]](https://arxiv.org/pdf/2307.09288.pdf) [[llama3 report]](https://arxiv.org/pdf/2407.21783) [[llama3 report summary]](https://x.com/A_K_Nain/status/1815942598944547074) [[llama3.1 hf blog]](https://huggingface.co/blog/llama31) [[llama3.2 hf blog]](https://huggingface.co/blog/llama32) [[code llama report]](https://arxiv.org/pdf/2308.12950.pdf) Other Resources: [[llama2.c]](https://github.com/karpathy/llama2.c) [[llama.cpp]](https://github.com/ggerganov/llama.cpp) [[finetune script]](https://gist.github.com/younesbelkada/9f7f75c94bdc1981c8ca5cc937d4a4da) [[finetune script]](https://www.philschmid.de/sagemaker-llama2-qlora) [[finetune script]](https://www.philschmid.de/instruction-tune-llama-2) [[finetune script]](https://www.philschmid.de/fsdp-qlora-llama3) [[tutorials]](https://github.com/amitsangani/Llama-2) [[yarn]](https://github.com/jquesnelle/yarn) [[openbio]](https://huggingface.co/aaditya/Llama3-OpenBioLLM-70B) [[huggingface-llama-recipes]](https://github.com/huggingface/huggingface-llama-recipes)
    • [pdf - 6722be18cb86c20ebe113e95)
    • [mlabonne - fine-tuning-LLMs) [[train and deploy]](https://www.youtube.com/watch?v=Ma4clS-IdhA&t=1709s) [[supervised FT]](https://www.youtube.com/watch?v=NXevvEF3QVI&t=1s) [[how LLM Chatbots work]](https://www.youtube.com/watch?v=C6ZszXYPDDw&t=26s) [[finetuning tutorial]](https://www.philschmid.de/fine-tune-llms-in-2024-with-trl) [[pytorch finetuning tutorial]](https://pytorch.org/blog/finetune-llms/?utm_content=278057355&utm_medium=social&utm_source=linkedin&hss_channel=lcp-78618366) [[finetuning tutorial]](https://huggingface.co/learn/cookbook/fine_tuning_code_llm_on_single_gpu) [[hf tutorial]](https://www.youtube.com/watch?v=2-SPH9hIKT8) [[hf slides]](https://docs.google.com/presentation/d/1uFd95VFSefD_Pom12kZ6q7ZppBJuT-T1vSGMUojDaBQ/edit#slide=id.p) [[andrej karpathy tutorials]](https://www.youtube.com/@AndrejKarpathy/videos) [[hf tutorial]](https://www.youtube.com/watch?v=2-SPH9hIKT8) [[unsloth]](https://github.com/unslothai/unsloth) [[transformer changes]](https://x.com/Muhtasham9/status/1772469982485438485) [[llms from scratch]](https://github.com/rasbt/LLMs-from-scratch) [[finetuning llms course]](https://github.com/huggingface/smol-course)
    • FLIP
    • CLIP Variants
    • awesome-clip
    • UPL
    • ProDA
    • CTP
    • OpenCLIP [[clip colab]](https://colab.research.google.com/github/mlfoundations/open_clip/blob/master/docs/Interacting_with_open_clip.ipynb) [[clip benchmark]](https://github.com/LAION-AI/CLIP_benchmark) [[hf models]](https://huggingface.co/models?library=open_clip)
    • CLIPScore
    • LiT [[website]](https://google-research.github.io/vision_transformer/lit/)
    • SigLIP [[colab demo]](https://colab.research.google.com/github/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/SigLIP_demo.ipynb)
    • ALIGN
    • DeCLIP
    • Counting-aware CLIP
    • ALIP
    • STAIR
    • FILIP
    • SLIP
    • WiSE-FT
    • FLYP
    • MAGIC
    • ZeroCap
    • CapDec
    • DeCap
    • ViECap
    • CLOSE
    • xCLIP
    • EVA [[EVA-CLIP]](https://arxiv.org/pdf/2303.15389.pdf) [[EVA-02]](https://arxiv.org/pdf/2303.11331.pdf) [[code]](https://github.com/baaivision/EVA)
    • VT-CLIP
    • CLIP-ViL
    • RegionCLIP
    • DenseCLIP
    • E-CLIP
    • X-CLIP [[code]](https://huggingface.co/docs/transformers/model_doc/xclip)
    • MaskCLIP
    • CLIPSeg [[demo]](https://huggingface.co/spaces/nielsr/CLIPSeg) [[demo]](https://huggingface.co/spaces/Sijuade/CLIPSegmentation) [[demo]](https://huggingface.co/spaces/taesiri/CLIPSeg) [[demo]](https://huggingface.co/spaces/aryswisnu/CLIPSeg2) [[demo]](https://huggingface.co/spaces/aryswisnu/CLIPSeg) [[blog]](https://huggingface.co/blog/clipseg-zero-shot)
    • OWL-ViT [[code]](https://huggingface.co/docs/transformers/model_doc/owlvit) [[code]](https://huggingface.co/docs/transformers/tasks/zero_shot_object_detection) [[colab]](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/zeroshot_object_detection_with_owlvit.ipynb) [[demo]](https://huggingface.co/spaces/adirik/OWL-ViT) [[demo]](https://huggingface.co/spaces/adirik/image-guided-owlvit) [[demo]](https://huggingface.co/spaces/johko/OWL-ViT) [[demo]](https://huggingface.co/spaces/kellyxiaowei/OWL-ViT) [[demo]](https://huggingface.co/spaces/wendys-llc/OWL-ViT)
    • ClipCap
    • VQGAN-CLIP - CLIP) [[code]](https://github.com/EleutherAI/vqgan-clip) [[code]](https://github.com/justinjohn0306/VQGAN-CLIP) [[code]](https://www.kaggle.com/code/basu369victor/playing-with-vqgan-clip/notebook) [[colab]](https://colab.research.google.com/github/dribnet/clipit/blob/master/demos/Moar_Settings.ipynb) [[colab]](https://colab.research.google.com/drive/1L8oL-vLJXVcRzCFbPwOoMkPKJ8-aYdPN) [[colab]](https://colab.research.google.com/github/justinjohn0306/VQGAN-CLIP/blob/main/VQGAN%2BCLIP(Updated).ipynb)
    • AltCLIP - Open/FlagAI) [[code]](https://huggingface.co/docs/transformers/model_doc/altclip)
    • CLIPPO
    • FDT
    • DIME-FM - FM) [[website]](https://cs-people.bu.edu/sunxm/DIME-FM/)
    • ViLLA
    • BASIC
    • CoOp
    • CoCoOp
    • RPO
    • KgCoOp
    • ECO
    • UPT
    • MVLPT
    • DAPT
    • LFA - fi/LFA)
    • LaFTer
    • TAP
    • CLIP-Adapter - Adapter)
    • Tip-Adapter - Adapter)
    • CALIP
    • CaFo
    • SHIP
    • LoGoPrompt
    • GRAM
    • MaPLe - prompt-learning)
    • PromptSR
    • ProGrad - align)
    • APE
    • CuPL
    • WaffleCLIP
    • R-AMT - AMT) [[website]](https://wuw2019.github.io/R-AMT/)
    • SVL-Adapter
    • KAPT
    • SuS-X - X)
    • Internet Explorer - explorer-ssl/internet-explorer) [[website]](https://internet-explorer-ssl.github.io/)
    • REACT - vl.github.io/)
    • SEARLE
    • CLIPpy
    • CDUL
    • DN - dev/distribution-normalization) [[website]](https://fengyuli-dev.github.io/dn-website/)
    • [pdf - menon/classify_by_description_release) [[website]](https://cv.cs.columbia.edu/sachit/classviadescr/)
    • [pdf - Adapter)
    • [pdf - tuning_ICCV_2023_supplemental.pdf) [[code]](https://github.com/SwinTransformer/Feature-Distillation)
    • [pdf
    • [pdf
    • [pdf
    • [pdf
    • [pdf
    • [pdf - CLIP)
    • [pdf - shot-video-to-text)
    • [pdf
    • [pdf
    • [pdf
    • [pdf - chefer/TargetCLIP)
    • [code - guided-diffusion) [[code]](https://github.com/nerdyrodent/CLIP-Guided-Diffusion) [[code]](https://github.com/crowsonkb/v-diffusion-pytorch)
    • [video
    • CLIP - CLIP) [[code]](https://github.com/moein-shariatnia/OpenAI-CLIP) [[code]](https://github.com/lucidrains/x-clip) [[code]](https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text) [[hf docs]](https://huggingface.co/docs/transformers/model_doc/clip) [[website]](https://openai.com/blog/clip/) [[video]](https://www.youtube.com/watch?v=T9XSU0pKX2E&t=1455s&ab_channel=YannicKilcher) [[video]](https://www.youtube.com/watch?v=fQyHEXZB-nM&ab_channel=AleksaGordi%C4%87-TheAIEpiphany) [[video code]](https://www.youtube.com/watch?v=jwZQD0Cqz4o&t=4610s&ab_channel=TheAIEpiphany) [[CLIP_benchmark]](https://github.com/LAION-AI/CLIP_benchmark) [[clip-retrieval]](https://github.com/rom1504/clip-retrieval) [[clip-retrieval blog]](https://rom1504.medium.com/semantic-search-with-embeddings-index-anything-8fb18556443c)
    • PromptKD
    • MirrorCLIP
    • WATT - Noori/WATT)
    • [pdf
    • Prompt Align
    • TPT
    • ReCLIP
    • PLOT
    • GEM
    • SynthCLIP
    • [pdf - Liang/Modality-Gap) [[website]](https://modalitygap.readthedocs.io/en/latest/)
    • [pdf
    • [pdf
    • SigLIP - research/big_vision) [[colab demo]](https://colab.research.google.com/github/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/SigLIP_demo.ipynb) [[hf notebook]](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/SigLIP/Inference_with_(multilingual)_SigLIP%2C_a_better_CLIP_model.ipynb) [[hf docs]](https://huggingface.co/docs/transformers/model_doc/siglip) [[hf models]](https://huggingface.co/collections/google/siglip-659d5e62f0ae1a57ae0e83ba)
    • [pdf
    • Alpha-CLIP - clip/)
    • FGVP
    • EVA
    • OWL-ViT - research/scenic/tree/main/scenic/projects/owl_vit) [[hf docs]](https://huggingface.co/docs/transformers/en/model_doc/owlvit) [[tutorial]](https://huggingface.co/docs/transformers/tasks/zero_shot_object_detection) [[colab]](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/zeroshot_object_detection_with_owlvit.ipynb) [[demo]](https://huggingface.co/spaces/adirik/OWL-ViT) [[demo]](https://huggingface.co/spaces/adirik/image-guided-owlvit) [[demo]](https://huggingface.co/spaces/johko/OWL-ViT) [[demo]](https://huggingface.co/spaces/kellyxiaowei/OWL-ViT) [[demo]](https://huggingface.co/spaces/wendys-llc/OWL-ViT)
    • [pdf
    • UPL
    • ProDA
    • [pdf
    • [pdf
    • [pdf
    • [pdf
    • [pdf
    • [pdf
    • OpenCLIP - AI/scaling-laws-openclip) [[clip colab]](https://colab.research.google.com/github/mlfoundations/open_clip/blob/master/docs/Interacting_with_open_clip.ipynb) [[clip benchmark]](https://github.com/LAION-AI/CLIP_benchmark) [[hf models]](https://huggingface.co/models?library=open_clip)
    • CTP
  • Segmentation + Vision-Language

  • New Large-Scale Datasets

  • Policy Gradients with Image Captioning

    • [pdf - critical.pytorch)
    • [pdf - critical.pytorch)
    • [pdf - min/CLIP-Caption-Reward)
    • [pdf
    • [pdf
    • [pdf