# awesome-vision-language-pretraining

Awesome Vision-Language Pretraining Papers
https://github.com/fawazsammani/awesome-vision-language-pretraining

  • ViLBERT - multi-task) [[code]](https://github.com/jiasenlu/vilbert_beta)
  • Unified-VLP
  • ImageBERT
  • SimVLM
  • ALBEF
  • LXMERT
  • X-LXMERT - lxmert)
  • VisualBERT
  • UNIMO
  • UNIMO-2 - ptm.github.io/)
  • BLIP - VQA-models)
  • BLIP-2 - Tutorials/tree/master/BLIP-2) [[hf docs]](https://huggingface.co/docs/transformers/model_doc/blip-2) [[finetuning colab]](https://colab.research.google.com/drive/16XbIysCzgpAld7Kd9-xz-23VPWmqdWmW?usp=sharing) [[blog]](https://huggingface.co/blog/blip-2)
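For quick experimentation with BLIP-2, the Hugging Face docs linked above expose it through `Blip2Processor` / `Blip2ForConditionalGeneration`. A minimal captioning sketch (the checkpoint id, dtype and COCO image URL are illustrative assumptions, not taken from this list):

```python
import requests
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

# Illustrative checkpoint; the larger OPT/Flan-T5 variants follow the same API.
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

image = Image.open(requests.get(
    "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)

# No text prompt -> plain image captioning; pass a question for VQA-style prompting.
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
generated_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip())
```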
  • Uni-EDEN
  • VisualGPT - CAIR/VisualGPT)
  • MiniVLM
  • XGPT
  • ViTCAP
  • LEMON
  • IC3 - by-committee)
  • TAP
  • PICa
  • CVLP
  • UniT
  • VL-BERT - BERT)
  • Unicoder-VL
  • UNITER
  • ViLT - vqa)
  • GLIP
  • GLIPv2
  • VLMo
  • METER
  • WenLan
  • InterBERT
  • SemVLP
  • E2E-VLP
  • VinVL
  • UFO
  • Florence
  • VILLA
  • TDEN
  • ERNIE-ViL
  • Vokenization
  • 12-in-1 - multi-task)
  • KVL-BERT
  • Oscar
  • VIVO
  • SOHO
  • Pixel-BERT
  • LightningDOT
  • VirTex
  • Uni-Perceiver - Perceiver)
  • Uni-Perceiver v2 - Perceiver)
  • CoCa - pytorch) [[code]](https://github.com/mlfoundations/open_clip) [[colab]](https://colab.research.google.com/github/mlfoundations/open_clip/blob/master/docs/Interacting_with_open_coca.ipynb)
  • Flamingo - pytorch) [[code]](https://github.com/mlfoundations/open_flamingo) [[code]](https://github.com/dhansmair/flamingo-mini) [[website]](https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model) [[blog]](https://wandb.ai/gladiator/Flamingo%20VLM/reports/DeepMind-Flamingo-A-Visual-Language-Model-for-Few-Shot-Learning--VmlldzoyOTgzMDI2) [[blog]](https://laion.ai/blog/open-flamingo/) [[blog]](https://laion.ai/blog/open-flamingo-v2/)
  • BEiT-3
  • UniCL
  • UVLP
  • OFA - Sys/OFA) [[models and demos]](https://huggingface.co/OFA-Sys)
  • GPV-1 - 1/) [[website]](https://prior.allenai.org/projects/gpv)
  • GPV-2
  • TCL - smile/TCL)
  • L-Verse
  • FLAVA - model.github.io/) [[tutorial]](https://pytorch.org/tutorials/beginner/flava_finetuning_tutorial.html)
  • COTS
  • VL-ADAPTER
  • Unified-IO - io.allenai.org/)
  • ViLTA
  • CapDet
  • PTP - sg/ptp)
  • X-VLM - 97/x-vlm)
  • FewVLM
  • M3AE - geng/m3ae_public)
  • CFM-ViT
  • mPLUG
  • PaLI - scaling-language-image-learning-in.html)
  • GIT - VQA-models)
  • MaskVLM
  • DALL-E - E) [[code]](https://github.com/borisdayma/dalle-mini) [[code]](https://github.com/lucidrains/DALLE-pytorch) [[code]](https://github.com/kuprel/min-dalle) [[code]](https://github.com/robvanvolt/DALLE-models) [[code]](https://github.com/kakaobrain/minDALL-E) [[website]](https://openai.com/blog/dall-e/) [[video]](https://www.youtube.com/watch?v=j4xgkjWlfL4&t=1432s&ab_channel=YannicKilcher) [[video]](https://www.youtube.com/watch?v=jMqLTPcA9CQ&t=1034s&ab_channel=TheAIEpiphany) [[video]](https://www.youtube.com/watch?v=x_8uHX5KngE&ab_channel=TheAIEpiphany) [[blog]](https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini--Vmlldzo4NjIxODA) [[blog]](https://wandb.ai/dalle-mini/dalle-mini/reports/DALL-E-mini-Generate-images-from-any-text-prompt--VmlldzoyMDE4NDAy) [[blog]](https://wandb.ai/dalle-mini/dalle-mini/reports/Building-efficient-image-input-pipelines--VmlldzoyMjMxOTQw) [[blog]](https://ml.berkeley.edu/blog/posts/vq-vae/) [[blog]](https://ml.berkeley.edu/blog/posts/dalle2/) [[blog]](https://towardsdatascience.com/understanding-how-dall-e-mini-works-114048912b3b)
  • DALL-E-2 - pytorch) [[website]](https://openai.com/dall-e-2/) [[blog]](http://adityaramesh.com/posts/dalle2/dalle2.html) [[blog]](https://www.assemblyai.com/blog/how-dall-e-2-actually-works/) [[blog]](https://medium.com/augmented-startups/how-does-dall-e-2-work-e6d492a2667f)
  • DALL-E 3 - e-3)
  • GigaGAN - pytorch) [[code]](https://github.com/jianzhnie/GigaGAN) [[website]](https://mingukkang.github.io/GigaGAN/)
  • Parti - research/parti) [[code]](https://github.com/lucidrains/parti-pytorch) [[video]](https://www.youtube.com/watch?v=qS-iYnp00uc&ab_channel=YannicKilcher) [[blog]](https://parti.research.google/)
  • Paella
  • Make-A-Scene
  • FIBER
  • VL-BEiT - beit)
  • MetaLM
  • VL-T5 - min/VL-T5)
  • UNICORN
  • MI2P
  • MDETR
  • VLMixer
  • ViCHA
  • Img2LLM - vqa) [[colab]](https://colab.research.google.com/github/salesforce/LAVIS/blob/main/projects/img2llm-vqa/img2llm_vqa.ipynb)
  • PNP-VQA - vqa) [[colab]](https://colab.research.google.com/github/salesforce/LAVIS/blob/main/projects/pnp-vqa/pnp_vqa.ipynb)
  • StoryDALL-E
  • VLMAE
  • MLIM
  • MOFI
  • GILL
  • Language Pivoting
  • Graph-Align
  • PL-UIC
  • SCL
  • TaskRes
  • EPIC
  • HAAV
  • FLM
  • DiHT
  • VL-Match
  • Prismer
  • PaLM-E - e.github.io/) [[blog]](https://ai.googleblog.com/2023/03/palm-e-embodied-multimodal-language.html)
  • X-Decoder - Decoder/tree/main) [[xgpt code]](https://github.com/microsoft/X-Decoder/tree/xgpt) [[website]](https://x-decoder-vl.github.io/) [[demo]](https://huggingface.co/spaces/xdecoder/Demo) [[demo]](https://huggingface.co/spaces/xdecoder/Instruct-X-Decoder)
  • PerVL
  • TextManiA - ami/TextManiA) [[website]](https://moon-yb.github.io/TextManiA.github.io/) [[GAN Inversion]](https://arxiv.org/pdf/2004.00049.pdf)
  • Cola
  • K-LITE
  • SINC
  • Visual ChatGPT - chatgpt)
  • CM3Leon
  • KOSMOS-1
  • KOSMOS-2 - 2) [[code]](https://huggingface.co/docs/transformers/model_doc/kosmos-2) [[hf notebook]](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/KOSMOS-2/Inference_with_KOSMOS_2_for_multimodal_grounding.ipynb) [[demo]](https://huggingface.co/spaces/ydshieh/Kosmos-2)
  • MultiModal-GPT - mmlab/Multimodal-GPT)
  • LLaVA - liu/LLaVA) [[website]](https://llava-vl.github.io/) [[hf notebook]](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LLaVa/Inference_with_LLaVa_for_multimodal_generation.ipynb) [[hf docs]](https://huggingface.co/docs/transformers/model_doc/llava)
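The HF notebook and docs linked above load LLaVA through `LlavaForConditionalGeneration`. A minimal VQA-style sketch (checkpoint id, prompt template and image URL are illustrative assumptions):

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # illustrative community checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda")

image = Image.open(requests.get(
    "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
# LLaVA-1.5 chat template: the <image> token marks where visual features are inserted.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda", torch.float16)
output_ids = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```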
  • ViP-LLaVA - cai/vip-llava) [[demo]](https://pages.cs.wisc.edu/~mucai/vip-llava.html) [[website]](https://vip-llava.github.io/) [[hf notebook]](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/ViP-LLaVa/Inference_with_ViP_LLaVa_for_fine_grained_VQA.ipynb) [[hf docs]](https://huggingface.co/docs/transformers/model_doc/vipllava)
  • VILA - Large-Model/VILA) [[hf page]](https://huggingface.co/Efficient-Large-Model)
  • NExT-Chat - ChatV/NExT-Chat) [[demo]](https://516398b33beb3e8b9f.gradio.live/) [[website]](https://next-chatv.github.io/)
  • MiniGPT-4 - CAIR/MiniGPT-4) [[website]](https://minigpt-4.github.io/) [[demo]](https://huggingface.co/spaces/Vision-CAIR/minigpt4)
  • MiniGPT-v2 - CAIR/MiniGPT-4) [[website]](https://minigpt-v2.github.io/) [[demo]](https://876a8d3e814b8c3a8b.gradio.live/) [[demo]](https://huggingface.co/spaces/Vision-CAIR/MiniGPT-v2)
  • LLaMA-Adapter - Adapter)
  • LLaMA-Adapter V2 - Adapter) [[demo]](http://llama-adapter.opengvlab.com/)
  • LaVIN
  • InstructBLIP - Tutorials/blob/master/InstructBLIP/Inference_with_InstructBLIP.ipynb)
  • Otter/MIMIC-IT - ntu.github.io/)
  • CogVLM
  • ImageBind
  • TextBind
  • MetaVL
  • M³IT - it.github.io/)
  • Instruction-ViT
  • MultiInstruct - NLP/MultiInstruct)
  • VisIT-Bench - Bench/) [[website]](https://visit-bench.github.io/) [[blog]](https://laion.ai/blog/visit_bench/) [[dataset]](https://huggingface.co/datasets/mlfoundations/VisIT-Bench) [[leaderboard]](https://huggingface.co/spaces/mlfoundations/VisIT-Bench-Leaderboard)
  • GPT4RoI
  • PandaGPT - gpt.github.io/)
  • ChatBridge - chatbridge.github.io/)
  • Video-LLaMA - NLP-SG/Video-LLaMA) [[demo]](https://huggingface.co/spaces/DAMO-NLP-SG/Video-LLaMA)
  • VideoChat - Anything)
  • InternGPT
  • mPLUG-Owl - PLUG/mPLUG-Owl) [[demo]](https://huggingface.co/spaces/MAGAer13/mPLUG-Owl)
  • VisionLLM - news/reports/Introducing-VisionLLM-A-New-Method-for-Multi-Modal-LLM-s--Vmlldzo0NTMzNzIz)
  • X-LLM - LLM) [[website]](https://x-llm.github.io/)
  • OBELICS - 80b) [[instruct model 9b]](https://huggingface.co/HuggingFaceM4/idefics-9b-instruct) [[instruct model 80b]](https://huggingface.co/HuggingFaceM4/idefics-80b-instruct) [[demo]](https://huggingface.co/spaces/HuggingFaceM4/idefics_playground) [[dataset]](https://huggingface.co/datasets/HuggingFaceM4/OBELICS)
  • EvALign-ICL - ICL) [[website]](https://evalign-icl.github.io/)
  • Plug and Pray
  • VL-PET - PET)
  • ICIS
  • NExT-GPT - GPT/NExT-GPT) [[website]](https://next-gpt.github.io/) [[demo]](https://4271670c463565f1a4.gradio.live/)
  • UnIVAL
  • BUS
  • VPGTrans
  • PromptCap - Hu/PromptCap) [[website]](https://yushi-hu.github.io/promptcap_demo/) [[hf checkpoint]](https://huggingface.co/tifa-benchmark/promptcap-coco-vqa)
  • P-Former
  • TL;DR
  • PMA-Net - Net)
  • Encyclopedic VQA - research/google-research/tree/master/encyclopedic_vqa)
  • CMOTA
  • CPT
  • TeS
  • MP - Probing)
  • CLIP - CLIP) [[code]](https://github.com/moein-shariatnia/OpenAI-CLIP) [[code]](https://github.com/lucidrains/x-clip) [[code]](https://github.com/huggingface/transformers/tree/main/examples/pytorch/contrastive-image-text) [[hf docs]](https://huggingface.co/docs/transformers/model_doc/clip) [[website]](https://openai.com/blog/clip/) [[video]](https://www.youtube.com/watch?v=T9XSU0pKX2E&t=1455s&ab_channel=YannicKilcher) [[video]](https://www.youtube.com/watch?v=fQyHEXZB-nM&ab_channel=AleksaGordi%C4%87-TheAIEpiphany) [[video code]](https://www.youtube.com/watch?v=jwZQD0Cqz4o&t=4610s&ab_channel=TheAIEpiphany) [[CLIP_benchmark]](https://github.com/LAION-AI/CLIP_benchmark) [[clip-retrieval]](https://github.com/rom1504/clip-retrieval) [[clip-retrieval blog]](https://rom1504.medium.com/semantic-search-with-embeddings-index-anything-8fb18556443c)
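Zero-shot classification with CLIP takes a few lines via the HF docs linked above; the label prompts and image URL below are illustrative:

```python
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open(requests.get(
    "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
# Image-text similarities -> softmax over the candidate labels.
probs = outputs.logits_per_image.softmax(dim=-1)[0]
for label, p in zip(labels, probs.tolist()):
    print(f"{label}: {p:.3f}")
```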
  • OpenCLIP - AI/scaling-laws-openclip) [[clip colab]](https://colab.research.google.com/github/mlfoundations/open_clip/blob/master/docs/Interacting_with_open_clip.ipynb) [[clip benchmark]](https://github.com/LAION-AI/CLIP_benchmark) [[hf models]](https://huggingface.co/models?library=open_clip)
  • CLIPScore
  • LiT - research/vision_transformer) [[website]](https://google-research.github.io/vision_transformer/lit/)
  • SigLIP - research/big_vision) [[colab demo]](https://colab.research.google.com/github/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/SigLIP_demo.ipynb) [[hf notebook]](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/SigLIP/Inference_with_(multilingual)_SigLIP%2C_a_better_CLIP_model.ipynb) [[hf docs]](https://huggingface.co/docs/transformers/model_doc/siglip) [[hf models]](https://huggingface.co/collections/google/siglip-659d5e62f0ae1a57ae0e83ba)
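SigLIP replaces CLIP's softmax contrastive loss with a pairwise sigmoid loss, so scores are read off with a sigmoid rather than a softmax. A minimal sketch following the HF docs linked above (checkpoint and prompts are illustrative):

```python
import requests
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model = AutoModel.from_pretrained("google/siglip-base-patch16-224")
processor = AutoProcessor.from_pretrained("google/siglip-base-patch16-224")

image = Image.open(requests.get(
    "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
texts = ["a photo of 2 cats", "a photo of a dog"]

# padding="max_length" matches how the SigLIP text tower was trained.
inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
probs = torch.sigmoid(outputs.logits_per_image)  # independent per-text probabilities
print(probs)
```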
  • ALIGN
  • DeCLIP - GVT/DeCLIP)
  • FLIP
  • Counting-aware CLIP - clip-to-count.github.io/)
  • ALIP
  • FILIP
  • SLIP
  • WiSE-FT - ft)
  • FLYP
  • MAGIC
  • ZeroCap - shot-image-to-text)
  • CapDec - H?usp=sharing)
  • DeCap - wei/DeCap)
  • ViECap
  • CLOSE
  • xCLIP
  • EVA - CLIP]](https://arxiv.org/pdf/2303.15389.pdf) [[EVA-02]](https://arxiv.org/pdf/2303.11331.pdf) [[code]](https://github.com/baaivision/EVA)
  • VT-CLIP
  • CLIP-ViL - vil/CLIP-ViL)
  • RegionCLIP
  • DenseCLIP
  • E-CLIP
  • MaskCLIP
  • CLIPSeg - zero-shot.ipynb) [[demo]](https://huggingface.co/spaces/nielsr/CLIPSeg) [[demo]](https://huggingface.co/spaces/Sijuade/CLIPSegmentation) [[demo]](https://huggingface.co/spaces/taesiri/CLIPSeg) [[demo]](https://huggingface.co/spaces/aryswisnu/CLIPSeg2) [[demo]](https://huggingface.co/spaces/aryswisnu/CLIPSeg) [[blog]](https://huggingface.co/blog/clipseg-zero-shot)
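CLIPSeg produces one segmentation map per text prompt; the demos and blog linked above use the same API. A minimal sketch (checkpoint and prompts are illustrative):

```python
import requests
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open(requests.get(
    "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
prompts = ["a cat", "a remote control"]

inputs = processor(text=prompts, images=[image] * len(prompts),
                   padding=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# One low-resolution logit map per prompt; upsample and threshold as needed.
masks = torch.sigmoid(outputs.logits)
print(masks.shape)
```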
  • OWL-ViT - research/scenic/tree/main/scenic/projects/owl_vit) [[code]](https://huggingface.co/docs/transformers/model_doc/owlvit) [[code]](https://huggingface.co/docs/transformers/tasks/zero_shot_object_detection) [[colab]](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/zeroshot_object_detection_with_owlvit.ipynb) [[demo]](https://huggingface.co/spaces/adirik/OWL-ViT) [[demo]](https://huggingface.co/spaces/adirik/image-guided-owlvit) [[demo]](https://huggingface.co/spaces/johko/OWL-ViT) [[demo]](https://huggingface.co/spaces/kellyxiaowei/OWL-ViT) [[demo]](https://huggingface.co/spaces/wendys-llc/OWL-ViT)
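OWL-ViT does open-vocabulary detection from free-text queries; the HF docs and colab linked above cover the full pipeline. A minimal sketch (checkpoint, queries and threshold are illustrative):

```python
import requests
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.open(requests.get(
    "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
queries = [["a photo of a cat", "a photo of a remote"]]  # one query list per image

inputs = processor(text=queries, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs, threshold=0.1, target_sizes=target_sizes)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(queries[0][label], round(score.item(), 3), [round(v, 1) for v in box.tolist()])
```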
  • ClipCap
  • VQGAN-CLIP - CLIP) [[code]](https://github.com/EleutherAI/vqgan-clip) [[code]](https://github.com/justinjohn0306/VQGAN-CLIP) [[code]](https://www.kaggle.com/code/basu369victor/playing-with-vqgan-clip/notebook) [[colab]](https://colab.research.google.com/github/dribnet/clipit/blob/master/demos/Moar_Settings.ipynb) [[colab]](https://colab.research.google.com/drive/1L8oL-vLJXVcRzCFbPwOoMkPKJ8-aYdPN) [[colab]](https://colab.research.google.com/github/justinjohn0306/VQGAN-CLIP/blob/main/VQGAN%2BCLIP(Updated).ipynb)
  • AltCLIP - Open/FlagAI) [[code]](https://huggingface.co/docs/transformers/model_doc/altclip)
  • CLIPPO
  • FDT
  • DIME-FM - FM) [[website]](https://cs-people.bu.edu/sunxm/DIME-FM/)
  • ViLLA
  • BASIC
  • CoOp
  • CoCoOp
  • RPO
  • KgCoOp
  • ECO
  • UPT
  • UPL
  • ProDA
  • CTP
  • MVLPT
  • DAPT
  • LFA - fi/LFA)
  • LaFTer
  • TAP
  • CLIP-Adapter - Adapter)
  • Tip-Adapter - Adapter)
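CLIP-Adapter and Tip-Adapter both adapt a frozen CLIP with a light module on top of its features; Tip-Adapter can even be training-free, blending zero-shot logits with a cache of few-shot support features. A rough sketch of that idea (shapes, `alpha` and `beta` are illustrative; see the official repos above for the exact recipe):

```python
import torch
import torch.nn.functional as F

def tip_adapter_logits(image_feat, text_feats, support_feats, support_labels,
                       num_classes, alpha=1.0, beta=5.5):
    """Training-free Tip-Adapter-style logits on top of frozen CLIP features.

    image_feat:     [d]     L2-normalised query image feature
    text_feats:     [C, d]  L2-normalised class-prompt text features
    support_feats:  [N, d]  L2-normalised few-shot image features (the cache keys)
    support_labels: [N]     integer class labels of the cached shots
    """
    clip_logits = 100.0 * image_feat @ text_feats.T               # zero-shot CLIP logits
    affinity = image_feat @ support_feats.T                       # similarity to cached shots
    cache_values = F.one_hot(support_labels, num_classes).float()
    cache_logits = torch.exp(-beta * (1.0 - affinity)) @ cache_values
    return clip_logits + alpha * cache_logits
```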
  • CALIP
  • SHIP
  • LoGoPrompt
  • GRAM
  • MaPLe - prompt-learning)
  • PromptSR
  • ProGrad - align)
  • Prompt Align
  • APE
  • CuPL
  • WaffleCLIP
  • R-AMT - AMT) [[website]](https://wuw2019.github.io/R-AMT/)
  • SVL-Adapter
  • KAPT
  • SuS-X - X)
  • Internet Explorer - explorer-ssl/internet-explorer) [[website]](https://internet-explorer-ssl.github.io/)
  • REACT - vl.github.io/)
  • SEARLE
  • CLIPpy
  • CDUL
  • DN - dev/distribution-normalization) [[website]](https://fengyuli-dev.github.io/dn-website/)
  • TPT
  • ReCLIP
  • PLOT
  • GEM
  • SynthCLIP
  • [pdf - menon/classify_by_description_release) [[website]](https://cv.cs.columbia.edu/sachit/classviadescr/)
  • [pdf - Liang/Modality-Gap) [[website]](https://modalitygap.readthedocs.io/en/latest/)
  • [pdf - Adapter)
  • [pdf - tuning_ICCV_2023_supplemental.pdf) [[code]](https://github.com/SwinTransformer/Feature-Distillation)
  • [pdf - CLIP)
  • [pdf - shot-video-to-text)
  • [pdf - chefer/TargetCLIP)
  • [code - guided-diffusion) [[code]](https://github.com/nerdyrodent/CLIP-Guided-Diffusion) [[code]](https://github.com/crowsonkb/v-diffusion-pytorch)
  • CLIP Variants
  • awesome-clip
  • [pdf - critical.pytorch)
  • [pdf - critical.pytorch)
  • [pdf - min/CLIP-Caption-Reward)
  • Semantic-SAM
  • SEEM
  • Grounding DINO
  • [pdf - o?usp=sharing) [[video]](https://www.youtube.com/watch?v=KEv-F5UkhxU&ab_channel=AICoffeeBreakwithLetitia) [[blog]](https://huggingface.co/blog/peft) [[blog]](https://www.ml6.eu/blogpost/low-rank-adaptation-a-technical-deep-dive) [[blog]](https://medium.com/@abdullahsamilguser/lora-low-rank-adaptation-of-large-language-models-7af929391fee) [[blog]](https://towardsdatascience.com/understanding-lora-low-rank-adaptation-for-finetuning-large-models-936bce1a07c6) [[hf docs]](https://huggingface.co/docs/diffusers/training/lora) [[library]](https://huggingface.co/docs/peft/index) [[library tutorial]](https://huggingface.co/learn/cookbook/prompt_tuning_peft)
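LoRA freezes the base weights and learns small low-rank updates on selected projections; with the PEFT library linked above this is a few lines. A minimal sketch (the base model id and hyperparameters are placeholders):

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # placeholder base model
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                   # rank of the low-rank update
    lora_alpha=16,                         # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
)
model = get_peft_model(base, config)
model.print_trainable_parameters()         # only the LoRA matrices require grad
```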
  • [pdf - playground-tgi) [[blog]](https://huggingface.co/blog/4bit-transformers-bitsandbytes)
  • [pdf - python]](https://github.com/abetlen/llama-cpp-python) [[blog]](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/) [[video]](https://www.youtube.com/watch?v=E5OnoYF2oAk&t=1915s)
  • [pdf - llama) [[demo]](https://labs.perplexity.ai/) [[demo]](https://huggingface.co/chat) [[blog]](https://ai.meta.com/llama/) [[blog]](https://ai.meta.com/resources/models-and-libraries/llama/) [[blog]](https://huggingface.co/blog/llama2) [[blog]](https://www.philschmid.de/llama-2) [[blog]](https://medium.com/towards-generative-ai/understanding-llama-2-architecture-its-ginormous-impact-on-genai-e278cb81bd5c) [[llama2.c]](https://github.com/karpathy/llama2.c) [[finetune script]](https://gist.github.com/younesbelkada/9f7f75c94bdc1981c8ca5cc937d4a4da) [[finetune script]](https://www.philschmid.de/sagemaker-llama2-qlora) [[finetune script]](https://www.philschmid.de/instruction-tune-llama-2) [[tutorials]](https://github.com/amitsangani/Llama-2) [[yarn]](https://github.com/jquesnelle/yarn)
  • [pdf - 120b) [[official demo]](https://www.galactica.org/) [[demo]](https://huggingface.co/spaces/lewtun/galactica-demo)
  • [blog - following/) [[rlhf hf]](https://huggingface.co/blog/rlhf) [[rlhf wandb]](https://wandb.ai/ayush-thakur/RLHF/reports/Understanding-Reinforcement-Learning-from-Human-Feedback-RLHF-Part-1--VmlldzoyODk5MTIx) [[code]](https://github.com/lucidrains/PaLM-rlhf-pytorch)
  • [blog - 4) [[technical report]](https://arxiv.org/pdf/2303.08774.pdf)
  • [pdf - instruct)
  • [website - lab/stanford_alpaca) [[model]](https://huggingface.co/chavinlo/alpaca-13b) [[demo]](https://alpaca-ai.ngrok.io/) [[gpt4-x-alpaca]](https://huggingface.co/chavinlo/gpt4-x-alpaca)
  • [website - sys/FastChat) [[demo]](https://chat.lmsys.org/)
  • [blog - release-65d5efbccdbb8c4202ec078b) [[bugs]](https://unsloth.ai/blog/gemma-bugs)
  • [pdf - teacher)
  • [pdf - transformers)
  • [pdf - soups)
  • [pdf - priming)
  • Falcon LLM
  • x-transformers
  • mlabonne - fine-tuning-LLMs) [[train and deploy]](https://www.youtube.com/watch?v=Ma4clS-IdhA&t=1709s) [[supervised FT]](https://www.youtube.com/watch?v=NXevvEF3QVI&t=1s) [[how LLM Chatbots work]](https://www.youtube.com/watch?v=C6ZszXYPDDw&t=26s) [[finetuning tutorial]](https://www.philschmid.de/fine-tune-llms-in-2024-with-trl) [[pytorch finetuning tutorial]](https://pytorch.org/blog/finetune-llms/?utm_content=278057355&utm_medium=social&utm_source=linkedin&hss_channel=lcp-78618366) [[finetuning tutorial]](https://huggingface.co/learn/cookbook/fine_tuning_code_llm_on_single_gpu)
  • [pdf - research/prompt-tuning) [[code]](https://huggingface.co/docs/peft/task_guides/clm-prompt-tuning) [[blog]](https://ai.googleblog.com/2022/02/guiding-frozen-language-models-with.html?m=1) [[blog]](https://heidloff.net/article/introduction-to-prompt-tuning/)
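Soft prompt tuning keeps the language model frozen and learns only a handful of virtual token embeddings; the PEFT guide linked above wires this up roughly as follows (model id, prompt length and init text are placeholders):

```python
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

model_id = "facebook/opt-350m"  # placeholder base model
base = AutoModelForCausalLM.from_pretrained(model_id)

config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,                     # length of the learned soft prompt
    prompt_tuning_init=PromptTuningInit.TEXT,  # initialise from a natural-language prompt
    prompt_tuning_init_text="Classify the sentiment of this review:",
    tokenizer_name_or_path=model_id,
)
model = get_peft_model(base, config)
model.print_trainable_parameters()             # only the virtual-token embeddings train
```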
  • [pdf - tuning)
  • [pdf - nlp/LM-BFF)
  • [pdf - models-perform-reasoning-via.html?m=1)
  • [pdf - Edgerunners/Plan-and-Solve-Prompting)
  • [pdf - and-Respond) [[website]](https://uclaml.github.io/Rephrase-and-Respond/)
  • [pdf - inversion.github.io/) [[hf docs]](https://huggingface.co/docs/diffusers/training/text_inversion) [[blog]](https://medium.com/@onkarmishra/how-textual-inversion-works-and-its-applications-5e3fda4aa0bc)
  • [pdf - Stable-Diffusion) [[code]](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth) [[website]](https://dreambooth.github.io/) [[hf docs]](https://huggingface.co/docs/diffusers/training/dreambooth)
  • [pdf - AI/visual_prompt_retrieval)
  • [pdf - Group/ILM-VP)
  • [pdf - icl)
  • VisualCOMET
  • LAION
  • Conceptual 12M - research-datasets/conceptual-12m)
  • Winoground
  • Transformers-VQA
  • MMT-Retrieval
  • Florence-VL
  • unilm
  • awesome-Vision-and-Language-Pre-training
  • awesome-vision-language-pretraining-papers
  • vqa
  • image captioning
  • scene graphs
  • Comparing image captioning models
  • Comparing visual question answering (VQA) models
  • Generalized Visual Language Models
  • Prompting in Vision CVPR23 Tutorial
  • CVPR23 Tutorial
  • CVPR22 Tutorial
  • CVPR21 Tutorial
  • CVPR20 Tutorial