Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Awesome-Multimodality
A survey of multimodal learning research.
https://github.com/Yutong-Zhou-cv/Awesome-Multimodality
- <span id="head2"> *2. Topic Order* </span>
- 💬Transformer
- 💬Knowledge Enhanced
- 💬Cardiac Image Computing
- 💬Vision and language Pre-training (VLP)
- 💬Video Saliency Detection
- 💬Multi-Modal Knowledge Graph
- 💬Auto Driving
- 💬Vision and language
- [Paper - LLMl)]
- [Paper - dataset.github.io/wukong-dataset/download.html)]
- [Paper - WuDao/WuDaoMM/)]
- PapersWithCode SOTA badges for *Image as a Foreign Language (BEiT pretraining)*: segmentation (ADE20K, ADE20K val), cross-modal retrieval (COCO 2014, Flickr30k), zero-shot cross-modal retrieval (Flickr30k), visual reasoning (NLVR2 dev/test), visual question answering (VQA v2 test-dev/test-std), segmentation (COCO), detection (COCO)
- [Paper - PLM)]
- **CVPR 2022 Tutorial**
- [Paper - 97/X-VLM)]
- [Paper - learning-multiple-modalities-with.html)]
- [Paper - smile/TCL)]
- 💬Vision and language
- 💬Data Augmentation
- <span id="head3"> *3. Chronological Order* </span>
- [Paper - alpha.github.io/)] [[Code](https://github.com/matrix-alpha/Accountable-Textual-Visual-Chat)]
- [Paper - gen.github.io/)] [[Code](https://github.com/microsoft/i-Code/tree/main/i-Code-V3)]
- [Paper - Owl/summary)] [[Code](https://github.com/X-PLUG/mPLUG-Owl)]
- 💬Visual Metaphors
- [Paper - NLP/MM-SHAP)]
- [Paper - Labs/Versatile-Diffusion)] [[Hugging Face](https://huggingface.co/spaces/shi-labs/Versatile-Diffusion)]
- 💬Multimodal Modeling
- 💬Navigation
- 💬Video Chapter Generation
- 💬Multi-modal & Bias
- 💬Audio-Visual Speech Separation
- 💬Multi-modal for Recommendation
- 💬Dialogue State Tracking
- 💬Multi-modal Multi-task - VILAB/MultiMAE)] [[Project](https://multimae.epfl.ch/)]
- 💬Text-Video Retrieval - labs/xpool)] [[Project](https://layer6ai-labs.github.io/xpool/)]
- 💬Pretraining framework
- 💬Food Retrieval
- 💬Video-Text Alignment
- 💬Class-agnostic Object Detection
- 💬Video Recognition
- 💬Video Action Recognition
- 💬Video+Language Pre-training
- 💬Vision-language transformer
- 💬Visual Question Answering (VQA)
- 💬Visual Commonsense - commonsense)]
- 💬Image+Videos+3D Data Recognition
- 💬Video Representation
- 💬Text-guided Image Manipulation
- 💬Facial Editing - to-Edit)] [[Project](https://www.mmlab-ntu.com/project/talkedit/)] [[Dataset Project](https://mmlab.ie.cuhk.edu.hk/projects/CelebA/CelebA_Dialog.html)] [[Dataset(CelebA-Dialog Dataset)](https://drive.google.com/drive/folders/18nejI_hrwNzWyoF6SW8bL27EYnM4STAs)]
- 💬Video Synthesis - research/MMVID)] [[Project](https://snap-research.github.io/MMVID/)]
- 💬Hyper-text Language-image Model
- 💬Visual Synthesis
- <span id="head4"> *4. Courses* </span>
- <span id="head5"> *Contact Me* </span>
- Yutong ZHOU - Interaction-Laboratory) ଘ(੭*ˊᵕˋ)੭