https://github.com/junchen14/multi-modal-transformer
This repository collects a variety of multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also collects useful tutorials and tools in these domains.
- Host: GitHub
- URL: https://github.com/junchen14/multi-modal-transformer
- Owner: junchen14
- Created: 2021-04-07T06:19:31.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2022-08-27T14:52:06.000Z (over 2 years ago)
- Last Synced: 2023-11-07T20:16:55.875Z (over 1 year ago)
- Topics: efficiency-transformer, image-transformer, language, mlp-mixer, multi-modal, multi-modal-cvpr2021, transformer-readling-list, video-language, video-transformer, vision-transformer
- Homepage:
- Size: 354 KB
- Stars: 194
- Watchers: 8
- Forks: 28
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Reading list in Transformer
This repo aims to collect recent popular Transformer papers, code, and learning resources in the domains of **Vision Transformer**, **NLP**, **multi-modal**, etc.
### Topics (paper and code)
- [Image Transformer](image-transformer.md)
- [Video Transformer](video-transformer.md)
- [Video & Language & other modality Transformer](video-language-transformer.md)
- [Image & Language & other modality Transformer](image-language-transformer.md)
- [Natural Language Processing Transformer](NLP-transformer.md)
- [Efficient Transformer](efficiency-transformer.md)
- [Model Compression](vision_model_compression.md)
- [Self-Supervised Learning in Vision](Self-supervised_learning.md)
- [Other interesting papers in related domains](other_interesting_paper.md)

### Review papers in multi-modal
- [Video-language](paper-review.md)

### Tutorials and workshops
- [Cross-View and Cross-Modal Visual Geo-Localization: IEEE CVPR 2021 Tutorial](https://youtube.com/playlist?list=PLUgbVHjDharjTo9tk3xcPJHEkmi33ap-u)
- [From VQA to VLN: Recent Advances in Vision-and-Language Research: IEEE CVPR 2021 Tutorial](https://youtube.com/playlist?list=PLUgbVHjDhari645g1zmpo-MtOVap1FKxh)
- [Tutorial on MultiModal Machine Learning: IEEE CVPR 2022 Tutorial](https://cmu-multicomp-lab.github.io/mmml-tutorial/cvpr2022/)
### Datasets
- [Multi-modal Datasets](datasets.md)

### Blogs
- [Lil's blogs](https://lilianweng.github.io/lil-log/)

### Tools
- [PyTorchVideo](https://pytorchvideo.org/) a deep learning library for video understanding research
- [horovod](https://github.com/horovod/horovod) a tool for multi-gpu parallel processing
- [accelerate](https://huggingface.co/docs/accelerate/) an easy API for mixed precision and any kind of distributed computing
- [hyperparameter search: optuna](https://optuna.org/)
- [AI Conference Deadlines](https://aideadlin.es/)