https://github.com/junchen14/multi-modal-transformer

The repository collects a variety of multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also collects useful tutorials and tools in these domains.

efficiency-transformer image-transformer language mlp-mixer multi-modal multi-modal-cvpr2021 transformer-readling-list video-language video-transformer vision-transformer


# Reading list in Transformer

This repo aims to collect recent popular Transformer papers, code, and learning resources in the domains of **Vision Transformer**, **NLP**, **multi-modal** learning, and more.

### Topics (paper and code)
- [Image Transformer](image-transformer.md)

- [Video Transformer](video-transformer.md)

- [Video & Language & other modality Transformer](video-language-transformer.md)

- [Image & Language & other modality Transformer](image-language-transformer.md)

- [Natural Language Processing Transformer](NLP-transformer.md)

- [Efficient Transformer](efficiency-transformer.md)

- [Model Compression](vision_model_compression.md)

- [Self-Supervised Learning in Vision](Self-supervised_learning.md)

- [Other interesting papers in related domains](other_interesting_paper.md)

Review papers on multi-modal learning
- [Video-language](paper-review.md)

### Tutorials and workshops
- [Cross-View and Cross-Modal Visual Geo-Localization: IEEE CVPR 2021 Tutorial](https://youtube.com/playlist?list=PLUgbVHjDharjTo9tk3xcPJHEkmi33ap-u)

- [From VQA to VLN: Recent Advances in Vision-and-Language Research: IEEE CVPR 2021 Tutorial](https://youtube.com/playlist?list=PLUgbVHjDhari645g1zmpo-MtOVap1FKxh)

- [Tutorial on MultiModal Machine Learning: IEEE CVPR 2022 Tutorial](https://cmu-multicomp-lab.github.io/mmml-tutorial/cvpr2022/)

### Datasets
- [Multi-modal Datasets](datasets.md)

### Blogs
- [Lil's blogs](https://lilianweng.github.io/lil-log/)

### Tools
- [PyTorchVideo](https://pytorchvideo.org/): a deep learning library for video understanding research

- [Horovod](https://github.com/horovod/horovod): a tool for multi-GPU parallel processing

- [Accelerate](https://huggingface.co/docs/accelerate/): an easy API for mixed precision and any kind of distributed computing

- [Optuna](https://optuna.org/): hyperparameter search

- [AI Conference Deadlines](https://aideadlin.es/)