https://github.com/junchen14/multi-modal-transformer

The repository collects various multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also collects useful tutorials and tools in these related domains.

efficiency-transformer image-transformer language mlp-mixer multi-modal multi-modal-cvpr2021 transformer-readling-list video-language video-transformer vision-transformer


Host: GitHub
URL: https://github.com/junchen14/multi-modal-transformer
Owner: junchen14
Created: 2021-04-07T06:19:31.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2022-08-27T14:52:06.000Z (about 3 years ago)
Last Synced: 2023-11-07T20:16:55.875Z (almost 2 years ago)
Homepage:
Size: 354 KB
Stars: 194
Watchers: 8
Forks: 28
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# Reading List in Transformer

This repo aims to collect recent popular Transformer papers, code, and learning resources in the domains of **Vision Transformer**, **NLP**, **multi-modal** learning, and more.

### Topics (papers and code)

- [Image Transformer](image-transformer.md) 

- [Video Transformer](video-transformer.md)

- [Video & Language & Other Modality Transformer](video-language-transformer.md)

- [Image & Language & Other Modality Transformer](image-language-transformer.md)

- [Natural Language Processing Transformer](NLP-transformer.md)

- [Efficient Transformer](efficiency-transformer.md)

- [Model Compression](vision_model_compression.md)

- [Self-Supervised Learning in Vision](Self-supervised_learning.md)

- [Other Interesting Papers in Related Domains](other_interesting_paper.md)

### Review Papers in Multi-modal Learning

- [Video-language](paper-review.md)

### Tutorials and Workshops

- [Cross-View and Cross-Modal Visual Geo-Localization: IEEE CVPR 2021 Tutorial](https://youtube.com/playlist?list=PLUgbVHjDharjTo9tk3xcPJHEkmi33ap-u)

- [From VQA to VLN: Recent Advances in Vision-and-Language Research: IEEE CVPR 2021 Tutorial](https://youtube.com/playlist?list=PLUgbVHjDhari645g1zmpo-MtOVap1FKxh)

- [Tutorial on MultiModal Machine Learning: IEEE CVPR 2022 Tutorial](https://cmu-multicomp-lab.github.io/mmml-tutorial/cvpr2022/)

### Datasets

- [Multi-modal Datasets](datasets.md)

### Blogs

- [Lil'Log](https://lilianweng.github.io/lil-log/): Lilian Weng's blog


### Tools

- [PyTorchVideo](https://pytorchvideo.org/): a deep learning library for video understanding research

- [Horovod](https://github.com/horovod/horovod): a distributed training framework for multi-GPU and multi-node training

- [Accelerate](https://huggingface.co/docs/accelerate/): a simple API for running PyTorch training with mixed precision on any distributed setup

- [Optuna](https://optuna.org/): a hyperparameter search framework

- [AI Conference Deadlines](https://aideadlin.es/)
