{"id":27378734,"url":"https://github.com/junchen14/multi-modal-transformer","last_synced_at":"2026-01-24T17:02:54.607Z","repository":{"id":37886137,"uuid":"355434224","full_name":"junchen14/Multi-Modal-Transformer","owner":"junchen14","description":"The repository collects various multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also collects useful tutorials and tools in these related domains.","archived":false,"fork":false,"pushed_at":"2022-08-27T14:52:06.000Z","size":362,"stargazers_count":194,"open_issues_count":0,"forks_count":28,"subscribers_count":8,"default_branch":"main","last_synced_at":"2023-11-07T20:16:55.875Z","etag":null,"topics":["efficiency-transformer","image-transformer","language","mlp-mixer","multi-modal","multi-modal-cvpr2021","transformer-readling-list","video-language","video-transformer","vision-transformer"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/junchen14.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-04-07T06:19:31.000Z","updated_at":"2023-10-14T01:24:15.000Z","dependencies_parsed_at":"2022-07-26T17:45:30.115Z","dependency_job_id":null,"html_url":"https://github.com/junchen14/Multi-Modal-Transformer","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/junchen14%2FMulti-Modal-Transformer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/junchen14%2FMulti-Modal-Transformer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/junchen14%2FMulti-Modal-Transformer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/junchen14%2FMulti-Modal-Transformer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/junchen14","download_url":"https://codeload.github.com/junchen14/Multi-Modal-Transformer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248723223,"owners_count":21151413,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["efficiency-transformer","image-transformer","language","mlp-mixer","multi-modal","multi-modal-cvpr2021","transformer-readling-list","video-language","video-transformer","vision-transformer"],"created_at":"2025-04-13T13:37:01.884Z","updated_at":"2026-01-24T17:02:49.541Z","avatar_url":"https://github.com/junchen14.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Reading List on Transformers\n\nThis repo aims to collect recent popular Transformer papers, code, and learning resources in domains such as **Vision Transformer**, **NLP**, and **multi-modal** learning. 
\n\n### Topics (papers and code)\n- [Image Transformer](image-transformer.md)\n\n- [Video Transformer](video-transformer.md)\n\n- [Video \u0026 Language \u0026 Other Modality Transformer](video-language-transformer.md)\n\n- [Image \u0026 Language \u0026 Other Modality Transformer](image-language-transformer.md)\n\n- [Natural Language Processing Transformer](NLP-transformer.md)\n\n- [Efficient Transformer](efficiency-transformer.md)\n\n- [Model Compression](vision_model_compression.md)\n\n- [Self-Supervised Learning in Vision](Self-supervised_learning.md)\n\n\u003c!-- - [MLP for Image Classification](MLP-mixer.md) --\u003e\n\n- [Other interesting papers in related domains](other_interesting_paper.md)\n\n\nReview papers on multi-modal learning:\n- [Video-language](paper-review.md)\n\n\n### Tutorials and workshops\n- [Cross-View and Cross-Modal Visual Geo-Localization: IEEE CVPR 2021 Tutorial](https://youtube.com/playlist?list=PLUgbVHjDharjTo9tk3xcPJHEkmi33ap-u)\n\n- [From VQA to VLN: Recent Advances in Vision-and-Language Research: IEEE CVPR 2021 Tutorial](https://youtube.com/playlist?list=PLUgbVHjDhari645g1zmpo-MtOVap1FKxh)\n\n- [Tutorial on MultiModal Machine Learning: IEEE CVPR 2022 Tutorial](https://cmu-multicomp-lab.github.io/mmml-tutorial/cvpr2022/)\n\n\n### Datasets\n- [Multi-modal Datasets](datasets.md)\n\n\n### Blogs\n- [Lil'Log (Lilian Weng's blog)](https://lilianweng.github.io/lil-log/)\n\n### Tools\n- [PyTorchVideo](https://pytorchvideo.org/): a deep learning library for video understanding research\n\n- [Horovod](https://github.com/horovod/horovod): a distributed training framework for multi-GPU and multi-node training\n\n- [Accelerate](https://huggingface.co/docs/accelerate/): a simple API for mixed-precision and distributed training\n\n- [Optuna](https://optuna.org/): a hyperparameter search framework\n\n- [AI Conference 
Deadlines](https://aideadlin.es/)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjunchen14%2Fmulti-modal-transformer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjunchen14%2Fmulti-modal-transformer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjunchen14%2Fmulti-modal-transformer/lists"}