Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
[ICPR 2024] DoubleViT: Pushing transformers towards the end because of convolutions
https://github.com/mahendran-narayanan/double-vit
- Host: GitHub
- URL: https://github.com/mahendran-narayanan/double-vit
- Owner: mahendran-narayanan
- License: MIT
- Created: 2024-08-25T07:10:21.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-11-30T09:39:54.000Z (about 2 months ago)
- Last Synced: 2024-11-30T10:28:23.511Z (about 2 months ago)
- Topics: deep-learning, deep-neural-networks, icpr2024, pytorch, tensorflow
- Language: Python
- Homepage:
- Size: 4.88 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# DoubleViT: Pushing transformers towards the end because of convolutions
[![cite-bibtex](https://img.shields.io/badge/Cite-BibTeX-1f425f.svg)](#cite)
Official repository of the ICPR 2024 paper "DoubleViT: Pushing transformers towards the end because of convolutions"
The code contains the DoubleViT model, with the CIFAR-10 dataset as an example.
Author: [Mahendran Narayanan](https://scholar.google.de/citations?user=c8subicAAAAJ)
## Abstract
Vision transformers have outperformed convolutional networks and dominate the field in vision tasks. Recent trends indicate a shift towards exploring alternatives to attention mechanisms. We introduce DoubleViT, a model that pushes the attention mechanisms towards the end of the network. The network begins with convolutional layers and concludes with attention mechanisms. The convolutional layers and their depth are determined by the input shape. With this shift, the attention mechanism learns from the outputs of the convolutional layers rather than from the input image patches. This fusion enhances the network's ability to capture better feature representations. The proposed model has fewer parameters than other ViTs. We conduct extensive experiments on benchmark datasets to validate the model and compare it with established architectures. Experimental results demonstrate a remarkable increase in the classification accuracy of the proposed model.
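To make the convolution-first, attention-last layout concrete, here is a minimal PyTorch sketch of the idea the abstract describes. The layer widths, the depth heuristic (downsampling until an 8x8 feature map remains), and all class and parameter names are illustrative assumptions, not the official implementation; see the repository code for the actual model.

```python
import torch
import torch.nn as nn

class DoubleViTSketch(nn.Module):
    """Illustrative sketch: convolutions first, attention only at the end."""

    def __init__(self, image_size=32, in_channels=3, embed_dim=192,
                 num_heads=4, num_attn_layers=2, num_classes=10):
        super().__init__()
        # Convolutional stem whose depth follows the input shape
        # (assumption: halve the spatial size until the feature map is 8x8).
        layers, channels, size = [], in_channels, image_size
        while size > 8:
            layers += [nn.Conv2d(channels, embed_dim, kernel_size=3,
                                 stride=2, padding=1),
                       nn.BatchNorm2d(embed_dim),
                       nn.GELU()]
            channels, size = embed_dim, size // 2
        self.stem = nn.Sequential(*layers)
        # Attention blocks sit at the end of the network, after the convolutions.
        block = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=num_attn_layers)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        feats = self.stem(x)                       # (B, C, H', W')
        tokens = feats.flatten(2).transpose(1, 2)  # conv outputs as tokens, not image patches
        tokens = self.encoder(tokens)              # (B, H'*W', C)
        return self.head(tokens.mean(dim=1))       # mean-pool tokens, then classify

model = DoubleViTSketch()                      # CIFAR-10-sized defaults
logits = model(torch.randn(2, 3, 32, 32))
print(logits.shape)                            # torch.Size([2, 10])
```

The point of the sketch is visible in `forward`: the attention layers consume tokens built from convolutional feature maps rather than from raw image patches, which is the fusion the abstract credits with better feature representations.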
## Cite
If you have used DoubleViT in your research, please cite our work:

```bibtex
@inproceedings{Mahendran2024doublevit,
title = {DoubleViT: Pushing transformers towards the end because of convolutions},
author = {Narayanan, Mahendran},
booktitle = {International Conference on Pattern Recognition (ICPR)},
year = {2024},
}
```