Projects in Awesome Lists tagged with vision-transformer

https://github.com/open-mmlab/mmdetection

OpenMMLab Detection Toolbox and Benchmark

cascade-rcnn convnext detr fast-rcnn faster-rcnn glip grounding-dino instance-segmentation mask-rcnn object-detection panoptic-segmentation pytorch retinanet rtmdet semisupervised-learning ssd swin-transformer transformer vision-transformer yolo

Last synced: 16 Dec 2024

https://github.com/lukas-blecher/latex-ocr

pix2tex: Using a ViT to convert images of equations into LaTeX code.

dataset deep-learning im2latex im2markup im2text image-processing image2text latex latex-ocr machine-learning math-ocr ocr python pytorch transformer vision-transformer vit

Last synced: 16 Dec 2024

https://github.com/lukas-blecher/LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

dataset deep-learning im2latex im2markup im2text image-processing image2text latex latex-ocr machine-learning math-ocr ocr python pytorch transformer vision-transformer vit

Last synced: 30 Oct 2024

https://github.com/nielsrogge/transformers-tutorials

This repository contains demos I made with the Transformers library by HuggingFace.

bert gpt-2 layoutlm pytorch transformers vision-transformer

Last synced: 16 Dec 2024

https://github.com/NielsRogge/Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.

bert gpt-2 layoutlm pytorch transformers vision-transformer

Last synced: 30 Oct 2024

https://github.com/adithya-s-k/omniparse

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks

ingestion-api ocr omniparser parse-server parser-library vision-transformer web-crawler whisper-api

Last synced: 17 Dec 2024

https://github.com/jingyunliang/swinir

SwinIR: Image Restoration Using Swin Transformer (official repository)

compression-artifact-reduction deblocking decompression denoising image-deblocking image-denoising image-restoration image-sr image-super-resolution lightweight-image-super-resolution low-level-vision real-world-image-super-resolution restoration super-resolution transformer vision-transformer

Last synced: 19 Dec 2024

https://github.com/JingyunLiang/SwinIR

SwinIR: Image Restoration Using Swin Transformer (official repository)

compression-artifact-reduction deblocking decompression denoising image-deblocking image-denoising image-restoration image-sr image-super-resolution lightweight-image-super-resolution low-level-vision real-world-image-super-resolution restoration super-resolution transformer vision-transformer

Last synced: 13 Nov 2024

https://github.com/foundationvision/var

[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

auto-regressive-model autoregressive-models diffusion-models generative-ai generative-model gpt gpt-2 image-generation large-language-models neurips transformers vision-transformer

Last synced: 17 Dec 2024

https://github.com/FoundationVision/VAR

[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

auto-regressive-model autoregressive-models diffusion-models generative-ai generative-model gpt gpt-2 image-generation large-language-models neurips transformers vision-transformer

Last synced: 04 Nov 2024

https://github.com/huawei-noah/efficient-ai-backbones

Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.

convolutional-neural-networks efficient-inference ghostnet imagenet model-compression pretrained-models pytorch tensorflow transformer vision-transformer

Last synced: 17 Dec 2024

https://github.com/huawei-noah/Efficient-AI-Backbones

Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.

convolutional-neural-networks efficient-inference ghostnet imagenet model-compression pretrained-models pytorch tensorflow transformer vision-transformer

Last synced: 28 Oct 2024

https://github.com/open-mmlab/mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark

beit clip constrastive-learning convnext deep-learning image-classification mae masked-image-modeling mobilenet moco multimodal pretrained-models pytorch resnet self-supervised-learning swin-transformer vision-transformer

Last synced: 21 Dec 2024

https://github.com/google-research/scenic

Scenic: A Jax Library for Computer Vision Research and Beyond

attention computer-vision deep-learning jax research transformers vision-transformer

Last synced: 17 Dec 2024

https://github.com/towhee-io/towhee

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

computer-vision convolutional-networks embedding-vectors embeddings feature-extraction feature-vector image-processing image-retrieval llm machine-learning milvus pipeline towhee transformer unstructured-data video-processing vision-transformer vit

Last synced: 16 Dec 2024

https://github.com/InternLM/InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

chatgpt foundation gpt gpt-4 instruction-tuning language-model large-language-model large-vision-language-model llm mllm multi-modality multimodal supervised-finetuning vision-language-model vision-transformer visual-language-learning

Last synced: 14 Nov 2024

https://github.com/internlm/internlm-xcomposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

chatgpt foundation gpt gpt-4 instruction-tuning language-model large-language-model large-vision-language-model llm mllm multi-modality multimodal supervised-finetuning vision-language-model vision-transformer visual-language-learning

Last synced: 19 Dec 2024

https://github.com/mit-han-lab/efficientvit

Efficient vision foundation models for high-resolution generation and perception.

deep-compression-autoencoder efficient-diffusion-model efficientvit high-resolution imagenet segment-anything segmentation vision-transformer

Last synced: 17 Dec 2024

https://github.com/baaivision/eva

EVA Series: Visual Representation Fantasies from BAAI

foundation-models representation-learning vision-transformer

Last synced: 19 Dec 2024

https://github.com/baaivision/EVA

EVA Series: Visual Representation Fantasies from BAAI

foundation-models representation-learning vision-transformer

Last synced: 28 Oct 2024

https://github.com/hila-chefer/transformer-explainability

[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.

attention-matrix attention-visualization bert bert-model cvpr2021 deep-learning explainability perturbation transformer-interpretability vision-transformer visualize-classifications vit

Last synced: 21 Dec 2024

https://github.com/alibaba/easycv

An all-in-one toolkit for computer vision

classification computer-vision object-detection pytorch self-supervised-learning transformers vision-transformer

Last synced: 17 Dec 2024

https://github.com/hila-chefer/Transformer-Explainability

[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.

attention-matrix attention-visualization bert bert-model cvpr2021 deep-learning explainability perturbation transformer-interpretability vision-transformer visualize-classifications vit

Last synced: 30 Oct 2024

https://github.com/alibaba/EasyCV

An all-in-one toolkit for computer vision

classification computer-vision object-detection pytorch self-supervised-learning transformers vision-transformer

Last synced: 26 Oct 2024

https://github.com/microsoft/cream

This is a collection of our NAS and Vision Transformer work.

automl efficiency knowledge-distillation nas rpe vision-transformer vit-compression

Last synced: 19 Dec 2024

https://github.com/microsoft/Cream

This is a collection of our NAS and Vision Transformer work.

automl efficiency knowledge-distillation nas rpe vision-transformer vit-compression

Last synced: 05 Nov 2024

https://github.com/vitae-transformer/vitpose

The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"

deep-learning distillation mae pose-estimation pytorch self-supervised-learning vision-transformer

Last synced: 19 Dec 2024

https://github.com/jingyunliang/vrt

VRT: A Video Restoration Transformer (official repository)

deblurring denoising low-level-vision restoration sr super-resolution transformer video video-deblurring video-denoising video-restoration video-sr video-super-resolution vision-transformer

Last synced: 15 Dec 2024

https://github.com/JingyunLiang/VRT

VRT: A Video Restoration Transformer (official repository)

deblurring denoising low-level-vision restoration sr super-resolution transformer video video-deblurring video-denoising video-restoration video-sr video-super-resolution vision-transformer

Last synced: 06 Nov 2024

https://github.com/OpenGVLab/InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

action-recognition benchmark contrastive-learning foundation-models instruction-tuning masked-autoencoder multimodal open-set-recognition self-supervised spatio-temporal-action-localization temporal-action-localization video-clip video-data video-dataset video-question-answering video-retrieval video-understanding vision-transformer zero-shot-classification zero-shot-retrieval

Last synced: 28 Oct 2024

https://github.com/czczup/vit-adapter

[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions

adapter object-detection semantic-segmentation vision-transformer

Last synced: 15 Dec 2024

https://github.com/MCG-NJU/VideoMAE

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

action-recognition mae masked-autoencoder neurips-2022 pytorch self-supervised-learning transformer video-analysis video-representation-learning video-transformer video-understanding vision-transformer

Last synced: 27 Oct 2024

https://github.com/czczup/ViT-Adapter

[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions

adapter object-detection semantic-segmentation vision-transformer

Last synced: 04 Nov 2024

https://github.com/emcf/thepipe

Extract clean data from anywhere, powered by vision-language models ⚡

gpt-4 gpt-4o large-language-models multimodal pdf scrapers vision-transformer web

Last synced: 19 Dec 2024

https://github.com/yitu-opensource/T2T-ViT

ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

t2t-transformer vision-transformer vit

Last synced: 13 Nov 2024

https://github.com/nvlabs/voxformer

Official PyTorch implementation of VoxFormer [CVPR 2023 Highlight]

2d-to-3d 3d-perception 3d-scene-understanding artificial-intelligence autonomous-driving autonomous-vehicles computer-vision deep-learning machine-learning occupancy-grid-map semantic-scene-completion semantickitti vision-transformer voxel-proceessing

Last synced: 16 Dec 2024

https://github.com/NVlabs/VoxFormer

Official PyTorch implementation of VoxFormer [CVPR 2023 Highlight]

2d-to-3d 3d-perception 3d-scene-understanding artificial-intelligence autonomous-driving autonomous-vehicles computer-vision deep-learning machine-learning occupancy-grid-map semantic-scene-completion semantickitti vision-transformer voxel-proceessing

Last synced: 28 Oct 2024

https://github.com/OFA-Sys/ONE-PEACE

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

audio-language contrastive-loss foundation-models multimodal representation-learning vision-and-language vision-language vision-transformer

Last synced: 29 Nov 2024

https://github.com/jacobgil/vit-explain

Explainability for Vision Transformers

deep-learning explainable-ai pytorch transformer vision-transformer

Last synced: 20 Dec 2024

https://github.com/hustvl/yolos

[NeurIPS 2021] You Only Look at One Sequence

computer-vision object-detection transformer vision-transformer

Last synced: 18 Dec 2024

https://github.com/hustvl/YOLOS

[NeurIPS 2021] You Only Look at One Sequence

computer-vision object-detection transformer vision-transformer

Last synced: 09 Nov 2024

https://github.com/xxxnell/how-do-vits-work

(ICLR 2022 Spotlight) Official PyTorch implementation of "How Do Vision Transformers Work?"

loss-landscape pytorch self-attention transformer vision-transformer

Last synced: 15 Nov 2024

https://github.com/NVlabs/FasterViT

[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention

ade20k backbone coco deep-learning foundation-models image-classification image-net object-detection pre-trained-model self-attention semantic-segmentation vision-transformer visual-recognition

Last synced: 28 Oct 2024

https://github.com/sunzey/alphaclip

[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

deep-learning machine-learning vision-and-language vision-language vision-language-model vision-transformer

Last synced: 21 Dec 2024

https://github.com/Alibaba-MIIL/ImageNet21K

Official PyTorch implementation of the paper "ImageNet-21K Pretraining for the Masses" (NeurIPS 2021)

downstream imagenet21k mixer multi-label-classification pretraining semantic-softmax single-label vision-transformer

Last synced: 26 Oct 2024

https://github.com/4DVLab/Vision-Centric-BEV-Perception

Vision-Centric BEV Perception: A Survey

bev-perception bird-eye-view deep-learning transformer vision-transformer

Last synced: 28 Oct 2024

https://github.com/baudm/parseq

Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)

computer-vision eccv eccv2022 ocr optical-character-recognition scene-text-recognition text-recognition vision-transformer

Last synced: 20 Dec 2024

https://github.com/blaizzy/mlx-vlm

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

apple-silicon florence2 idefics llava llm local-ai mlx molmo paligemma pixtral vision-framework vision-language-model vision-transformer

Last synced: 19 Dec 2024

https://github.com/mv-lab/swin2sr

[ECCV] Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration. Advances in Image Manipulation (AIM) workshop ECCV 2022. Try it out! over 3.3M runs https://replicate.com/mv-lab/swin2sr

compression compression-artifact-reduction computer-vision deblocking deep-learning denoising eccv2022 image-denoising image-processing image-restoration image-sr image-super-resolution jpeg low-level-vision ntire super-resolution swin2sr swinir transformer vision-transformer

Last synced: 06 Nov 2024

https://github.com/vitae-transformer/vitdet

Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object Detection"

deep-learning object-detection pytorch vision-transformer

Last synced: 15 Dec 2024

https://github.com/ViTAE-Transformer/ViTDet

Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object Detection"

deep-learning object-detection pytorch vision-transformer

Last synced: 15 Nov 2024

https://github.com/jdai-cv/cotnet

This is an official implementation for "Contextual Transformer Networks for Visual Recognition".

contextual-transformer cotnet image-classification imagenet instance-segmentation mask-rcnn mscoco object-detection semantic-segmentation vision-transformer

Last synced: 15 Dec 2024

https://github.com/mahmoodlab/hipt

Hierarchical Image Pyramid Transformer - CVPR 2022 (Oral)

computational-pathology cvpr cvpr2022 deep-learning hierarchical-attention-networks high-resolution histopathology pretrained-weights pytorch self-supervised-learning transfer-learning unsupervised-learning vision-transformer weakly-supervised-learning

Last synced: 15 Dec 2024

https://github.com/mahmoodlab/HIPT

Hierarchical Image Pyramid Transformer - CVPR 2022 (Oral)

computational-pathology cvpr cvpr2022 deep-learning hierarchical-attention-networks high-resolution histopathology pretrained-weights pytorch self-supervised-learning transfer-learning unsupervised-learning vision-transformer weakly-supervised-learning

Last synced: 13 Nov 2024

https://github.com/Blaizzy/mlx-vlm

MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.

apple-silicon florence2 idefics llava llm local-ai mlx molmo paligemma pixtral vision-framework vision-language-model vision-transformer

Last synced: 25 Nov 2024

https://github.com/ViTAE-Transformer/ViTAE-Transformer-Remote-Sensing

A comprehensive list [SAMRS@NeurIPS'23, RVSA@TGRS'22, RSP@TGRS'22] of our research works related to remote sensing, including papers, codes, and citations. Note: The repo for [TGRS'22] "An Empirical Study of Remote Sensing Pretraining" has been moved to: https://github.com/ViTAE-Transformer/RSP

change-detection classification deep-learning object-detection remote-sensing self-supervised-learning semantic-segmentation transfer-learning vision-transformer

Last synced: 15 Nov 2024

https://github.com/vitae-transformer/vitae-transformer-remote-sensing

A comprehensive list [SAMRS@NeurIPS'23, RVSA@TGRS'22, RSP@TGRS'22] of our research works related to remote sensing, including papers, codes, and citations. Note: The repo for [TGRS'22] "An Empirical Study of Remote Sensing Pretraining" has been moved to: https://github.com/ViTAE-Transformer/RSP

change-detection classification deep-learning object-detection remote-sensing self-supervised-learning semantic-segmentation transfer-learning vision-transformer

Last synced: 14 Nov 2024

https://github.com/google-research/maxvit

[ECCV 2022] Official repository for "MaxViT: Multi-Axis Vision Transformer". SOTA foundation models for classification, detection, segmentation, image quality, and generative modeling...

architecture classification cnn computer-vision image image-processing mlp object-detection resnet segmentation transformer transformer-architecture vision-transformer

Last synced: 17 Nov 2024

https://github.com/NVlabs/GCVit

[ICML 2023] Official PyTorch implementation of Global Context Vision Transformers

ade20k backbone coco deep-learning imagenet imagenet-classification object-detection pre-train pre-trained-model self-attention semantic-segmentation vision-transformer visual-recognition

Last synced: 15 Nov 2024

https://github.com/vitae-transformer/remote-sensing-rvsa

The official repo for [TGRS'22] "Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model"

deep-learning foundation-model foundation-models object-detection pytorch remote-sensing remote-sensing-foundation-model scene-classification self-supervised-learning semantic-segmentation transfer-learning vision-transformer

Last synced: 15 Dec 2024

https://github.com/raoyongming/GFNet

[NeurIPS 2021] [T-PAMI] Global Filter Networks for Image Classification

computer-vision deep-learning image-classification image-recognition vision-transformer

Last synced: 15 Nov 2024

https://github.com/rentainhe/visualization

A collection of visualization functions

attention attention-map attention-mechanism data-visualization deep-learning transformer vision vision-mlp vision-transformer visualization

Last synced: 21 Dec 2024

https://github.com/oneflow-inc/libai

LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training

data-parallelism deep-learning distributed-training large-scale model-parallelism nlp oneflow pipeline-parallelism self-supervised-learning transformer vision-transformer

Last synced: 15 Dec 2024

https://github.com/Oneflow-Inc/libai

LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training

data-parallelism deep-learning distributed-training large-scale model-parallelism nlp oneflow pipeline-parallelism self-supervised-learning transformer vision-transformer

Last synced: 16 Nov 2024

https://github.com/omerbt/splice

Official Pytorch Implementation for "Splicing ViT Features for Semantic Appearance Transfer" presenting "Splice" (CVPR 2022 Oral)

cvpr2022 generative-models image-translation single-image single-image-generation splice style-transfer vision-transformer

Last synced: 16 Dec 2024

https://github.com/omerbt/Splice

Official Pytorch Implementation for "Splicing ViT Features for Semantic Appearance Transfer" presenting "Splice" (CVPR 2022 Oral)

cvpr2022 generative-models image-translation single-image single-image-generation splice style-transfer vision-transformer

Last synced: 15 Nov 2024

https://github.com/asyml/vision-transformer-pytorch

Pytorch version of Vision Transformer (ViT) with pretrained models. This is part of CASL (https://casl-project.github.io/) and ASYML project.

pytorch vision-transformer

Last synced: 17 Dec 2024

https://github.com/hustvl/mimdet

[ICCV 2023] You Only Look at One Partial Sequence

computer-vision instance-segmentation mae masked-image-modeling object-detection transformer vision-transformer

Last synced: 16 Dec 2024

https://github.com/IBM/CrossViT

Official implementation of CrossViT. https://arxiv.org/abs/2103.14899

computer-vision deep-learning multi-scale-features vision-transformer

Last synced: 14 Nov 2024

https://github.com/shoufachen/adaptformer

[NeurIPS 2022] Implementation of "AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition"

adapter neurips-2022 recognition vision-transformer visual-adapter

Last synced: 20 Dec 2024

https://github.com/xmed-lab/CLIP_Surgery

CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks

clip explainability interpretability multilabel multimodal open-vocabulary sam segment-anything segmentation vision-transformer

Last synced: 27 Oct 2024

https://github.com/megvii-research/FQ-ViT

[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer

imagenet post-training-quantization pytorch quantization vision-transformer

Last synced: 28 Oct 2024

https://github.com/ZhangGongjie/SAM-DETR

[CVPR'2022] SAM-DETR & SAM-DETR++: Official PyTorch Implementation

computer-vision cvpr cvpr2022 deep-learning detection detr machine-learning object-detection pytorch transformer vision vision-transformer

Last synced: 28 Oct 2024

https://github.com/roatienza/deep-text-recognition-benchmark

PyTorch code of my ICDAR 2021 paper Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR)

ocr str vision-transformer vitstr

Last synced: 15 Dec 2024

https://github.com/martinsbruveris/tensorflow-image-models

TensorFlow port of PyTorch Image Models (timm) - image models with pretrained weights.

imagenet tensorflow vision-transformer

Last synced: 15 Nov 2024

https://github.com/paddlepaddle/passl

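PASSL includes image self-supervised learning algorithms such as SimCLR, MoCo v1/v2, BYOL, CLIP, PixPro, simsiam, SwAV, BEiT, and MAE, as well as basic vision algorithms such as Vision Transformer, DEiT, Swin Transformer, CvT, T2T-ViT, MLP-Mixer, XCiT, ConvNeXt, and PVTv2.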

beit clip convnext cvt deep-learning deit mae moco moco-v2 paddle pixpro pvt self-supervised-learning simclr swav swin-transformer vision-transformer vit xcit

Last synced: 21 Dec 2024

https://github.com/dwctod/eccv2022-papers-with-code-demo

A collection of the latest ECCV work, including papers, code, and demo videos. Recommendations are welcome!

ai computer-vision cv dataset diffusion eccv eccv2022 face-recognition image-segmentation multimodal-deep-learning nerf objection-detection vision-transformer

Last synced: 21 Nov 2024

https://github.com/DerrickXuNu/v2x-vit

[ECCV2022] Official Implementation of paper "V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer"

3d-object-detection autonomous-driving collaborative-perception computer-vision deep-learning machine-learning multi-agent-system pytorch simulation v2x vehicle-to-everything vision-transformer

Last synced: 28 Oct 2024

https://github.com/jingyunliang/rvrt

Recurrent Video Restoration Transformer with Guided Deformable Attention (NeurIPS 2022, official repository)

deblurring denoising low-level-vision restoraton sr super-resolution transformer video video-deblurring video-denoising video-restoration video-sr video-super-resolution vision-transformer

Last synced: 18 Dec 2024

https://github.com/Haiyang-W/GiT

[ECCV2024 Oral🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"

foundation-models perception transformer unified vision-and-language vision-transformer

Last synced: 28 Oct 2024

https://github.com/vitae-transformer/vitae-transformer

The official repo for [NeurIPS'21] "ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias" and [IJCV'22] "ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond"

ade20k deep-learning imagenet imagenet-classification mscoco object-detection semantic-segmentation vision-transformer vitae-transformer

Last synced: 18 Dec 2024

https://github.com/staghado/vit.cpp

Inference Vision Transformer (ViT) in plain C/C++ with ggml

ai c computer-vision cpp cpu edge-computing ggml image-classification llamacpp vision-transformer whisper-cpp

Last synced: 17 Dec 2024

https://github.com/paddlepaddle/interpretdl

InterpretDL: Interpretation of Deep Learning Models, a library of model interpretability algorithms based on PaddlePaddle (飞桨).

convolutional-neural-networks explanations grad-cam interpretation-algorithms lime model-interpretation nlp-models paddlepaddle smoothgrad vision-transformer visualizations

Last synced: 15 Dec 2024

https://github.com/PaddlePaddle/InterpretDL

InterpretDL: Interpretation of Deep Learning Models, a library of model interpretability algorithms based on PaddlePaddle (飞桨).

convolutional-neural-networks explanations grad-cam interpretation-algorithms lime model-interpretation nlp-models paddlepaddle smoothgrad vision-transformer visualizations

Last synced: 17 Nov 2024

https://github.com/ziqipang/lm4visualencoding

[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"

llm vision-transformer

Last synced: 18 Dec 2024

https://github.com/NVIDIA/transformer-ls

Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).

efficient-transformers long-sequence transformer vision-transformer

Last synced: 16 Nov 2024

https://github.com/nvidia/transformer-ls

Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).

efficient-transformers long-sequence transformer vision-transformer

Last synced: 29 Oct 2024

https://github.com/vitae-transformer/vitae-transformer-matting

A comprehensive list of our research works related to image matting, including papers, codes, datasets, demos, and citations. Note: The repo for [IJCV'23] "Rethinking Portrait Matting with Privacy Preserving" has been moved to: https://github.com/ViTAE-Transformer/P3M-Net

computer-vision deep-learning image-matting privacy-preserving survey vision-transformer

Last synced: 14 Nov 2024

https://github.com/AnshMittal1811/MachineLearning-AI

This repository contains the work that I regularly did and studied from Medium blogs, several research papers, and other repos (related or unrelated to the research papers).

3d-computer-vision audio-signal-processing computer-vision convolutional-neural-networks deep-learning deep-neural-networks generative-models gradcam graph-neural-networks image-classification lidar-point-cloud machine-learning neural-network neural-networks neural-radiance-fields neural-rendering pytorch transformers vision-transformer

Last synced: 28 Oct 2024

https://github.com/zhongkaifu/seq2seqsharp

Seq2SeqSharp is a tensor-based, fast and flexible deep neural network framework written in .NET (C#). It has many notable features, such as automatic differentiation, different network types (Transformer, LSTM, BiLSTM and so on), multi-GPU support, cross-platform support (Windows, Linux, x86, x64, ARM), and multimodal models for text and images.

attention-model cuda deep-learning encoder-decoder gpu image lstm machine-translation neural-network seq2seq sequence-to-sequence tensor text transformer transformer-architecture transformer-encoder translation vision-transformer