Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with vision-transformer

A curated list of projects in awesome lists tagged with vision-transformer.

https://github.com/nielsrogge/transformers-tutorials

This repository contains demos I made with the Transformers library by HuggingFace.

bert gpt-2 layoutlm pytorch transformers vision-transformer

Last synced: 16 Dec 2024

https://github.com/adithya-s-k/omniparse

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks

ingestion-api ocr omniparser parse-server parser-library vision-transformer web-crawler whisper-api

Last synced: 17 Dec 2024

https://github.com/foundationvision/var

[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

auto-regressive-model autoregressive-models diffusion-models generative-ai generative-model gpt gpt-2 image-generation large-language-models neurips transformers vision-transformer

Last synced: 17 Dec 2024

https://github.com/google-research/scenic

Scenic: A Jax Library for Computer Vision Research and Beyond

attention computer-vision deep-learning jax research transformers vision-transformer

Last synced: 17 Dec 2024

https://github.com/baaivision/eva

EVA Series: Visual Representation Fantasies from BAAI

foundation-models representation-learning vision-transformer

Last synced: 19 Dec 2024

https://github.com/hila-chefer/transformer-explainability

[CVPR 2021] Official PyTorch implementation of "Transformer Interpretability Beyond Attention Visualization", a novel method to visualize classifications by Transformer-based networks.

attention-matrix attention-visualization bert bert-model cvpr2021 deep-learning explainability perturbation transformer-interpretability vision-transformer visualize-classifications vit

Last synced: 21 Dec 2024

https://github.com/microsoft/cream

This is a collection of our NAS and Vision Transformer work.

automl efficiency knowledge-distillation nas rpe vision-transformer vit-compression

Last synced: 19 Dec 2024

https://github.com/vitae-transformer/vitpose

The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"

deep-learning distillation mae pose-estimation pytorch self-supervised-learning vision-transformer

Last synced: 19 Dec 2024

https://github.com/czczup/vit-adapter

[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions

adapter object-detection semantic-segmentation vision-transformer

Last synced: 15 Dec 2024

https://github.com/MCG-NJU/VideoMAE

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

action-recognition mae masked-autoencoder neurips-2022 pytorch self-supervised-learning transformer video-analysis video-representation-learning video-transformer video-understanding vision-transformer

Last synced: 27 Oct 2024

https://github.com/emcf/thepipe

Extract clean data from anywhere, powered by vision-language models ⚡

gpt-4 gpt-4o large-language-models multimodal pdf scrapers vision-transformer web

Last synced: 19 Dec 2024

https://github.com/yitu-opensource/T2T-ViT

[ICCV 2021] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

t2t-transformer vision-transformer vit

Last synced: 13 Nov 2024

https://github.com/OFA-Sys/ONE-PEACE

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

audio-language contrastive-loss foundation-models multimodal representation-learning vision-and-language vision-language vision-transformer

Last synced: 29 Nov 2024

https://github.com/jacobgil/vit-explain

Explainability for Vision Transformers

deep-learning explainable-ai pytorch transformer vision-transformer

Last synced: 20 Dec 2024

https://github.com/hustvl/yolos

[NeurIPS 2021] You Only Look at One Sequence

computer-vision object-detection transformer vision-transformer

Last synced: 18 Dec 2024

https://github.com/xxxnell/how-do-vits-work

(ICLR 2022 Spotlight) Official PyTorch implementation of "How Do Vision Transformers Work?"

loss-landscape pytorch self-attention transformer vision-transformer

Last synced: 15 Nov 2024

https://github.com/NVlabs/FasterViT

[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention

ade20k backbone coco deep-learning foundation-models image-classification image-net object-detection pre-trained-model self-attention semantic-segmentation vision-transformer visual-recognition

Last synced: 28 Oct 2024

https://github.com/sunzey/alphaclip

[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

deep-learning machine-learning vision-and-language vision-language vision-language-model vision-transformer

Last synced: 21 Dec 2024

https://github.com/Alibaba-MIIL/ImageNet21K

Official PyTorch implementation of the paper "ImageNet-21K Pretraining for the Masses" (NeurIPS 2021)

downstream imagenet21k mixer multi-label-classification pretraining semantic-softmax single-label vision-transformer

Last synced: 26 Oct 2024

https://github.com/baudm/parseq

Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)

computer-vision eccv eccv2022 ocr optical-character-recognition scene-text-recognition text-recognition vision-transformer

Last synced: 20 Dec 2024

https://github.com/blaizzy/mlx-vlm

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

apple-silicon florence2 idefics llava llm local-ai mlx molmo paligemma pixtral vision-framework vision-language-model vision-transformer

Last synced: 19 Dec 2024

https://github.com/mv-lab/swin2sr

[ECCV] Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration. Advances in Image Manipulation (AIM) workshop, ECCV 2022. Try it out (over 3.3M runs): https://replicate.com/mv-lab/swin2sr

compression compression-artifact-reduction computer-vision deblocking deep-learning denoising eccv2022 image-denoising image-processing image-restoration image-sr image-super-resolution jpeg low-level-vision ntire super-resolution swin2sr swinir transformer vision-transformer

Last synced: 06 Nov 2024

https://github.com/vitae-transformer/vitdet

Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object Detection"

deep-learning object-detection pytorch vision-transformer

Last synced: 15 Dec 2024

https://github.com/jdai-cv/cotnet

This is an official implementation for "Contextual Transformer Networks for Visual Recognition".

contextual-transformer cotnet image-classification imagenet instance-segmentation mask-rcnn mscoco object-detection semantic-segmentation vision-transformer

Last synced: 15 Dec 2024

https://github.com/ViTAE-Transformer/ViTAE-Transformer-Remote-Sensing

A comprehensive list [SAMRS@NeurIPS'23, RVSA@TGRS'22, RSP@TGRS'22] of our research works related to remote sensing, including papers, code, and citations. Note: The repo for [TGRS'22] "An Empirical Study of Remote Sensing Pretraining" has been moved to: https://github.com/ViTAE-Transformer/RSP

change-detection classification deep-learning object-detection remote-sensing self-supervised-learning semantic-segmentation transfer-learning vision-transformer

Last synced: 15 Nov 2024

https://github.com/google-research/maxvit

[ECCV 2022] Official repository for "MaxViT: Multi-Axis Vision Transformer". SOTA foundation models for classification, detection, segmentation, image quality, and generative modeling...

architecture classification cnn computer-vision image image-processing mlp object-detection resnet segmentation transformer transformer-architecture vision-transformer

Last synced: 17 Nov 2024

https://github.com/raoyongming/GFNet

[NeurIPS 2021] [T-PAMI] Global Filter Networks for Image Classification

computer-vision deep-learning image-classification image-recognition vision-transformer

Last synced: 15 Nov 2024

https://github.com/omerbt/splice

Official PyTorch implementation of "Splicing ViT Features for Semantic Appearance Transfer", presenting "Splice" (CVPR 2022 Oral)

cvpr2022 generative-models image-translation single-image single-image-generation splice style-transfer vision-transformer

Last synced: 16 Dec 2024

https://github.com/asyml/vision-transformer-pytorch

PyTorch version of Vision Transformer (ViT) with pretrained models. This is part of the CASL (https://casl-project.github.io/) and ASYML projects.

pytorch vision-transformer

Last synced: 17 Dec 2024

https://github.com/IBM/CrossViT

Official implementation of CrossViT. https://arxiv.org/abs/2103.14899

computer-vision deep-learning multi-scale-features vision-transformer

Last synced: 14 Nov 2024

https://github.com/shoufachen/adaptformer

[NeurIPS 2022] Implementation of "AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition"

adapter neurips-2022 recognition vision-transformer visual-adapter

Last synced: 20 Dec 2024

https://github.com/xmed-lab/CLIP_Surgery

CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks

clip explainability interpretability multilabel multimodal open-vocabulary sam segment-anything segmentation vision-transformer

Last synced: 27 Oct 2024

https://github.com/megvii-research/FQ-ViT

[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer

imagenet post-training-quantization pytorch quantization vision-transformer

Last synced: 28 Oct 2024

https://github.com/roatienza/deep-text-recognition-benchmark

PyTorch code for my ICDAR 2021 paper "Vision Transformer for Fast and Efficient Scene Text Recognition" (ViTSTR)

ocr str vision-transformer vitstr

Last synced: 15 Dec 2024

https://github.com/martinsbruveris/tensorflow-image-models

TensorFlow port of PyTorch Image Models (timm) - image models with pretrained weights.

imagenet tensorflow vision-transformer

Last synced: 15 Nov 2024

https://github.com/paddlepaddle/passl

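PASSL includes image self-supervised learning algorithms such as SimCLR, MoCo v1/v2, BYOL, CLIP, PixPro, SimSiam, SwAV, BEiT, and MAE, as well as foundational vision algorithms such as Vision Transformer, DeiT, Swin Transformer, CvT, T2T-ViT, MLP-Mixer, XCiT, ConvNeXt, and PVTv2.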

beit clip convnext cvt deep-learning deit mae moco moco-v2 paddle pixpro pvt self-supervised-learning simclr swav swin-transformer vision-transformer vit xcit

Last synced: 21 Dec 2024

https://github.com/DerrickXuNu/v2x-vit

[ECCV2022] Official Implementation of paper "V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer"

3d-object-detection autonomous-driving collaborative-perception computer-vision deep-learning machine-learning multi-agent-system pytorch simulation v2x vehicle-to-everything vision-transformer

Last synced: 28 Oct 2024

https://github.com/jingyunliang/rvrt

Recurrent Video Restoration Transformer with Guided Deformable Attention (NeurIPS 2022, official repository)

deblurring denoising low-level-vision restoraton sr super-resolution transformer video video-deblurring video-denoising video-restoration video-sr video-super-resolution vision-transformer

Last synced: 18 Dec 2024

https://github.com/Haiyang-W/GiT

[ECCV2024 Oral🔥] Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"

foundation-models perception transformer unified vision-and-language vision-transformer

Last synced: 28 Oct 2024

https://github.com/vitae-transformer/vitae-transformer

The official repo for [NeurIPS'21] "ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias" and [IJCV'22] "ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond"

ade20k deep-learning imagenet imagenet-classification mscoco object-detection semantic-segmentation vision-transformer vitae-transformer

Last synced: 18 Dec 2024

https://github.com/staghado/vit.cpp

Vision Transformer (ViT) inference in plain C/C++ with ggml

ai c computer-vision cpp cpu edge-computing ggml image-classification llamacpp vision-transformer whisper-cpp

Last synced: 17 Dec 2024

https://github.com/paddlepaddle/interpretdl

InterpretDL: Interpretation of Deep Learning Models, a model interpretability algorithm library built on PaddlePaddle.

convolutional-neural-networks explanations grad-cam interpretation-algorithms lime model-interpretation nlp-models paddlepaddle smoothgrad vision-transformer visualizations

Last synced: 15 Dec 2024

https://github.com/ziqipang/lm4visualencoding

[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"

llm vision-transformer

Last synced: 18 Dec 2024

https://github.com/NVIDIA/transformer-ls

Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).

efficient-transformers long-sequence transformer vision-transformer

Last synced: 16 Nov 2024

https://github.com/vitae-transformer/vitae-transformer-matting

A comprehensive list of our research works related to image matting, including papers, code, datasets, demos, and citations. Note: The repo for [IJCV'23] "Rethinking Portrait Matting with Privacy Preserving" has been moved to: https://github.com/ViTAE-Transformer/P3M-Net

computer-vision deep-learning image-matting privacy-preserving survey vision-transformer

Last synced: 14 Nov 2024

https://github.com/zhongkaifu/seq2seqsharp

Seq2SeqSharp is a fast and flexible tensor-based deep neural network framework written in .NET (C#). Its features include automatic differentiation, multiple network types (Transformer, LSTM, BiLSTM, and more), multi-GPU support, cross-platform support (Windows, Linux, x86, x64, ARM), and multimodal models for text and images.

attention-model cuda deep-learning encoder-decoder gpu image lstm machine-translation neural-network seq2seq sequence-to-sequence tensor text transformer transformer-architecture transformer-encoder translation vision-transformer

Last synced: 21 Dec 2024

https://github.com/vitae-transformer/qformer

The official repo for [TPAMI'23] "Vision Transformer with Quadrangle Attention"

attention-mechanism backbone classification deep-learning object-detection pose-estimation semantic-segmentation vision-transformer

Last synced: 19 Dec 2024

https://github.com/chou141253/FGVC-PIM

PyTorch implementation of "A Novel Plug-in Module for Fine-Grained Visual Classification", targeting the fine-grained visual classification task.

efficientnet fgvc fine-grained-visual-categorization resnet swin-transformer vision-transformer

Last synced: 05 Nov 2024

https://github.com/SforAiDl/vformer

A modular PyTorch library for vision transformer models

pytorch vision-transformer

Last synced: 15 Nov 2024

https://github.com/bfshi/AbSViT

Official code for "Top-Down Visual Attention from Analysis by Synthesis" (CVPR 2023 highlight)

attention classification cvpr pytorch segmentation vision-transformer

Last synced: 09 Nov 2024

https://github.com/kyegomez/vit-rgts

Open source implementation of "Vision Transformers Need Registers"

attention-mechanism gpt4 vision-api vision-transformer vit

Last synced: 20 Dec 2024