https://github.com/aveygo/visiondiff
Simple vision-based block for Microsoft's Differential Transformer
- Host: GitHub
- URL: https://github.com/aveygo/visiondiff
- Owner: Aveygo
- License: gpl-3.0
- Created: 2024-10-16T10:58:57.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-10-21T22:55:49.000Z (7 months ago)
- Last Synced: 2025-02-25T05:16:14.642Z (3 months ago)
- Language: Python
- Size: 138 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# VisionDiff
Using the [Differential Transformer](https://arxiv.org/abs/2410.05258) in a vision-friendly way, similar to [VisionMamba](https://github.com/kyegomez/VisionMamba).
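The package's internals aren't reproduced here, but the idea can be sketched roughly as follows. This is a hypothetical, simplified illustration (single head, a scalar learnable lambda in place of the paper's reparameterized one, and the class name `DiffAttention2D` is made up): flatten the image into a sequence of pixel tokens, compute two softmax attention maps, subtract one from the other per the paper's DiffAttn formulation, then fold the result back into an image.

```python
import torch
import torch.nn as nn


class DiffAttention2D(nn.Module):
    """Simplified single-head differential attention over flattened image tokens."""

    def __init__(self, dim: int):
        super().__init__()
        # Two query/key projections; their attention maps are subtracted (DiffAttn).
        self.q = nn.Linear(dim, 2 * dim)
        self.k = nn.Linear(dim, 2 * dim)
        self.v = nn.Linear(dim, dim)
        self.lam = nn.Parameter(torch.tensor(0.5))  # scalar stand-in for the paper's lambda

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C): treat pixels as a sequence
        q1, q2 = self.q(tokens).chunk(2, dim=-1)
        k1, k2 = self.k(tokens).chunk(2, dim=-1)
        v = self.v(tokens)
        a1 = torch.softmax(q1 @ k1.transpose(1, 2) / c ** 0.5, dim=-1)
        a2 = torch.softmax(q2 @ k2.transpose(1, 2) / c ** 0.5, dim=-1)
        out = (a1 - self.lam * a2) @ v  # difference of two attention maps
        return out.transpose(1, 2).reshape(b, c, h, w)  # back to image layout
```

Subtracting the second attention map is the paper's mechanism for cancelling attention noise; the flatten/reshape at the boundaries is what makes the block "vision-friendly".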
## Preamble
Absolutely no formal experimentation has been performed to validate this method - take everything with a grain of salt.

## Installation
```
pip install visiondiff
```
## Usage
```python
import torch

from VisionDiff import VisionDiff

dim, num_heads = 32, 4

layer1 = VisionDiff(dim, num_heads, in_channels=3)   # projects 3 input channels up to dim
layer2 = VisionDiff(dim, num_heads)                  # operates at dim throughout
layer3 = VisionDiff(dim, num_heads, out_channels=3)  # projects dim back down to 3 channels

x = torch.zeros(1, 3, 64, 64)  # example "image"
x = layer1(x)
x = layer2(x)
x = layer3(x)

print(f"Output shape: {x.shape}")  # [1, 3, 64, 64]
```

Please keep `in_channels` / `out_channels` operations to a minimum, as they tend to be computationally expensive; one such pattern is sketched below.
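For example (assuming `VisionDiff` layers compose like ordinary `torch.nn` modules, as the snippet above suggests), perform the channel projections exactly once at each end and keep every intermediate layer at `dim`:

```python
import torch
import torch.nn as nn

from VisionDiff import VisionDiff

dim, num_heads, depth = 32, 4, 6

# Channel projections only at the boundaries; the middle of the stack stays at dim.
model = nn.Sequential(
    VisionDiff(dim, num_heads, in_channels=3),            # 3 -> dim (expensive, once)
    *[VisionDiff(dim, num_heads) for _ in range(depth)],  # dim -> dim (cheap)
    VisionDiff(dim, num_heads, out_channels=3),           # dim -> 3 (expensive, once)
)

x = torch.zeros(1, 3, 64, 64)
print(model(x).shape)  # torch.Size([1, 3, 64, 64])
```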
## Versions

0.1.0 - *Now with more positional encoding!*