https://github.com/aveygo/visiondiff
Simple vision-based block for Microsoft's Differential Transformer
- Host: GitHub
- URL: https://github.com/aveygo/visiondiff
- Owner: Aveygo
- License: gpl-3.0
- Created: 2024-10-16T10:58:57.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-10-21T22:55:49.000Z (7 months ago)
- Last Synced: 2025-02-25T05:16:14.642Z (3 months ago)
- Language: Python
- Size: 138 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# VisionDiff
Using the [Differential Transformer](https://arxiv.org/abs/2410.05258) in a vision-friendly way, similar to [VisionMamba](https://github.com/kyegomez/VisionMamba).
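The package's internals aren't reproduced here, but the idea can be sketched roughly as follows. This is a hypothetical, simplified illustration (single head, a scalar learnable lambda in place of the paper's reparameterized one, and the class name `DiffAttention2D` is made up): flatten the image into a sequence of pixel tokens, compute two softmax attention maps, subtract one from the other per the paper's DiffAttn formulation, then fold the result back into an image.

```python
import torch
import torch.nn as nn


class DiffAttention2D(nn.Module):
    """Simplified single-head differential attention over flattened image tokens."""

    def __init__(self, dim: int):
        super().__init__()
        # Two query/key projections; their attention maps are subtracted (DiffAttn).
        self.q = nn.Linear(dim, 2 * dim)
        self.k = nn.Linear(dim, 2 * dim)
        self.v = nn.Linear(dim, dim)
        self.lam = nn.Parameter(torch.tensor(0.5))  # scalar stand-in for the paper's lambda

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C): treat pixels as a sequence
        q1, q2 = self.q(tokens).chunk(2, dim=-1)
        k1, k2 = self.k(tokens).chunk(2, dim=-1)
        v = self.v(tokens)
        a1 = torch.softmax(q1 @ k1.transpose(1, 2) / c ** 0.5, dim=-1)
        a2 = torch.softmax(q2 @ k2.transpose(1, 2) / c ** 0.5, dim=-1)
        out = (a1 - self.lam * a2) @ v  # difference of two attention maps
        return out.transpose(1, 2).reshape(b, c, h, w)  # back to image layout
```

Subtracting the second attention map is the paper's mechanism for cancelling attention noise; the flatten/reshape at the boundaries is what makes the block "vision-friendly".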
## Preamble
Absolutely no formal experimentation has been performed to validate this method - take everything with a grain of salt.

## Installation
```
pip install visiondiff
```
## Usage
```python
import torch

from VisionDiff import VisionDiff

dim, num_heads = 32, 4

layer1 = VisionDiff(dim, num_heads, in_channels=3)   # projects 3 input channels up to dim
layer2 = VisionDiff(dim, num_heads)                  # operates at dim throughout
layer3 = VisionDiff(dim, num_heads, out_channels=3)  # projects dim back down to 3 channels

x = torch.zeros(1, 3, 64, 64)  # example "image"
x = layer1(x)
x = layer2(x)
x = layer3(x)

print(f"Output shape: {x.shape}")  # [1, 3, 64, 64]
```

Please keep `in_channels` / `out_channels` operations to a minimum, as they tend to be computationally expensive; one such pattern is sketched below.
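For example (assuming `VisionDiff` layers compose like ordinary `torch.nn` modules, as the snippet above suggests), perform the channel projections exactly once at each end and keep every intermediate layer at `dim`:

```python
import torch
import torch.nn as nn

from VisionDiff import VisionDiff

dim, num_heads, depth = 32, 4, 6

# Channel projections only at the boundaries; the middle of the stack stays at dim.
model = nn.Sequential(
    VisionDiff(dim, num_heads, in_channels=3),            # 3 -> dim (expensive, once)
    *[VisionDiff(dim, num_heads) for _ in range(depth)],  # dim -> dim (cheap)
    VisionDiff(dim, num_heads, out_channels=3),           # dim -> 3 (expensive, once)
)

x = torch.zeros(1, 3, 64, 64)
print(model(x).shape)  # torch.Size([1, 3, 64, 64])
```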
## Versions

0.1.0 - *Now with more positional encoding!*