https://github.com/sarathir-dev/multi-scale-vit-for-3d-medical-imaging
A PyTorch implementation of a Multi-Scale Vision Transformer (ViT) for 3D medical image classification using the OrganMNIST3D dataset. This project explores multi-scale attention mechanisms to enhance classification performance in volumetric medical imaging.
- Host: GitHub
- URL: https://github.com/sarathir-dev/multi-scale-vit-for-3d-medical-imaging
- Owner: sarathir-dev
- Created: 2025-03-03T03:16:57.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-03-03T04:03:00.000Z (3 months ago)
- Last Synced: 2025-03-03T04:26:13.009Z (3 months ago)
- Topics: 3d-imaging, deep-learning, healthcare, medical-imaging, medmnist, multi-scale-model, pytorch, self-attention, tranformers, vision-transformer, volumetric-data
- Language: Python
- Homepage:
- Size: 3.91 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Multi-Scale ViT for 3D Medical Imaging
A PyTorch implementation of a Multi-Scale Vision Transformer (ViT) for 3D medical image classification. This model utilizes self-attention and multi-scale feature extraction to analyze volumetric medical images using the OrganMNIST3D dataset.
## Features
- 3D Vision Transformer (ViT) for volumetric medical imaging
- Multi-Scale Attention Mechanism for feature extraction
- OrganMNIST3D dataset from MedMNIST
- Sinusoidal Positional Embeddings for 3D spatial encoding (see the sketch after this list)
- Patch-Based Image Tokenization
- GPU Acceleration (CUDA) for efficient training
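The README does not include code for the positional embeddings, so the following is a minimal sketch of one common way to build 3D sinusoidal positional embeddings: standard 1D sinusoids computed per axis and concatenated over depth, height, and width. The function names and dimension split are illustrative and may not match the repository's implementation.
```python
import torch

def sincos_1d(length, dim):
    """Standard 1D sinusoidal embedding of shape (length, dim); dim must be even."""
    pos = torch.arange(length, dtype=torch.float32).unsqueeze(1)             # (L, 1)
    freqs = torch.exp(-torch.arange(0, dim, 2, dtype=torch.float32)
                      * (torch.log(torch.tensor(10000.0)) / dim))            # (dim/2,)
    angles = pos * freqs                                                      # (L, dim/2)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=1)           # (L, dim)

def sincos_3d(depth, height, width, dim):
    """Illustrative 3D variant: embed each axis with dim/3 channels, then concatenate."""
    assert dim % 6 == 0, "dim must split evenly into three even-sized axis embeddings"
    per_axis = dim // 3
    ez = sincos_1d(depth, per_axis)[:, None, None, :].expand(depth, height, width, per_axis)
    ey = sincos_1d(height, per_axis)[None, :, None, :].expand(depth, height, width, per_axis)
    ex = sincos_1d(width, per_axis)[None, None, :, :].expand(depth, height, width, per_axis)
    return torch.cat([ez, ey, ex], dim=-1).reshape(depth * height * width, dim)  # (tokens, dim)

print(sincos_3d(4, 4, 4, 126).shape)  # torch.Size([64, 126]) for a 4x4x4 patch grid
```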
## Dataset
This project uses the OrganMNIST3D dataset from MedMNIST, which consists of 3D grayscale volumes of 11 organ classes.
### Download the Dataset
The dataset is automatically downloaded using `medmnist`:
```python
from medmnist.dataset import OrganMNIST3D
train_dataset = OrganMNIST3D(split="train", download=True)
```
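For context, here is a minimal sketch of wrapping the MedMNIST splits in PyTorch `DataLoader`s; the batch size and other settings are illustrative choices, not values taken from the repository:
```python
from torch.utils.data import DataLoader
from medmnist.dataset import OrganMNIST3D

# Download the train and test splits (OrganMNIST3D volumes are 28x28x28 grayscale,
# labelled with one of 11 organ classes).
train_dataset = OrganMNIST3D(split="train", download=True)
test_dataset = OrganMNIST3D(split="test", download=True)

# Batch size is an illustrative choice.
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

volume, label = train_dataset[0]
print(volume.shape, label)  # inspect the per-sample array shape and label
```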
## Installation
Clone the repository and install dependencies:
```bash
git clone https://github.com/sarathir-dev/Multi-Scale-ViT-for-3D-Medical-Imaging.git
cd Multi-Scale-ViT-for-3D-Medical-Imaging
pip install -r requirements.txt
```
## Model Architecture
The model follows the Vision Transformer (ViT) architecture, adapted for 3D data (a simplified sketch follows the list):
- Patch Embedding: Converts 3D volumes into smaller patches
- Multi-Head Self-Attention (MHSA): Extracts global features
- Feed-Forward Networks (FFN): Enhances representation learning
- Classification Head: Outputs predictions for 11 organ classes
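As a rough illustration of the components listed above, here is a minimal single-scale sketch built from standard PyTorch modules. It omits the multi-scale attention and sinusoidal positional embeddings used in this project, and the hyperparameters (patch size 7, embedding dimension 128, 4 heads, 4 layers) are illustrative rather than the repository's actual values.
```python
import torch
import torch.nn as nn

class PatchEmbed3D(nn.Module):
    """Split a 3D volume into non-overlapping patches and project them to tokens."""
    def __init__(self, in_channels=1, patch_size=7, embed_dim=128):
        super().__init__()
        # A strided Conv3d is a common way to implement 3D patch embedding.
        self.proj = nn.Conv3d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                       # x: (B, 1, D, H, W)
        x = self.proj(x)                        # (B, E, D/p, H/p, W/p)
        return x.flatten(2).transpose(1, 2)     # (B, num_patches, E)

# One encoder block = Multi-Head Self-Attention followed by a Feed-Forward Network.
encoder_layer = nn.TransformerEncoderLayer(d_model=128, nhead=4,
                                           dim_feedforward=256, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=4)
head = nn.Linear(128, 11)                       # classification head for 11 organ classes

tokens = PatchEmbed3D()(torch.randn(2, 1, 28, 28, 28))   # two dummy 28^3 volumes
logits = head(encoder(tokens).mean(dim=1))                # mean-pool tokens, then classify
print(logits.shape)                                       # torch.Size([2, 11])
```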
## Training
Run the training script:
```bash
python training/train.py
```
## Evaluation
To evaluate the model on the test dataset:
```bash
python training/test.py
```
## Results
The model is trained for 10 epochs using CrossEntropyLoss and the Adam optimizer. Performance is measured using accuracy on the test set.
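As a rough sketch of the settings described above (10 epochs, CrossEntropyLoss, Adam, test-set accuracy), the skeleton below shows how such a loop is typically wired up. Here `model`, `train_loader`, and `test_loader` are placeholders for the network and data loaders built elsewhere in the project, and the learning rate is an illustrative choice; see `training/train.py` and `training/test.py` for the actual scripts.
```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)                                   # placeholder: the 3D ViT defined in the project
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # learning rate is illustrative

for epoch in range(10):
    model.train()
    for volumes, labels in train_loader:
        volumes = volumes.float().to(device)
        labels = labels.squeeze(-1).long().to(device)      # MedMNIST labels arrive with shape (B, 1)
        optimizer.zero_grad()
        loss = criterion(model(volumes), labels)
        loss.backward()
        optimizer.step()

model.eval()
correct = total = 0
with torch.no_grad():
    for volumes, labels in test_loader:
        preds = model(volumes.float().to(device)).argmax(dim=1)
        correct += (preds == labels.squeeze(-1).long().to(device)).sum().item()
        total += labels.size(0)
print(f"Test accuracy: {correct / total:.3f}")
```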
## Contributing
Contributions are welcome. Please open an issue or pull request to improve the project.