https://github.com/refrakt-hub/refrakt_core

The core implementation repository behind Refrakt.

deep-learning pytorch re-implementation

# About

**refrakt_core** is a modular deep learning and machine learning research framework for computer vision, designed for rapid experimentation, extensibility, and reproducibility. It now features a robust, thread-safe registry system, dynamic dataset handling, advanced image resizing, flexible hyperparameter overrides, and comprehensive logging and testing. Refrakt supports both classic and modern CV/ML papers, and enables seamless ML/DL/fusion pipelines.

> This project aims to unify, extend, and visualize foundational and modern architectures through clean code, clear abstractions, and rigorous logging.

## 🚀 Key Features

- **Safe Registry System**: Thread-safe, import-safe, decorator-based registration for models, datasets, losses, trainers, and transforms. Backward compatible with legacy code.
- **Dynamic Dataset Loader**: Load datasets from custom zip files or torchvision, with automatic format detection (GAN, supervised, contrastive) and size validation.
- **Standard Image Resizer/Transforms**: Multiple resize strategies (maintain aspect, crop, stretch), size validation, and tensor/PIL support.
- **Hyperparameter Overrides**: Override any config parameter from the command line or programmatically for fast experimentation.
- **Improved Logging**: Context-aware logging with better error handling, supporting both TensorBoard and Weights & Biases (W&B).
- **Comprehensive Testing**: Smoke, sanity, unit, and integration tests for all major features.
- **ML/DL/Fusion Pipelines**: Support for pure-ML, pure-DL, and hybrid fusion pipelines (e.g., deep feature extraction + ML fusion head).
- **Modular YAML Configs**: All components (model, trainer, loss, optimizer, scheduler, feature engineering) are defined in modular YAML files.

## 📚 Implemented Papers

- [Vision Transformer (ViT)](https://arxiv.org/abs/2010.11929) – *An Image is Worth 16x16 Words*
- [ResNet](https://arxiv.org/abs/1512.03385) – *Deep Residual Learning for Image Recognition*
- [Autoencoders](https://www.cs.toronto.edu/~hinton/science.pdf) – *Learning Representations via Reconstruction*
- [Swin Transformer](https://arxiv.org/abs/2103.14030) – *Hierarchical Vision Transformer with Shifted Windows*
- [Attention is All You Need](https://arxiv.org/abs/1706.03762)
- [ConvNeXt](https://arxiv.org/abs/2201.03545) – *A ConvNet for the 2020s*
- [SRGAN](https://arxiv.org/abs/1609.04802) – *Photo-Realistic Single Image Super-Resolution with GANs*
- [SimCLR](https://arxiv.org/abs/2002.05709) – *A Simple Framework for Contrastive Learning*
- [DINO](https://arxiv.org/abs/2104.14294) – *Self-Supervised Vision Transformers*
- [MAE](https://arxiv.org/abs/2111.06377) – *Masked Autoencoders*
- [MSN](https://arxiv.org/abs/2204.07141) – *Masked Siamese Networks*

## โš™๏ธ Setup
```bash
# For pip install
pip install refrakt_core
```

```bash
# Manual setup
git clone https://github.com/refrakt-hub/refrakt_core.git
cd refrakt_core

# Create and activate a virtual environment
conda create -n refrakt python=3.10 -y
conda activate refrakt

# Install dependencies
pip install -r requirements.txt
```

### GPU/cuML Support

If you want to use GPU-accelerated ML features (cuML), you must manually install the required dependencies after the main install. Run one of the following scripts from the project root:

```bash
# For bash users:
./install_cuml.sh

# For fish shell users:
./install_cuml.fish
```

This will install the appropriate cuML and RAPIDS libraries for your environment. If you do not need GPU/cuML support, you can skip this step.

## 🔧 Config Structure (YAML)

All components are defined in modular YAML files under `refrakt_core/config/`.

```yaml
runtime:
  mode: pipeline
  log_type: []

dataset:
  name: MNIST
  params:
    root: ./data
    train: true
    download: true
  transform:
    - name: Resize
      params: { size: [28, 28] }
    - name: ToTensor
    - name: Normalize
      params:
        mean: [0.1307]
        std: [0.3081]

dataloader:
  params:
    batch_size: 32
    shuffle: true
    num_workers: 4
    drop_last: false

model:
  name: vit
  wrapper: vit
  params:
    in_channels: 1
    num_classes: 10
    image_size: 28
    patch_size: 7
  fusion:
    type: cuml
    model: logistic_regression
    params:
      C: 1.0
      penalty: l2
      solver: qn
      max_iter: 1000

loss:
  name: ce_wrapped
  mode: logits
  params: {}

optimizer:
  name: adamw
  params:
    lr: 0.0003

scheduler: null

trainer:
  name: supervised
  params:
    save_dir: "./checkpoints"
    num_epochs: 1
    device: cuda
```
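
As a quick sanity check on the `model.params` values above: a ViT splits each image into non-overlapping square patches, so `image_size` should be divisible by `patch_size`. The sketch below works through that arithmetic for the 28×28 MNIST config. The helper name `patch_grid` is hypothetical and is not part of refrakt_core.

```python
def patch_grid(image_size: int, patch_size: int, in_channels: int) -> tuple[int, int]:
    """Return (number of patches, flattened values per patch) for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size          # patches along one edge
    num_patches = per_side * per_side            # sequence length before any class token
    patch_dim = in_channels * patch_size * patch_size
    return num_patches, patch_dim


# Values taken from the example config: 28x28 grayscale images, 7x7 patches
num_patches, patch_dim = patch_grid(image_size=28, patch_size=7, in_channels=1)
print(num_patches, patch_dim)  # 16 49
```

With these settings each image becomes a sequence of 16 patches of 49 pixel values each.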

## 🧩 Major Components & Patterns

### 1. Safe Registry System

Register models, datasets, losses, trainers, and transforms using decorators:

```python
import torch

from refrakt_core.registry.safe_registry import register_model, get_model

@register_model("my_model")
class MyModel(torch.nn.Module):
    ...

# Look up the registered class by name, then instantiate it
model_cls = get_model("my_model")
model = model_cls()
```
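
To illustrate the general pattern, here is a minimal sketch of a thread-safe, decorator-based registry. It is not the refrakt_core implementation: the `Registry` class and its `register` and `get` methods are hypothetical names, and the duplicate-name policy (raise an error) is an assumption.

```python
import threading
from typing import Callable, Dict


class Registry:
    """Hypothetical name-to-class registry guarded by a lock."""

    def __init__(self, kind: str) -> None:
        self._kind = kind
        self._items: Dict[str, type] = {}
        self._lock = threading.Lock()

    def register(self, name: str) -> Callable[[type], type]:
        """Return a class decorator that stores the class under `name`."""
        def decorator(cls: type) -> type:
            with self._lock:  # serialize writes from concurrent threads
                if name in self._items and self._items[name] is not cls:
                    raise KeyError(f"{self._kind} '{name}' is already registered")
                self._items[name] = cls
            return cls  # hand the class back unchanged
        return decorator

    def get(self, name: str) -> type:
        """Look up a registered class, listing known names on failure."""
        with self._lock:
            if name not in self._items:
                known = ", ".join(sorted(self._items)) or "none"
                raise KeyError(f"unknown {self._kind} '{name}' (registered: {known})")
            return self._items[name]


models = Registry("model")


@models.register("toy")
class ToyModel:
    pass


assert models.get("toy") is ToyModel
```

Returning the class unchanged from the decorator keeps registration free of side effects on the class itself, so it can still be imported and used directly.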

### 2. Dynamic Dataset Loader

Load datasets from zip files or torchvision, with format detection:

```python
from refrakt_core.loaders.dataset_loader import load_dataset
train_dataset, val_dataset = load_dataset("path/to/dataset.zip")
train_dataset, val_dataset = load_dataset("mnist")
```
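
The README does not spell out how format detection works, so the following is only a sketch of one plausible approach: inspect the folder layout of the archive members. The function `guess_format`, the folder names `lr` and `hr`, and all three rules are assumptions made for illustration, not refrakt_core behavior.

```python
from pathlib import PurePosixPath

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def guess_format(member_names: list[str]) -> str:
    """Guess a dataset format from archive member paths (hypothetical rules)."""
    images = [
        PurePosixPath(name)
        for name in member_names
        if PurePosixPath(name).suffix.lower() in IMAGE_SUFFIXES
    ]
    if not images:
        raise ValueError("archive contains no image files")

    # Name of the folder directly containing each image ("" for top-level files)
    parents = {path.parent.name.lower() for path in images}

    # Assumed rule 1: paired low-/high-resolution folders suggest a GAN dataset
    if {"lr", "hr"} <= parents:
        return "gan"
    # Assumed rule 2: two or more named folders suggest per-class labels
    if len(parents - {""}) >= 2:
        return "supervised"
    # Assumed rule 3: a flat or single-folder set of unlabeled images
    return "contrastive"


print(guess_format(["train/cat/1.jpg", "train/dog/2.jpg"]))  # supervised
```

For a real archive, the member names could come from `zipfile.ZipFile(path).namelist()` in the standard library.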

### 3. Standard Image Resizer/Transforms

```python
from refrakt_core.resizers.standard_transforms import create_standard_transform
transform = create_standard_transform(target_size=(224, 224), resize_strategy="maintain_aspect")
```
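
The three resize strategies differ mainly in how the scale factor is chosen. The sketch below computes the intermediate size for each one using plain arithmetic. `plan_resize` is a hypothetical helper, and the strategy strings `"crop"` and `"stretch"` are assumed spellings (only `"maintain_aspect"` appears in the example above).

```python
def plan_resize(
    src: tuple[int, int], target: tuple[int, int], strategy: str
) -> tuple[int, int]:
    """Return the (width, height) to scale to before any crop or pad step."""
    src_w, src_h = src
    tgt_w, tgt_h = target
    if strategy == "stretch":
        return tgt_w, tgt_h  # ignore the aspect ratio entirely
    if strategy == "maintain_aspect":
        # Fit inside the target box; the shorter side is left to padding
        scale = min(tgt_w / src_w, tgt_h / src_h)
    elif strategy == "crop":
        # Cover the target box; the overflow is removed by a center crop
        scale = max(tgt_w / src_w, tgt_h / src_h)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return round(src_w * scale), round(src_h * scale)


for name in ("maintain_aspect", "crop", "stretch"):
    print(name, plan_resize((640, 480), (224, 224), name))
```

For a 640×480 input and a 224×224 target, `maintain_aspect` scales to 224×168, `crop` scales to 299×224 before cropping, and `stretch` goes straight to 224×224.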

### 4. Hyperparameter Overrides

Override any config value from the command line or programmatically:

```bash
python train.py --config config.yaml model.name=ResNet optimizer.lr=0.001
```
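
Dotted `key=value` overrides like these can be applied to a nested config dictionary in a few lines. The sketch below shows one way to do it; `apply_overrides` is a hypothetical helper rather than the refrakt_core API, and parsing values with `ast.literal_eval` is an assumption (YAML-style literals such as `true` stay plain strings here).

```python
import ast
from typing import Any


def apply_overrides(config: dict[str, Any], overrides: list[str]) -> dict[str, Any]:
    """Apply 'dotted.key=value' strings to a nested dict in place and return it."""
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        try:
            value = ast.literal_eval(raw)  # numbers, lists, quoted strings
        except (ValueError, SyntaxError):
            value = raw  # fall back to a bare string such as ResNet
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})  # create missing sections on the way
        node[leaf] = value
    return config


cfg = {"model": {"name": "vit"}, "optimizer": {"name": "adamw", "params": {"lr": 0.0003}}}
apply_overrides(cfg, ["model.name=ResNet", "optimizer.params.lr=0.001"])
print(cfg["model"]["name"], cfg["optimizer"]["params"]["lr"])  # ResNet 0.001
```

The keys in this usage example follow the nesting of the YAML config shown earlier, where the learning rate lives under `optimizer.params`.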

### 5. ML/DL/Fusion Pipelines

Supports pure-ML, pure-DL, and hybrid fusion pipelines (deep features + ML head):

```python
from refrakt_core.api.builders.model_builder import build_model
model = build_model(cfg=config, modules=modules, device="cuda", overrides=["model.params.lr=0.0005"])
```

## 📈 Logging & Monitoring

- **TensorBoard**: logs in `logs//tensorboard/`
- **Weights & Biases**: auto-logged if enabled in config

```bash
tensorboard --logdir=./logs//tensorboard/
export WANDB_API_KEY=your_key_here
```

## 🧱 Project Structure

```
refrakt_core/
├── api/                # CLI: train.py, test.py, inference.py
│   └── builders/       # Builders for models, losses, optimizers, datasets
├── config/             # YAML configurations for each experiment
├── losses/             # Contrastive, GAN, MAE, VAE, etc.
├── models/             # Vision architectures (ViT, ResNet, MAE, etc.)
│   └── templates/      # Base model templates and abstractions
├── trainer/            # Task-specific training logic (SimCLR, SRGAN, etc.)
├── registry/           # Safe, decorator-based plugin system
├── utils/              # Helper modules (encoders, decoders, data classes)
├── resizers/           # Image resizing and standard transforms
├── loaders/            # Dynamic and standard dataset loaders
├── transforms.py       # Data augmentation logic
├── datasets.py         # Dataset definitions and loader helpers
├── logging_config.py   # Logger wrapper for stdout + W&B/TensorBoard
```

## 🧪 Testing

Run all tests:
```bash
pytest tests/
```

## 🧩 Extending Refrakt

### Add a New Model

1. Create the architecture in `models/your_model.py`
2. Inherit from a base class in `models/templates/models.py`
3. Register it using:

```python
from refrakt_core.registry.model_registry import register_model

@register_model("your_model")
class YourModel(BaseClassifier):  # BaseClassifier: a base class from models/templates/models.py
    ...
```

4. Add a YAML config: `config/your_model.yaml`
5. Write a custom trainer if needed (`trainer/your_model.py`)

### Add a Custom Dataset Loader or Transform
- Implement in `loaders/` or `resizers/`
- Register with the safe registry

## ๐Ÿ” Example Output

- Progress bar (via `tqdm`)
- Metrics printed and logged
- `./logs//` with TensorBoard events
- W&B dashboard if enabled

## 📬 Contributing

1. Clone and install:
```bash
git clone ...
pip install -r requirements-dev.txt
pre-commit install
```
2. Follow formatting (`black`, `isort`, `pylint`)
3. Write tests for any new feature
4. Run:
```bash
pytest tests/
```

> PRs and issues are welcome!

## 🔭 Future Scope

| Milestone  | Description                            |
| ---------- | -------------------------------------- |
| ✅ Stage 1 | Paper re-implementations in notebooks  |
| ✅ Stage 2 | Modular training + model pipelines     |
| ✅ Stage 3 | Python library (`refrakt train`, etc.) |
| 🔜 Stage 4 | TBD                                    |

Planned additions:
- Much better code readability + extensive documentation (`readthedocs`)
- More sklearn and cuML models made available through the registry.
- Integration of Kolmogorov-Arnold Networks and Lagrangian Neural Networks.
- Saved checkpoints of pre-trained model weights.
- Integrate model tracing for Fusion Blocks.
- Allow for generative / latent fusion training.

## 📄 License

This repository is licensed under the MIT License. See [LICENSE](LICENSE) for full details.

## 👤 Maintainer

**Akshath Mangudi**
If you find issues, raise them. If you learn from this, share it.
Built with love and curiosity :)

## ๐Ÿค Contributing

We welcome contributions! To get started:

- See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines, including development setup, code style, and testing.
- Set up your dev environment with:
```bash
pip install -e .[dev]
# or
python scripts/dev_setup.py
```
- This will install all runtime and development dependencies (testing, linting, formatting, type checking, etc.) and set up pre-commit hooks for code quality.
- Please ensure your code passes all pre-commit checks and tests before opening a pull request.

---