https://github.com/chenin-wang/pytorch-deep-learning-template

A standardized project structure designed to accelerate the development of deep learning models using the PyTorch framework.
accelerate deep-learning deepspeed pytorch train transformers

# PyTorch Deep Learning Template
Development is ongoing. Please stay tuned for updates.

A clean and modular template to kickstart your next deep learning project 🚀🚀

## Key Features

- **Modularity**: Logical components separated into different Python submodules
- **Ready to Go**: Uses [transformers](https://github.com/huggingface/transformers) and [accelerate](https://github.com/huggingface/accelerate) to eliminate boilerplate code
- **Customizable**: Easily swap models, loss functions, and optimizers
- **Logging**: Utilizes Python's [logging](https://docs.python.org/3/library/logging.html) module
- **Experiment Tracking**: Integrates [Weights & Biases](https://www.wandb.ai) for comprehensive experiment monitoring
- **Metrics**: Uses [torchmetrics](https://github.com/Lightning-AI/metrics) for efficient metric computation and [evaluate](https://github.com/huggingface/evaluate) for multi-metric model evaluation
- **Playground**: Jupyter notebook for quick experimentation and prototyping

## Key Components

### Project Structure
- Maintain a clean and modular structure
- Define paths and constants in a central location (e.g. `Project.py`)
- Use `pathlib.Path` for cross-platform compatibility

### Data Processing
- Implement custom datasets by subclassing `torch.utils.data.Dataset`
- Define data transformations in `data/transformations/`
- Use `get_dataloaders()` to configure train/val/test loaders

### Modeling
- Define models in the `models/` directory
- Implement custom architectures or modify existing ones as needed

### Training and Evaluation
- Utilize `main.py` for training/evaluation logic
- Leverage libraries like Accelerate for distributed training
- Implement useful callbacks:
  - Learning rate scheduling
  - Model checkpointing
  - Early stopping
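
The early-stopping callback listed above can be sketched without any training framework. The `EarlyStopping` class, its `patience` and `min_delta` parameters, and the `epochs_run` helper are hypothetical names chosen for this illustration; they are not taken from the template's `callbacks/` module.

```python
class EarlyStopping:
    """Stop training when a monitored loss stops improving.

    Hypothetical helper for illustration; not part of the template.
    """

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience    # epochs to wait without improvement
        self.min_delta = min_delta  # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(losses, patience=2):
    """Return how many epochs run before early stopping triggers."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(losses, start=1):
        if stopper.step(loss):
            return epoch
    return len(losses)
```

In a real loop, `step()` would be called once per epoch with the validation loss, and the loop would break when it returns `True`.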

### Logging and Experiment Tracking
- Use Python's `logging` module for consistent logging
- Integrate experiment tracking (e.g. Weights & Biases, MLflow)
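
A minimal sketch of consistent logging with the standard library follows. The `get_logger` helper and the message format are assumptions for illustration; the template's own `loggers/` module may be organized differently.

```python
import logging
import sys


def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes timestamped messages to stdout.

    Hypothetical helper; the handler is attached only once, so repeated
    calls with the same name do not duplicate output.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(levelname)s | %(name)s | %(message)s")
        )
        logger.addHandler(handler)
    logger.propagate = False  # avoid double printing through the root logger
    return logger


logger = get_logger("train")
logger.info("epoch=%d loss=%.4f", 1, 0.1234)
```

Calling `get_logger` with the same name from different modules returns the same logger object, which keeps the output format uniform across the project.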

### Utilities
- Implement helper functions for visualization, profiling, etc.
- Store in `utils/` directory

## Best Practices

- Avoid hardcoding paths - use a centralized configuration
- Modularize code for reusability and maintainability
- Leverage existing libraries and tools when possible
- Document code and maintain a clear project structure
- Use version control and create reproducible experiments

## Getting Started

1. Clone the repository
2. Install dependencies: `pip install -r requirements.txt`
3. Modify `Project.py` with your paths/constants
4. Implement your custom dataset/model as needed
5. Run training: `python main.py`

This template addresses the common challenge of unstructured and hard-to-maintain code in data science projects. It provides a clean, modular structure that promotes scalability and shareability. The example project demonstrates image classification using a fine-tuned ResNet18 model on a Star Wars character dataset.

## Project Structure

The project is structured in a modular way, with separate folders for data processing, modeling, training, and utilities. The `Project` class in `Project.py` stores paths and constants that are used throughout the codebase.
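
As a rough illustration of this idea, a `Project` class could look like the sketch below. The field and directory names (`base_dir`, `data_dir`, `checkpoint_dir`, `dataset`, `checkpoint`) are assumptions made for this example, not the template's actual definitions.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Project:
    """Central place for paths and constants (field names are assumptions)."""

    base_dir: Path = Path.cwd()

    @property
    def data_dir(self) -> Path:
        return self.base_dir / "dataset"

    @property
    def checkpoint_dir(self) -> Path:
        return self.base_dir / "checkpoint"

    def make_dirs(self) -> None:
        """Create the directories if they do not exist yet."""
        for directory in (self.data_dir, self.checkpoint_dir):
            directory.mkdir(parents=True, exist_ok=True)
```

Other modules can then import a single `Project` instance instead of building paths from string literals, which keeps paths portable across operating systems.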

## Data Processing

Data processing is handled by the `get_dataloaders()` function in `data/datasets.py`. It takes the dataset name and splits the dataset into train/val/test sets using a predefined split ratio. Transforms can be applied to each set as needed.
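
The split step can be sketched with the standard library alone. The `split_indices` function, its default 80/10/10 ratios, and the fixed seed are assumptions for illustration; the ratio and mechanism used by `get_dataloaders()` are not shown in this README.

```python
import random
from typing import List, Sequence, Tuple


def split_indices(
    n_samples: int,
    ratios: Sequence[float] = (0.8, 0.1, 0.1),
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle range(n_samples) and cut it into train/val/test index lists."""
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three values that sum to 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducible splits
    n_train = int(ratios[0] * n_samples)
    n_val = int(ratios[1] * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder absorbs rounding
    return train, val, test
```

Each index list could then be passed to `torch.utils.data.Subset` to build the per-split datasets that the loaders wrap.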

## Modeling

Models are defined in `models/modeling_torch.py`. This file contains the implementation of a simple CNN architecture for image classification. You can modify or add your own models here.

## Training

Training is handled by the `train()` function in `train.py`. It takes in the model, dataloaders, and training parameters, and trains the model using the specified optimizer and loss function.

## Utilities

Utilities are collected in the `utils.py` file, which contains functions for saving and loading models and for logging training progress.

## Example Usage

To train the model, run the following command:

```bash
python main.py
```

This will train the model using the specified parameters and save the trained model to the output directory.

To load and evaluate a pre-trained model, run the following command:

```bash
python main.py --evaluate --model_path /path/to/pretrained/model
```

This will load the pre-trained model and evaluate its performance on the test set.
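
One way the two flags in the command above could be parsed is sketched below with `argparse`. How the template itself handles arguments is not shown in this README, so `build_parser`, `select_mode`, and the rule that `--evaluate` requires `--model_path` are assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the --evaluate and --model_path flags."""
    parser = argparse.ArgumentParser(description="Train or evaluate a model.")
    parser.add_argument(
        "--evaluate",
        action="store_true",
        help="skip training and only evaluate on the test set",
    )
    parser.add_argument(
        "--model_path",
        type=str,
        default=None,
        help="path to a pre-trained model to load",
    )
    return parser


def select_mode(argv=None) -> str:
    """Return 'evaluate' or 'train' depending on the parsed flags."""
    args = build_parser().parse_args(argv)
    # Assumption for this sketch: evaluation needs an explicit model path.
    if args.evaluate and args.model_path is None:
        raise SystemExit("--evaluate requires --model_path")
    return "evaluate" if args.evaluate else "train"
```

With no arguments, `select_mode([])` selects training, which matches the plain `python main.py` invocation.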

## Architecture

```bash
.
│  .gitignore
│  main.py # main script to run the project
│  playground.ipynb # a notebook to play around with the code
│  README.md
│  requirements.txt
│  test.py
│  train.sh
│
├─callbacks # Callbacks for training and logging
│      CometCallback.py
│      __init__.py
│
├─configs # Config files
│      config.yaml
│      ds_zero2_no_offload.json
│
├─data # Data module
│  │  DataLoader.py
│  │  Dataset.py
│  │  __init__.py
│  │
│  └─transformations
│          transforms.py
│          __init__.py
│
├─loggers # Logging module
│      logging_colors.py
│
├─losses # Losses module
│      loss.py
│      __init__.py
│
├─metrics # Metrics module
│      metric.py
│      __init__.py
│
├─models # Models module
│  │  modelutils.py
│  │  __init__.py
│  │
│  ├─HFModel # HuggingFace models
│  │      configuration_hfmodel.py
│  │      convert_hfmodel_original_pytorch_to_hf.py
│  │      feature_extraction_hfmodel.py # audio processing
│  │      image_processing_hfmodel.py # Image processing
│  │      modeling_hfmodel.py # Modeling
│  │      processing_hfmodel.py # multimodal processing
│  │      tokenization_hfmodel.py # Tokenization
│  │      tokenization_hfmodel_fast.py
│  │      __init__.py
│  │
│  └─TorchModel # Torch models
│          modeling_torch.py
│          utils.py
│          __init__.py
│
├─onnx # ONNX module
│      converter2onnx.py
│
├─trainer # Trainer module
│      acclerate.py
│      arguments.py
│      evaluater.py
│      inference.py
│      trainer.py
│      __init__.py
│
└─utils
        constants.py
        profiler.py # Profiling module
        utils.py

```