https://github.com/chenin-wang/pytorch-deep-learning-template
a standardized project structure designed to accelerate the development of deep learning models using the PyTorch framework.
- Host: GitHub
- URL: https://github.com/chenin-wang/pytorch-deep-learning-template
- Owner: chenin-wang
- Created: 2024-09-12T07:52:55.000Z (21 days ago)
- Default Branch: main
- Last Pushed: 2024-09-14T03:54:24.000Z (19 days ago)
- Last Synced: 2024-10-01T17:01:46.553Z (2 days ago)
- Topics: accelerate, deep-learning, deepspeed, pytorch, train, transformers
- Language: Jupyter Notebook
- Homepage:
- Size: 1.19 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# PyTorch Deep Learning Template
Development is ongoing; please stay tuned for updates.

A clean and modular template to kickstart your next deep learning project.
## Key Features
- **Modularity**: Logical components separated into different Python submodules
- **Ready to Go**: Uses [transformers](https://github.com/huggingface/transformers) and [accelerate](https://github.com/huggingface/accelerate) to eliminate boilerplate code
- **Customizable**: Easily swap models, loss functions, and optimizers
- **Logging**: Utilizes Python's [logging](https://docs.python.org/3/library/logging.html) module
- **Experiment Tracking**: Integrates [Weights & Biases](https://www.wandb.ai) for comprehensive experiment monitoring
- **Metrics**: Uses [torchmetrics](https://github.com/Lightning-AI/metrics) for efficient metric computation and [evaluate](https://github.com/huggingface/evaluate) for multi-metric model evaluation
- **Playground**: Jupyter notebook for quick experimentation and prototyping

## Key Components
### Project Structure
- Maintain a clean and modular structure
- Define paths and constants in a central location (e.g. `Project.py`)
- Use `pathlib.Path` for cross-platform compatibility

### Data Processing
- Implement custom datasets by subclassing `torch.utils.data.Dataset`
- Define data transformations in `data/transformations/`
- Use `get_dataloaders()` to configure train/val/test loaders

### Modeling
- Define models in the `models/` directory
- Implement custom architectures or modify existing ones as needed

### Training and Evaluation
- Utilize `main.py` for training/evaluation logic
- Leverage libraries like Accelerate for distributed training
- Implement useful callbacks:
  - Learning rate scheduling
  - Model checkpointing
  - Early stopping

### Logging and Experiment Tracking
- Use Python's `logging` module for consistent logging
- Integrate experiment tracking (e.g. Weights & Biases, MLflow)

### Utilities
- Implement helper functions for visualization, profiling, etc.
- Store them in the `utils/` directory

## Best Practices
- Avoid hardcoding paths - use a centralized configuration
- Modularize code for reusability and maintainability
- Leverage existing libraries and tools when possible
- Document code and maintain a clear project structure
- Use version control and create reproducible experiments

## Getting Started
1. Clone the repository
2. Install dependencies: `pip install -r requirements.txt`
3. Modify `Project.py` with your paths/constants
4. Implement your custom dataset/model as needed
5. Run training: `python main.py`

This template addresses the common challenge of unstructured and hard-to-maintain code in data science projects. It provides a clean, modular structure that promotes scalability and shareability. The example project demonstrates image classification using a fine-tuned ResNet18 model on a Star Wars character dataset.
## Project Structure
The project is structured in a modular way, with separate folders for data processing, modeling, training, and utilities. The `Project` class in `Project.py` stores paths and constants that are used throughout the codebase.
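As a minimal sketch (not the repository's actual `Project.py`), a central `Project` class can be a frozen dataclass whose paths are `pathlib.Path` objects derived from one base directory. The field and property names `base_dir`, `seed`, `data_dir`, and `checkpoint_dir`, and the folder names `dataset` and `checkpoints`, are hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass(frozen=True)
class Project:
    """Hypothetical central store for paths and constants (illustrative names)."""

    # Assumes scripts are launched from the project root; override for other layouts.
    base_dir: Path = field(default_factory=Path.cwd)
    seed: int = 42  # hypothetical global random seed

    @property
    def data_dir(self) -> Path:
        return self.base_dir / "dataset"

    @property
    def checkpoint_dir(self) -> Path:
        return self.base_dir / "checkpoints"

    def create_dirs(self) -> None:
        # parents=True and exist_ok=True make this safe to call repeatedly.
        for directory in (self.data_dir, self.checkpoint_dir):
            directory.mkdir(parents=True, exist_ok=True)
```

Other modules would then build paths from one shared instance (for example `project.checkpoint_dir / "best.pt"`) instead of hardcoding strings, and the `/` operator keeps the paths portable across operating systems.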
## Data Processing
Data processing is handled by the `get_dataloaders()` function in `data/datasets.py`. It takes the dataset name and splits the data into train/val/test sets using a predefined split ratio; transforms can be applied to each set as needed.
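The split described above can be sketched with `torch.utils.data.random_split`. This is a minimal illustration rather than the repository's `get_dataloaders()`: the toy `TensorPairDataset`, the signature, the 0.8/0.1/0.1 ratios, and the fixed seed are all assumptions.

```python
import torch
from torch.utils.data import DataLoader, Dataset, random_split


class TensorPairDataset(Dataset):
    """Hypothetical dataset wrapping pre-loaded feature and label tensors."""

    def __init__(self, features: torch.Tensor, labels: torch.Tensor, transform=None):
        if len(features) != len(labels):
            raise ValueError("features and labels must have the same length")
        self.features = features
        self.labels = labels
        self.transform = transform

    def __len__(self) -> int:
        return len(self.features)

    def __getitem__(self, idx):
        x = self.features[idx]
        if self.transform is not None:
            x = self.transform(x)
        return x, self.labels[idx]


def get_dataloaders(dataset, split=(0.8, 0.1, 0.1), batch_size=32, seed=42):
    """Split `dataset` into train/val/test loaders (illustrative signature)."""
    n = len(dataset)
    n_train = int(split[0] * n)
    n_val = int(split[1] * n)
    n_test = n - n_train - n_val  # remainder goes to test so the sizes sum to n
    generator = torch.Generator().manual_seed(seed)  # reproducible split
    train_set, val_set, test_set = random_split(
        dataset, [n_train, n_val, n_test], generator=generator
    )
    return (
        DataLoader(train_set, batch_size=batch_size, shuffle=True),  # shuffle only train
        DataLoader(val_set, batch_size=batch_size),
        DataLoader(test_set, batch_size=batch_size),
    )
```

Because `random_split` returns `Subset` views of a single dataset, one `transform` applies to every split; per-split transforms (for example, augmentation only on the training set) need separate dataset instances or a small wrapper around each subset.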
## Modeling
Models are defined in `models/TorchModel/modeling_torch.py`, which implements a simple CNN architecture for image classification. You can modify it or add your own models there.
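For orientation, a simple image-classification CNN in PyTorch might look like the sketch below. The class name `SimpleCNN`, the layer sizes, and the default of 10 classes are assumptions, not the contents of `modeling_torch.py`.

```python
import torch
from torch import nn


class SimpleCNN(nn.Module):
    """Hypothetical small CNN for RGB image classification."""

    def __init__(self, num_classes: int = 10, in_channels: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # halves height and width
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),  # global pooling, so any input size works
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, 1)  # (N, 64, 1, 1) -> (N, 64)
        return self.classifier(x)  # raw logits, suitable for nn.CrossEntropyLoss
```

For the ResNet18 fine-tuning example mentioned above, a common alternative is to load `torchvision.models.resnet18` and replace its final `fc` layer with an `nn.Linear` sized to the number of classes.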
## Training
Training is handled by the `train()` function in `train.py`. It takes the model, dataloaders, and training parameters, and trains the model with the specified optimizer and loss function.
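A minimal single-device sketch of such a `train()` function, including the early-stopping callback listed under Key Components, is shown below. The signature, the choice of Adam with cross-entropy loss, and the `EarlyStopping` class are assumptions for illustration; the repository itself uses Accelerate for distributed training.

```python
import copy

import torch
from torch import nn


class EarlyStopping:
    """Hypothetical callback: stop when val loss fails to improve for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0
        self.improved = False

    def step(self, val_loss: float) -> bool:
        """Record `val_loss`; return True if training should stop."""
        self.improved = val_loss < self.best - self.min_delta
        if self.improved:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def evaluate(model, loader, loss_fn, device):
    """Return the mean per-sample loss over `loader`."""
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            total += loss_fn(model(x), y).item() * x.size(0)  # undo the batch mean
            count += x.size(0)
    return total / max(count, 1)


def train(model, train_loader, val_loader, epochs=10, lr=1e-3, patience=3, device="cpu"):
    """Hypothetical train(): Adam + cross-entropy + early stopping; returns val losses."""
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    stopper = EarlyStopping(patience=patience)
    best_state, history = None, []
    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
        val_loss = evaluate(model, val_loader, loss_fn, device)
        history.append(val_loss)
        should_stop = stopper.step(val_loss)
        if stopper.improved:
            best_state = copy.deepcopy(model.state_dict())  # keep the best weights
        if should_stop:
            break
    if best_state is not None:
        model.load_state_dict(best_state)  # restore the best epoch
    return history
```

With Accelerate, the same loop would typically pass the model, optimizer, and dataloaders through `accelerator.prepare(...)` and call `accelerator.backward(loss)` in place of `loss.backward()`, leaving device placement to the `Accelerator`.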
## Utilities
Helper utilities live in `utils.py`, which contains functions for saving and loading models and for logging training progress.
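Helpers of this kind can be sketched with the standard `logging` module and `torch.save`/`torch.load`. The function names `get_logger`, `save_checkpoint`, and `load_checkpoint` and the checkpoint dictionary keys are hypothetical; the colored output provided by `loggers/logging_colors.py` is not reproduced here.

```python
import logging
from pathlib import Path

import torch
from torch import nn


def get_logger(name: str = "train", level: int = logging.INFO) -> logging.Logger:
    """Return a named logger with one stream handler (no duplicates on repeat calls)."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(levelname)s | %(name)s | %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


def save_checkpoint(model: nn.Module, optimizer, epoch: int, path: Path) -> None:
    """Save model and optimizer state dicts plus the epoch number."""
    path.parent.mkdir(parents=True, exist_ok=True)
    torch.save(
        {"epoch": epoch, "model": model.state_dict(), "optimizer": optimizer.state_dict()},
        path,
    )


def load_checkpoint(model: nn.Module, optimizer, path: Path, map_location="cpu") -> int:
    """Restore state in place and return the saved epoch; `optimizer` may be None."""
    ckpt = torch.load(path, map_location=map_location)
    model.load_state_dict(ckpt["model"])
    if optimizer is not None:
        optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["epoch"]
```

Saving state dicts rather than whole model objects keeps checkpoints independent of the module's import path, and `map_location="cpu"` lets a GPU-trained checkpoint be inspected on a CPU-only machine.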
## Example Usage
To train the model, run the following command:
```bash
python main.py
```

This will train the model using the specified parameters and save the trained model to the output directory.
To load and evaluate a pre-trained model, run the following command:
```bash
python main.py --evaluate --model_path /path/to/pretrained/model
```

This will load the pre-trained model and evaluate its performance on the test set.
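One way a `main.py` entry point could handle the two flags shown above is sketched below with `argparse`. Only `--evaluate` and `--model_path` come from this README; the `parse_args`/`main` structure, the return strings, and the rule that `--evaluate` requires `--model_path` are assumptions, and the real script may parse its arguments differently.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Hypothetical parser for the --evaluate and --model_path flags."""
    parser = argparse.ArgumentParser(description="Train or evaluate a model.")
    parser.add_argument(
        "--evaluate", action="store_true", help="evaluate instead of training"
    )
    parser.add_argument(
        "--model_path", type=str, default=None, help="path to a pre-trained model"
    )
    args = parser.parse_args(argv)
    # Assumed rule: evaluation needs weights to load.
    if args.evaluate and args.model_path is None:
        parser.error("--evaluate requires --model_path")  # exits with status 2
    return args


def main(argv=None) -> str:
    args = parse_args(argv)
    if args.evaluate:
        # Placeholder: load weights from args.model_path and score the test set.
        return f"evaluate:{args.model_path}"
    # Placeholder: build dataloaders and the model, then call train().
    return "train"


if __name__ == "__main__":
    print(main())
```

Accepting an optional `argv` list keeps the dispatch logic testable without touching `sys.argv`.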
## Architecture
```text
.
│  .gitignore
│  main.py                  # main script to run the project
│  playground.ipynb         # a notebook to play around with the code
│  README.md
│  requirements.txt
│  test.py
│  train.sh
│
├─callbacks                 # Callbacks for training and logging
│      CometCallback.py
│      __init__.py
│
├─configs                   # Config files
│      config.yaml
│      ds_zero2_no_offload.json
│
├─data                      # Data module
│  │  DataLoader.py
│  │  Dataset.py
│  │  __init__.py
│  │
│  └─transformations
│          transforms.py
│          __init__.py
│
├─loggers                   # Logging module
│      logging_colors.py
│
├─losses                    # Losses module
│      loss.py
│      __init__.py
│
├─metrics                   # Metrics module
│      metric.py
│      __init__.py
│
├─models                    # Models module
│  │  modelutils.py
│  │  __init__.py
│  │
│  ├─HFModel                # HuggingFace models
│  │      configuration_hfmodel.py
│  │      convert_hfmodel_original_pytorch_to_hf.py
│  │      feature_extraction_hfmodel.py  # audio processing
│  │      image_processing_hfmodel.py    # Image processing
│  │      modeling_hfmodel.py            # Modeling
│  │      processing_hfmodel.py          # multimodal processing
│  │      tokenization_hfmodel.py        # Tokenization
│  │      tokenization_hfmodel_fast.py
│  │      __init__.py
│  │
│  └─TorchModel             # Torch models
│          modeling_torch.py
│          utils.py
│          __init__.py
│
├─onnx                      # ONNX module
│      converter2onnx.py
│
├─trainer                   # Trainer module
│      acclerate.py
│      arguments.py
│      evaluater.py
│      inference.py
│      trainer.py
│      __init__.py
│
└─utils
        constants.py
        profiler.py         # Profiling module
        utils.py
```