https://github.com/justin900429/deep-learning-template
A lightweight Python template for deep learning project or research with PyTorch.
- Host: GitHub
- URL: https://github.com/justin900429/deep-learning-template
- Owner: Justin900429
- License: mit
- Created: 2024-04-30T12:59:02.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2025-01-05T15:23:30.000Z (over 1 year ago)
- Last Synced: 2025-04-04T00:02:10.226Z (about 1 year ago)
- Topics: configs, deep-learning, lightweight, multi-gpu-training, python, pytorch, template
- Language: Python
- Homepage:
- Size: 86.9 KB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Deep Learning Project's Template
## Introduction
Welcome to our Deep Learning Project Template, crafted for researchers and developers working with PyTorch. This template is designed to streamline the setup, execution, and modification of deep learning experiments, allowing you to focus more on model development and less on boilerplate code.
## ✨ Features
1. **Multi-GPU Support:** Utilize the power of multiple GPUs or devices to accelerate your training using [accelerate](https://github.com/huggingface/accelerate).
2. **Flexible Configuration:** Easily configure your experiments with the [tyro](https://github.com/brentyi/tyro) configuration system, which is easy to use and provides type validation.
3. **Clear Architecture:** Our template is structured for clarity and ease of use, ensuring you can understand and modify the code with minimal effort.
4. **Transparent Training Process:** Enjoy a clear display of the training process, helping you monitor performance and make necessary tweaks in real time.
5. **Fast Package Management with uv:** We adopt [uv](https://docs.astral.sh/uv/getting-started/installation/), a package manager written in Rust, for better and faster package management.
## Folder Structure
Our project is organized as follows to help you navigate and manage the codebase effectively:
```plaintext
deep-learning-template
├── configs                  # Configuration definitions and utilities
│   ├── config_utils.py      # Utils for showing or saving configs
│   └── config.py            # Main configuration script
├── configuration            # Configuration files for experiments
│   └── cifar
│       ├── cifar_big.json   # Configuration for a larger model (example)
│       └── cifar_small.json # Configuration for a smaller model (example)
├── dataset                  # Modules for data handling
│   └── data_loader.py       # Data loader script
├── modeling                 # Neural network models and loss functions
│   └── model.py             # Example model file
├── utils                    # Utility scripts for various tasks
│   ├── logger.py            # Logging utilities
│   └── metrics.py           # Performance metrics
├── engine                   # Training engine code
│   ├── base_engine.py       # Base engine class for repeated tasks
│   └── engine.py            # Training functions
├── .gitignore               # Specifies intentionally untracked files to ignore
├── LICENSE                  # License file for the project
├── README.md                # README file with project details
├── linter.sh                # Shell script for formatting the code
├── requirements.txt         # Dependencies and libraries
└── main.py                  # Starting point for training
```
## ⚙️ Configuration (requires update)
Configure your models and training setups with ease. Modify the `config.py` file to suit your experimental needs. Our system uses [YACS](https://github.com/rbgirshick/yacs), which allows for a hierarchical configuration that can be overridden from the command line. The recommended structure is:
```python
from yacs.config import CfgNode as CN

# Basic setup of the project
cfg = CN()
cfg._BASE_ = None
cfg.PROJECT_DIR = None
cfg.PROJECT_LOG_WITH = ["tensorboard"]
# Control the modeling settings
cfg.MODEL = CN()
# ...
# Control the loss settings
cfg.LOSS = CN()
# ...
# Control the dataset settings (e.g., path)
cfg.DATA = CN()
# ...
# Control the training setup (e.g., lr, epoch)
cfg.TRAIN = CN()
# ...
# Control the evaluation setup (e.g., batch size)
cfg.EVAL = CN()
# ...
```
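As a rough illustration of what a hierarchical configuration with overrides means, the sketch below layers an experiment's settings on top of a set of defaults using plain dictionaries. It is not how YACS is implemented. The helper `merge_config` and the keys `LR`, `EPOCHS`, and `BATCH_SIZE` are hypothetical names chosen for the example.

```python
import copy


def merge_config(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` replace those in `base`.

    Nested sections (dicts) are merged recursively, so an experiment only
    needs to list the keys it changes.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults that mirror the sections shown above.
defaults = {
    "PROJECT_DIR": None,
    "PROJECT_LOG_WITH": ["tensorboard"],
    "TRAIN": {"LR": 1e-3, "EPOCHS": 10},
    "EVAL": {"BATCH_SIZE": 64},
}

# Hypothetical experiment file: only the changed keys are listed.
experiment = {"PROJECT_DIR": "cifar-small", "TRAIN": {"LR": 5e-4}}

config = merge_config(defaults, experiment)
print(config["TRAIN"])  # {'LR': 0.0005, 'EPOCHS': 10}
```

Returning a new dictionary instead of mutating `defaults` keeps the defaults reusable across several experiments in the same process.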
## Training (requires update)
### Basic Usage
To start training, run:
```shell
python engine.py --config configs/your_config.yaml
# Concrete example
python engine.py --config configs/cifar/cifar-small.yaml
```
After training starts, the outputs are written to a folder called `logs` by default. To use a different location, change the `log_dir` option. Inside that folder, each run is stored in a subfolder named after the `project_dir` defined in the config file.
```plaintext
{LOG_DIR}/{PROJECT_DIR}
├── checkpoint   # Folder for saving checkpoints
└── ...          # Other files set up by tracker(s)
```
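The layout above can be navigated with a few lines of `pathlib`. The sketch below assumes checkpoint files follow the naming used in the resume example in this README (`best_model_epoch_10.pth`). Both helper names are hypothetical and not part of the template.

```python
import re
from pathlib import Path
from typing import Optional


def checkpoint_dir(log_dir: str, project_dir: str) -> Path:
    """Build the checkpoint folder path: {LOG_DIR}/{PROJECT_DIR}/checkpoint."""
    return Path(log_dir) / project_dir / "checkpoint"


def latest_checkpoint(folder: Path) -> Optional[Path]:
    """Return the checkpoint with the highest epoch number, or None if none exist.

    Assumes file names end in `epoch_<number>.pth`. Epochs are compared as
    integers so that epoch 10 ranks above epoch 2.
    """
    pattern = re.compile(r"epoch_(\d+)\.pth$")
    candidates = []
    for path in folder.glob("*.pth"):
        match = pattern.search(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]


print(checkpoint_dir("logs", "cifar-small").as_posix())  # logs/cifar-small/checkpoint
```

The path returned by `latest_checkpoint` could then be passed to whichever option the config uses for resuming.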
### Override the config with command line
Users can override the options with the `--opts` flag. For instance, to resume the training:
```shell
python engine.py --config configs/your_config.yaml --opts TRAIN.RESUME_CHECKPOINT path/to/checkpoint
# Concrete example
python engine.py --config configs/cifar/cifar-small.yaml --opts TRAIN.RESUME_CHECKPOINT logs/cifar-small/checkpoint/best_model_epoch_10.pth
```
See the Configuration section above for more details.
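To make the `--opts` convention concrete, here is a minimal sketch of how alternating `KEY VALUE` pairs with dotted keys can be applied to a nested configuration. It is similar in spirit to YACS's `merge_from_list`, but it is a simplified stand-in rather than the library's code. The helper `apply_opts` and the `LR` key are hypothetical.

```python
import ast


def apply_opts(config: dict, opts: list) -> dict:
    """Apply ['SECTION.KEY', 'value', ...] pairs to a nested dict in place."""
    if len(opts) % 2 != 0:
        raise ValueError("opts must be a flat list of KEY VALUE pairs")
    for dotted_key, raw_value in zip(opts[0::2], opts[1::2]):
        *sections, leaf = dotted_key.split(".")
        node = config
        for section in sections:
            node = node[section]  # raises KeyError for an unknown section
        if leaf not in node:
            raise KeyError(f"unknown option: {dotted_key}")
        try:
            # Turn "5e-4", "True", "[1, 2]" and similar into Python values.
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            # Anything that is not a Python literal (e.g. a path) stays a string.
            value = raw_value
        node[leaf] = value
    return config


config = {"TRAIN": {"RESUME_CHECKPOINT": None, "LR": 1e-3}}
apply_opts(config, ["TRAIN.RESUME_CHECKPOINT", "path/to/checkpoint", "TRAIN.LR", "5e-4"])
print(config["TRAIN"])  # {'RESUME_CHECKPOINT': 'path/to/checkpoint', 'LR': 0.0005}
```

Rejecting unknown keys means a typo such as `TRAIN.RESUME_CHEKPOINT` fails immediately instead of silently adding a new option.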
### Multi-GPU Training
This project template is built on [accelerate](https://github.com/huggingface/accelerate) to provide multi-GPU training. A simple example that trains a model with 2 GPUs:
```shell
accelerate launch --multi_gpu --num_processes=2 engine.py --config configs/your_config.json --opts (optional)
# Concrete example
accelerate launch --multi_gpu --num_processes=2 engine.py --config configs/cifar/cifar-small.json
```
### Tracker
Trackers such as `tensorboard` and `wandb` can be set up through the `project_log_with` option. We support multiple trackers at once through accelerate! See the [accelerate tracking guide](https://huggingface.co/docs/accelerate/usage_guides/tracking) to find out which tracker suits your project best. The example below opens the local monitor:
```shell
# tensorboard
tensorboard --logdir logs
```
## How to Add Your Code?
1. **Integrating New Models:** Place your model files in the `modeling/` folder and update the configurations accordingly.
2. **Adding New Datasets:** Implement data handling in the `dataset/` folder and reference it in your config files.
3. **Utility Scripts:** Enhance functionality by adding utility scripts in the `utils/` folder.
4. **Customized Training Process:** Modify `engine/engine.py` to change the training process.
## TODO
- [ ] Support iteration based training with infinite loader.
## Special Thanks
Thanks to the creators of:
- [accelerate](https://github.com/huggingface/accelerate)
- [YACS](https://github.com/rbgirshick/yacs)
- [L1aoXingyu](https://github.com/L1aoXingyu/Deep-Learning-Project-Template)
- [victoresque](https://github.com/victoresque/pytorch-template)
- [tyro](https://github.com/brentyi/tyro)
Feel free to modify and adapt this README to better fit the specifics and details of your project.