https://github.com/ipitio/bronte
Deep Learning Playground: A modular and extensible framework
deep-learning etl framework pytorch
- Host: GitHub
- URL: https://github.com/ipitio/bronte
- Owner: ipitio
- License: agpl-3.0
- Created: 2023-12-15T00:08:48.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2024-01-17T00:43:59.000Z (over 2 years ago)
- Last Synced: 2025-02-13T08:52:22.134Z (over 1 year ago)
- Topics: deep-learning, etl, framework, pytorch
- Language: HTML
- Homepage:
- Size: 54 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Bronte

[License: AGPL-3.0](https://www.gnu.org/licenses/agpl-3.0)
`bronte` is a modular and extensible deep learning framework. It views a model not as a stack of layers but as a trainer of layers, whose preprocessing and evaluation are task-specific. As with PyTorch Lightning, this abstracts away the training loop and allows for a clean separation of concerns, making it easy to modify, add, and experiment with different tasks and architectures. To add a new task or architecture, create a new class in the appropriate module (`task` or `arch`) and add it in `bronte`.
It is composed of the following modules:
- `bronte`: Factory and Driver
- `arch`: Layers and Forward Pass
- `task`: Preprocessing and Evaluation
- `base`: Training and Inference
- `data`: Datasets
- `loss`: Loss calculations
- `tune`: Hyperparameter tuning
The `Bronte` class takes a dictionary of options, including the names of a task and an arch, and creates a model. When data is passed to `Bronte`, it splits the data into features `X` and target(s) `y` and passes these to the model's `fit` method, which initializes the layers, optimizer, scheduler, criterion, scaler, datasets, and dataloaders, and then starts training. For a list of all currently supported options, see the notebook (under Deep Learning > Options).
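The factory-and-split behaviour described above can be sketched roughly as follows. This is a minimal illustration, not `bronte`'s actual code: the registries `TASKS` and `ARCHS`, the stand-in classes, the helper names `build` and `split`, and the option keys `"task"` and `"arch"` are all hypothetical.

```python
class FFN:
    """Stand-in architecture: would own the layers and the forward pass."""

    def __init__(self, options):
        self.options = options


class Regression:
    """Stand-in task: would own preprocessing and evaluation."""

    def __init__(self, options):
        self.options = options


# Hypothetical name -> class registries; a new class is "added" by listing it here.
ARCHS = {"ffn": FFN}
TASKS = {"regression": Regression}


def build(options):
    """Look up the task and arch named in the options and instantiate both."""
    try:
        task_cls = TASKS[options["task"]]
        arch_cls = ARCHS[options["arch"]]
    except KeyError as err:
        raise ValueError(f"unknown or missing option: {err}") from err
    return task_cls(options), arch_cls(options)


def split(rows, targets):
    """Split records (dicts) into features X and target(s) y by column name."""
    X = [{k: v for k, v in row.items() if k not in targets} for row in rows]
    y = [{k: row[k] for k in targets} for row in rows]
    return X, y


task, arch = build({"task": "regression", "arch": "ffn"})
X, y = split([{"pts": 20, "ast": 5, "win": 1}], targets=["win"])
```

Keeping the lookup in one place means a new task or architecture only needs a class plus one registry entry, which is the kind of extension point the module list above suggests.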
> **Note**
>
> Initialize the layers in `init_layers`, not in `__init__`: `init_layers` is what (re)initializes the model's layers when training starts or resumes.
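To illustrate why layer construction is deferred, here is a framework-free sketch of the pattern. The `Base` and `TinyArch` classes, the `resume_state` parameter, and the `"sizes"` and `"seed"` options are hypothetical stand-ins, and plain lists of floats take the place of real PyTorch layers; `bronte`'s own base class may differ.

```python
import random


class Base:
    """Hypothetical trainer base: owns fit() and calls init_layers() itself."""

    def __init__(self, options):
        # Only store configuration here; no layers are created yet.
        self.options = options
        self.layers = None

    def init_layers(self):
        raise NotImplementedError

    def fit(self, X, y, resume_state=None):
        # Layers are (re)built at the start of every training run...
        self.init_layers()
        # ...and, when resuming, overwritten with the checkpointed values.
        if resume_state is not None:
            self.layers = [list(weights) for weights in resume_state]
        return self


class TinyArch(Base):
    def init_layers(self):
        # Stand-in for real layers: one list of weights per requested size.
        rng = random.Random(self.options.get("seed", 0))
        self.layers = [[rng.random() for _ in range(n)] for n in self.options["sizes"]]


model = TinyArch({"sizes": [3, 2]})
before = model.layers                      # still None: nothing built in __init__
model.fit(X=[], y=[])
fresh = [list(weights) for weights in model.layers]
model.fit(X=[], y=[], resume_state=[[0.0] * 3, [1.0] * 2])
resumed = model.layers
```

Because the trainer controls when `init_layers` runs, the same object can be fitted again from scratch or restored from a checkpoint without being re-instantiated.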
## Usage
### Training
```python
import bronte

data = [df]
models = [task | arch]

# load data into tables
for df in data:
    bronte.load(df)

# start tensorboard
bronte.track()

# train models on tables, returning list of Bronte objects
trainers = bronte.fit(models)

# call again to stop tensorboard
bronte.track()

# flush db
bronte.flush()
```
### Inference
```python
import bronte

XX = [X]
paths = ["models/.../model.pt"]

# predict on list of new data, returning dict: {path: {str(XX.index(X)): y}}
predictions = bronte.predict(XX, paths)
```
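The nested return value, `{path: {str(index): y}}`, is keyed first by model path and then by the stringified position of each input. A small sketch of regrouping it per input is shown below; the dictionary is built by hand so the example runs without a trained model, and the paths, values, and the helper `by_input` are made up for illustration.

```python
# Hand-built stand-in for the value returned by bronte.predict(XX, paths).
# Paths and prediction values are invented for this example.
predictions = {
    "models/a/model.pt": {"0": [0.1, 0.9], "1": [0.7, 0.3]},
    "models/b/model.pt": {"0": [0.2, 0.8], "1": [0.6, 0.4]},
}


def by_input(predictions):
    """Regroup {path: {index: y}} as {index: {path: y}} with integer indices."""
    grouped = {}
    for path, per_input in predictions.items():
        for index, y in per_input.items():
            grouped.setdefault(int(index), {})[path] = y
    return grouped


grouped = by_input(predictions)
```

Grouping by input makes it straightforward to compare what each saved model predicted for the same `X`.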
## Supports
- Training:
  - (C/G/T)PU
  - Persistence
  - Mixed Precision
  - Multi input and output
  - Model and state checkpointing
  - Learning Rate scheduling
  - Transfer Learning
  - Gradient accumulation and scaling
  - Parallel and Distributed with `dask`
  - Hyperparameter tuning with `optuna`
  - Calculating feature importances with `shap`
  - Monitoring/Logging with `tensorboard`
- Tasks:
  - Regression
  - Classification
- Architectures:
  - FFN
  - RNN with Attention
## TODO
- [ ] Frontend + Flask
- [ ] More archs, tasks
- [ ] Tests, Typing, Documentation
## Example
The notebook `basketball.ipynb` runs an ETL pipeline on a sample dataset of basketball statistics and then performs deep learning using `Bronte`.
### ETL Pipeline
First, the data is extracted (from CSVs in this case), merged, and examined (i.e., exploratory data analysis) with `ydata-profiling`. It is then transformed with standard cleaning and dataset-specific feature engineering, partitioned into small chunks, and loaded into a database.
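The extract, merge, transform, partition, and load steps can be sketched with the standard library alone. This is not the notebook's code: it swaps in `csv` and `sqlite3` for whatever the notebook uses, skips the profiling step, and assumes one table per chunk. The CSV contents, column names, and `chunk_N` table names are invented.

```python
import csv
import io
import sqlite3

# Made-up CSV contents standing in for the real source files.
PLAYERS_CSV = "player_id,name\n1,Ann\n2,Bo\n3,Cy\n"
STATS_CSV = "player_id,points\n1,20\n2,\n3,12\n"


def extract(text):
    """Extract: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def merge(left, right, key):
    """Merge: inner-join two row lists on a shared key column."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


def transform(rows):
    """Transform: drop rows with missing points and cast column types."""
    return [
        {"player_id": int(r["player_id"]), "name": r["name"], "points": int(r["points"])}
        for r in rows
        if r["points"] != ""
    ]


def load(rows, conn, chunk_size):
    """Load: write each chunk of rows to its own table; return the table names."""
    tables = []
    for start in range(0, len(rows), chunk_size):
        table = f"chunk_{start // chunk_size}"
        conn.execute(f"CREATE TABLE {table} (player_id INTEGER, name TEXT, points INTEGER)")
        conn.executemany(
            f"INSERT INTO {table} VALUES (:player_id, :name, :points)",
            rows[start:start + chunk_size],
        )
        tables.append(table)
    conn.commit()
    return tables


conn = sqlite3.connect(":memory:")
rows = transform(merge(extract(PLAYERS_CSV), extract(STATS_CSV), "player_id"))
tables = load(rows, conn, chunk_size=1)
```

Writing small chunks to separate tables keeps each training run's input bounded, which matches the table-by-table reading described in the next section.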
### Deep Learning
This database is then read table-by-table, for each task and arch specified, and passed to `Bronte`. Over the course of training, checkpoints of state and visuals of metrics and importances will be saved to `models/`. Once training is complete, `Bronte` can be used to load the trained models and make predictions on new data.
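The table-by-table loop over tasks and archs might look like the sketch below. It only enumerates the runs and does not call `bronte`; the table names, the option keys, the helper `list_tables`, and the reading of `task | arch` as a dict union of two option dictionaries are assumptions.

```python
import itertools
import sqlite3

# In-memory database with two made-up tables standing in for the loaded chunks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunk_0 (points INTEGER, win INTEGER)")
conn.execute("CREATE TABLE chunk_1 (points INTEGER, win INTEGER)")


def list_tables(conn):
    """Return the names of all tables in the database, sorted alphabetically."""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in cursor]


tasks = [{"task": "regression"}, {"task": "classification"}]
archs = [{"arch": "ffn"}]

# Dict union (Python 3.9+): one merged options dict per task/arch combination.
models = [task | arch for task, arch in itertools.product(tasks, archs)]

# One training run per (table, options) pair.
runs = [(table, options) for table in list_tables(conn) for options in models]
```

Here `runs` holds four entries, one for each combination of the two tables and the two task/arch option sets.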