https://github.com/kyegomez/multi-model-training
An experimental repository on research for training multiple models all at once in an evolutionary capacity!
- Host: GitHub
- URL: https://github.com/kyegomez/multi-model-training
- Owner: kyegomez
- License: mit
- Created: 2024-07-08T03:38:16.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-12-21T12:50:55.000Z (10 months ago)
- Last Synced: 2024-12-31T15:58:31.758Z (9 months ago)
- Topics: ai, cats, mamba, ml, pytorch, ssms, tensorflow, training, transformers
- Language: Python
- Homepage: https://discord.com/servers/agora-999382051935506503
- Size: 2.17 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
[Join the Agora Discord](https://discord.com/servers/agora-999382051935506503)
# Multi-Model Trainers
Over the past year, Agora has implemented and created thousands of models, all with slightly different variations and architectures. That raised a question: how can we train multiple models at once, evaluate them, and then give more GPU memory to the models that are learning fastest with the lowest loss? There are many configurations we'll attempt in the future, but this is super experimental! If you want to hack on this, join us at Agora and let's accelerate!
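As a rough, hypothetical illustration of that idea (not the repository's actual implementation), the sketch below reweights per-model memory shares in proportion to how much each model's loss improved since the last check. The `reallocate_shares` name, the improvement metric, and the floor blending are assumptions made just for this example.

```python
from typing import List


def reallocate_shares(
    prev_losses: List[float], curr_losses: List[float], floor: float = 0.05
) -> List[float]:
    """Return new per-model memory fractions, favoring the fastest learners.

    `floor` keeps every model alive with a minimum share so slower learners
    are throttled rather than starved outright. (Hypothetical sketch only.)
    """
    # Loss improvement per model; regressions earn no bonus
    improvements = [max(p - c, 0.0) for p, c in zip(prev_losses, curr_losses)]
    total = sum(improvements)
    if total == 0:
        # Nobody improved this round: fall back to an even split
        return [1.0 / len(curr_losses)] * len(curr_losses)
    raw = [imp / total for imp in improvements]
    # Blend with the floor so the shares still sum to 1
    n = len(raw)
    return [floor + (1.0 - floor * n) * r for r in raw]


if __name__ == "__main__":
    # Model 0 improved the most, so it ends up with the largest share
    print(reallocate_shares([1.00, 0.90, 0.80], [0.60, 0.85, 0.80]))
    # -> roughly [0.81, 0.14, 0.05]
```

Keeping a small floor share is one way to avoid culling the candidate pool down to a single winner too early.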
## Install

```bash
$ pip install multi-model-trainers
```

## Usage
```python
import torch
import torch.nn as nn
from loguru import logger
from multi_model_trainers.main import MultiModelMemoryTrainer

# Example usage
if __name__ == "__main__":
    # Create some dummy models
    models = [
        nn.Sequential(nn.Linear(10, 50), nn.ReLU(), nn.Linear(50, 1))
        for _ in range(3)
    ]
    initial_allocation = [1 / 3, 1 / 3, 1 / 3]
    total_memory = 4 * 1024 * 1024 * 1024  # 4 GB

    gpu_allocator = MultiModelMemoryTrainer(
        models, initial_allocation, total_memory
    )

    # Simulate a few training steps
    for step in range(5):
        logger.info(f"Training step {step}")

        # Generate dummy data
        train_data = {
            "inputs": torch.rand(32, 10),
            "targets": torch.rand(32, 1),
        }
        losses = gpu_allocator.train_step(train_data)

        # Update learning rates based on losses (this is a simplistic approach)
        learning_rates = [1 / (loss + 1e-5) for loss in losses]
        gpu_allocator.update_learning_rates(learning_rates)

        # Reallocate GPU memory
        gpu_allocator.reallocate_gpu_memory()

        # Validation step
        val_data = {
            "inputs": torch.rand(64, 10),
            "targets": torch.rand(64, 1),
        }
        val_losses = gpu_allocator.validate(val_data)

    logger.info("Training complete")
```
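The dummy MLPs above are just placeholders. Assuming `MultiModelMemoryTrainer` accepts any list of `nn.Module` instances (an inference from the usage example, not a documented guarantee), the competing models can be the kind of slightly-varied architectures the project is aimed at, e.g. tiny transformer encoders that differ only in depth and width:

```python
import torch
import torch.nn as nn


def make_encoder(d_model: int, depth: int) -> nn.Module:
    """A tiny transformer-encoder variant; depth and width are the only knobs."""
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
    return nn.Sequential(
        nn.Linear(10, d_model),          # project the 10 dummy features
        nn.Unflatten(1, (1, d_model)),   # add a length-1 sequence dimension
        nn.TransformerEncoder(layer, num_layers=depth),
        nn.Flatten(1),                   # back to (batch, d_model)
        nn.Linear(d_model, 1),           # regression head
    )


# Three competing variants sharing one GPU memory budget
models = [make_encoder(64, 1), make_encoder(64, 2), make_encoder(128, 1)]

if __name__ == "__main__":
    print(make_encoder(64, 1)(torch.rand(32, 10)).shape)  # torch.Size([32, 1])
```

These drop straight into the `models` list in the usage example above; everything else stays the same.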
# License

MIT