https://github.com/codingfisch/tinygym
Reinforcement learning in tinygrad
- Host: GitHub
- URL: https://github.com/codingfisch/tinygym
- Owner: codingfisch
- License: MIT
- Created: 2025-03-06T18:22:38.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-03-06T18:24:47.000Z (8 months ago)
- Last Synced: 2025-03-06T19:26:41.228Z (8 months ago)
- Size: 2.93 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# tinygym
`tinygym` reimplements [`flashrl`](https://github.com/codingfisch/flashrl) using [`tinygrad`](https://github.com/tinygrad/tinygrad) instead of [`torch`](https://github.com/pytorch/pytorch).
🛠️ `pip install tinygym` or clone the repo & `pip install -r requirements.txt`
- If you cloned the repo (or changed the envs), compile the envs: `python setup.py build_ext --inplace`
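For example, a from-source install (combining the commands above) might look like:

```bash
git clone https://github.com/codingfisch/tinygym
cd tinygym
pip install -r requirements.txt
python setup.py build_ext --inplace  # compile the envs
```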
The [README of `flashrl`](https://github.com/codingfisch/flashrl) is mostly valid for `tinygym`, with the biggest difference being:
- **`tinygym` is not fast (yet)** -> it learns Pong in ~5 minutes instead of ~5 seconds (on an RTX 3090)
Just like in `flashrl`, training runs via `python train.py` and shows the same progress bar, just moving ~60x slower.
Check out the `onefile` branch if you want to make it fast (i.e. try to make `TinyJit` work)!
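For reference, a minimal sketch of how `TinyJit` is typically applied (the `policy` model and shapes here are hypothetical placeholders, not tinygym's actual code):

```python
from tinygrad import Tensor, TinyJit, nn

policy = nn.Linear(8, 4)  # hypothetical stand-in for the actual model

@TinyJit
def step(obs: Tensor) -> Tensor:
    # TinyJit captures the kernels during warm-up calls; input shapes
    # (and dtypes) must stay fixed across calls for the cache to hit
    return policy(obs).realize()

out = step(Tensor.rand(64, 8))  # later calls with the same shape replay the jitted kernels
```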
# Implementation differences from `flashrl`
The **most important difference** (it made RL work, after 2 hours of debugging):
- **Use `.abs().clip(min_=1e-8)` in `ppo` to avoid values close to zero in `(value - ret)`** (sketched below)
Without this, the optimizer step can result in NaNs and **"RL doesn't work"** 😜
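A minimal sketch of the idea (toy tensors; the surrounding `ppo` loss code is assumed, not tinygym's exact implementation):

```python
from tinygrad import Tensor

value = Tensor([0.5, 0.3])  # critic predictions (toy values)
ret = Tensor([0.5, 0.1])    # returns; the first entry makes (value - ret) exactly 0

# keep |value - ret| away from zero, so downstream ops can't blow up to NaNs
diff = (value - ret).abs().clip(min_=1e-8)
value_loss = diff.square().mean()
```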
To potentially enable `tinygrad.TinyJit` (it does not work yet, hence the slowness):
- `Learner` does not `.setup_data`, and
- `rollout` is a function (instead of a `Learner` method) that fills a list with Tensors and `.stack`s them at the end (see the sketch below)
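A minimal sketch of that pattern (the `env`/`model` interfaces here are hypothetical placeholders, not tinygym's actual classes):

```python
from tinygrad import Tensor

def rollout(env, model, steps: int):
    # fill plain Python lists with per-step Tensors, then stack once at the end,
    # so each quantity ends up as one big Tensor (friendlier to a future TinyJit)
    obs_list, logits_list = [], []
    obs = env.reset()
    for _ in range(steps):
        logits = model(obs)
        obs_list.append(obs)
        logits_list.append(logits)
        obs = env.step(logits)
    # Tensor.stack(*tensors) in recent tinygrad versions
    return Tensor.stack(*obs_list), Tensor.stack(*logits_list)
```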
Since it somehow performs better:
- `.uniform` (the `tinygrad` default) is used instead of `.kaiming_uniform` (the `torch` default) for `nn.Linear` weight initialization (sketched below)
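For illustration, reverting to the torch-style init would look roughly like this (a hypothetical snippet, not code from tinygym; torch's `nn.Linear` default uses `kaiming_uniform` with `a=sqrt(5)`):

```python
import math
from tinygrad import Tensor, nn

layer = nn.Linear(64, 64)  # tinygrad initializes the weight with .uniform
layer.weight = Tensor.kaiming_uniform(64, 64, a=math.sqrt(5))  # torch-style default
```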
Custom `tinygrad` rewrites of `torch.nn.init.orthogonal_` & `torch.nn.utils.clip_grad_norm_` are used.
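Roughly, such rewrites can look like this (a sketch of the general technique, assuming numpy for the QR step; not tinygym's exact code):

```python
import numpy as np
from tinygrad import Tensor

def orthogonal_(t: Tensor, gain: float = 1.0) -> Tensor:
    # torch.nn.init.orthogonal_-style init: QR-decompose a random matrix,
    # sign-correct Q to make the decomposition unique, copy it into the tensor
    rows, cols = t.shape[0], int(np.prod(t.shape[1:]))
    a = np.random.randn(max(rows, cols), min(rows, cols))
    q, r = np.linalg.qr(a)
    q *= np.sign(np.diag(r))
    if rows < cols: q = q.T
    t.assign(Tensor(q.reshape(t.shape).astype(np.float32)) * gain)
    return t

def clip_grad_norm_(params, max_norm: float = 0.5):
    # torch.nn.utils.clip_grad_norm_-style: rescale all grads
    # if their global L2 norm exceeds max_norm
    grads = [p.grad for p in params if p.grad is not None]
    total_norm = sum(g.square().sum() for g in grads).sqrt()
    scale = (max_norm / (total_norm + 1e-6)).clip(max_=1.0)
    for g in grads: g.assign(g * scale)
    return total_norm
```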
You'll find a `.detach()` here and a `.contiguous()` there, but other than that `tinygym` = `flashrl` 🤝
## Acknowledgements 🙌
I want to thank
- [George Hotz](https://github.com/geohot) and the tinygrad team for commoditizing the petaflop! Star [tinygrad](https://github.com/tinygrad/tinygrad) ⭐
- [Andrej Karpathy](https://github.com/karpathy) for commoditizing RL knowledge! Star [pg-pong](https://gist.github.com/karpathy/a4166c7fe253700972fcbc77e4ea32c5) ⭐
and last but not least...