https://github.com/codingfisch/tinygym

Reinforcement learning in tinygrad

Host: GitHub
URL: https://github.com/codingfisch/tinygym
Owner: codingfisch
License: mit
Created: 2025-03-06T18:22:38.000Z (8 months ago)
Default Branch: main
Last Pushed: 2025-03-06T18:24:47.000Z (8 months ago)
Last Synced: 2025-03-06T19:26:41.228Z (8 months ago)
Size: 2.93 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# tinygym

`tinygym` reimplements [`flashrl`](https://github.com/codingfisch/flashrl) using [`tinygrad`](https://github.com/tinygrad/tinygrad) instead of [`torch`](https://github.com/pytorch/pytorch).

🛠️ `pip install tinygym` or clone the repo & `pip install -r requirements.txt`

  - If cloned (or if envs changed), compile: `python setup.py build_ext --inplace`

The [README of `flashrl`](https://github.com/codingfisch/flashrl) is mostly valid for `tinygym`, with the biggest difference being:

  - **`tinygym` is not fast (yet)** -> Learns Pong in ~5 minutes instead of 5 seconds (on an RTX 3090)

Running `python train.py` should look just like it does in `flashrl`, with the progress bar moving ~60x slower.

Check out the `onefile` branch if you want to make it fast (i.e., try to make `TinyJit` work)!

# Implementation differences to `flashrl`

The **most important difference** (it got RL working after 2 hours of debugging):

- **Use `.abs().clip(min_=1e-8)` in `ppo` to avoid close-to-zero values in `(value - ret)`**

Without this, the optimizer step can result in NaNs and **"RL doesn't work"** 😜
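
As a rough illustration of what the floor does, here is a minimal pure-Python sketch. The helper name `floored_abs_diff` and the sample numbers are hypothetical, and how the floored value feeds into the PPO loss is not shown here; in `tinygrad` the same operation is written `(value - ret).abs().clip(min_=1e-8)`.

```python
def floored_abs_diff(value, ret, eps=1e-8):
    """Return |value - ret| per element, but never smaller than eps."""
    return [max(abs(v - r), eps) for v, r in zip(value, ret)]

# Hypothetical value predictions and returns; two pairs match exactly.
values = [0.5, 1.0, -0.25]
returns = [0.5, 0.75, -0.25]

diffs = floored_abs_diff(values, returns)
print(diffs)
```

Entries where `value` equals `ret` come out as `eps` instead of exactly zero, so nothing downstream sees a zero difference.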

To potentially enable `tinygrad.TinyJit` (which does not work yet, hence the slowness):

- `Learner` does not call `.setup_data`, and

- `rollout` is a function (instead of a `Learner` method) that fills a list with Tensors and `.stack`s them at the end
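
The "fill a list, stack once at the end" pattern could look roughly like the sketch below. It is not the repository's `rollout`: NumPy arrays stand in for `tinygrad` Tensors, and `toy_step`, the shapes, and the reward rule are made up for illustration.

```python
import numpy as np

def rollout(step_fn, obs, n_steps):
    """Collect n_steps of observations and rewards, stacking once at the end."""
    obs_list, reward_list = [], []
    for _ in range(n_steps):
        obs_list.append(obs)          # store the observation seen at this step
        obs, reward = step_fn(obs)    # advance the (toy) environment
        reward_list.append(reward)
    # A single stack per buffer instead of writing into preallocated storage
    return np.stack(obs_list), np.stack(reward_list)

def toy_step(obs):
    # Hypothetical environment: every observation entry grows by 1,
    # and the reward is the sum of the observation before the step.
    return obs + 1.0, obs.sum(axis=1)

# 4 parallel toy environments with 3 observation features each
obs_batch, reward_batch = rollout(toy_step, np.zeros((4, 3)), n_steps=5)
print(obs_batch.shape, reward_batch.shape)
```

Because `rollout` is a plain function with no state on a `Learner` object, it is the kind of callable one would try to wrap in a JIT decorator.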

Since it somehow performs better:

- `.uniform` (`tinygrad` default) instead of `.kaiming_uniform` (`torch` default) weight initialization for `nn.Linear`

Custom `tinygrad` rewrites of `torch.nn.init.orthogonal_` & `torch.nn.utils.clip_grad_norm_` are used.
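
For readers unfamiliar with these two `torch` utilities, the sketch below shows what they compute, written in NumPy rather than `tinygrad`. It is not the code from this repository: the function names, the `eps` default, and returning new arrays instead of modifying gradients in place are assumptions made for illustration.

```python
import numpy as np

def orthogonal(rows, cols, gain=1.0, rng=None):
    """Return a (rows, cols) matrix with orthonormal rows or columns."""
    rng = rng or np.random.default_rng()
    flat = rng.standard_normal((rows, cols))
    if rows < cols:
        flat = flat.T                 # QR wants a tall (or square) matrix
    q, r = np.linalg.qr(flat)
    q = q * np.sign(np.diag(r))       # fix column signs from the QR factorization
    if rows < cols:
        q = q.T
    return gain * q

def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Scale all gradients so their joint L2 norm is at most max_norm."""
    total_norm = float(np.sqrt(sum((g ** 2).sum() for g in grads)))
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm

weight = orthogonal(4, 8, rng=np.random.default_rng(0))
grads = [np.array([3.0, 0.0]), np.array([0.0, 4.0])]
clipped, total_norm = clip_grad_norm(grads, max_norm=1.0)
print(weight.shape, total_norm)
```

The norm is taken over all gradients jointly, so every tensor is scaled by the same factor and the direction of the overall update is preserved.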

You'll find a `.detach()` here and a `.contiguous()` there, but other than that `tinygym`=`flashrl` 🤝

## Acknowledgements 🙌

I want to thank

- [George Hotz](https://github.com/geohot) and the tinygrad team for commoditizing the petaflop! Star [tinygrad](https://github.com/tinygrad/tinygrad) ⭐

- [Andrej Karpathy](https://github.com/karpathy) for commoditizing RL knowledge! Star [pg-pong](https://gist.github.com/karpathy/a4166c7fe253700972fcbc77e4ea32c5) ⭐

and last but not least...
