https://github.com/follgad/hpong
in progress
- Host: GitHub
- URL: https://github.com/follgad/hpong
- Owner: FOLLGAD
- Created: 2024-12-02T16:03:44.000Z
- Default Branch: main
- Last Pushed: 2025-03-08T23:39:32.000Z
- Last Synced: 2025-05-08T01:09:31.502Z
- Language: Python
- Homepage:
- Size: 85.4 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# hPong (WIP)


## Doing:
- [ ] Implement **temporal** position embedding using RoPE
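
For orientation, here is a minimal, framework-free sketch of the rotation RoPE applies along a single axis such as time. It is not the code in this repository: the function names `rope_rotate` and `dot`, the plain-list representation, and the default `base=10000.0` are assumptions made for illustration. The frame index plays the role of `position`.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive feature pairs of `vec` by position-dependent angles.

    Hypothetical helper: pair k = i // 2 uses the frequency base ** (-2k / d),
    where d is the (even) feature dimension.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even feature dimension")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # i == 2k, so the exponent is -2k/d
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation of the pair (x, y) by the angle theta.
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0, 4.0]
    k = [0.5, -1.0, 2.0, 0.25]
    # Both pairs of frames are two steps apart, so the attention scores match.
    print(dot(rope_rotate(q, 3), rope_rotate(k, 1)))
    print(dot(rope_rotate(q, 7), rope_rotate(k, 5)))
```

The property that matters for attention is that the score between a rotated query and a rotated key depends only on the offset between their positions, not on the absolute positions. In a real model the same rotation would be applied to the query and key tensors before the attention product.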
## Todo:
- [ ] Read up on rotary position embeddings in transformers and look at example implementations
- [ ] Implement **spatial** position embedding using RoPE
- [ ] Make VAE training output sample images after each epoch
- [ ] Figure out why the KL divergence of the DiT is so high (around 9 orders of magnitude)
- [ ] Add guidance for user actions to the DiT (STGuidance)
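
As a reference point for the guidance item, the sketch below shows the combination step of classifier-free guidance (listed under References), which is related to but not the same as Spatiotemporal Skip Guidance. It assumes the model can be evaluated twice per step, once with the user action and once with a "no action" placeholder. The name `cfg_combine`, the list representation, and the scale convention (1.0 reproduces the conditional prediction) are illustrative assumptions, not this project's API.

```python
def cfg_combine(pred_uncond, pred_cond, guidance_scale):
    """Combine unconditional and action-conditioned predictions.

    Hypothetical helper using the convention
        guided = uncond + scale * (cond - uncond)
    so scale 0.0 gives the unconditional prediction, 1.0 the conditional one,
    and larger values push further in the direction of the condition.
    """
    if len(pred_uncond) != len(pred_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(pred_uncond, pred_cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]  # prediction with the action dropped
    cond = [1.0, 3.0]    # prediction given the user's action
    print(cfg_combine(uncond, cond, 2.0))
```

Some papers write the same rule as `(1 + w) * cond - w * uncond`; that is the convention above with `scale = 1 + w`.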
## Done:
- [x] Split each frame into its own VAE encoding/decoding.
- [x] In the dataset, the "player" paddle sometimes plays like a skilled player to improve data quality.
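
One way to mix skilled and unskilled play when generating data is sketched below, purely as an illustration. The action encoding in `ACTIONS`, the function `choose_action`, and the parameters `expert_prob` and `dead_zone` are hypothetical and are not taken from the repository.

```python
import random

# Hypothetical action encoding: -1 moves toward smaller y, 0 stays, +1 moves toward larger y.
ACTIONS = (-1, 0, 1)


def choose_action(paddle_y, ball_y, rng, expert_prob=0.5, dead_zone=1.0):
    """Pick a paddle action, tracking the ball with probability `expert_prob`.

    Otherwise a uniformly random action is returned, so the dataset contains
    both competent rallies and misses.
    """
    if rng.random() < expert_prob:
        if ball_y > paddle_y + dead_zone:
            return 1
        if ball_y < paddle_y - dead_zone:
            return -1
        return 0  # ball is close enough to the paddle centre
    return rng.choice(ACTIONS)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducible datasets
    actions = [choose_action(10.0, 20.0, rng) for _ in range(10)]
    print(actions)
```

Passing the random generator in explicitly keeps episode generation reproducible, and `expert_prob` can be varied per episode to control how much skilled play ends up in the data.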
## References
- [RoFormer: Enhanced Transformer with Rotary Position Embedding (RoPE)](https://arxiv.org/pdf/2104.09864v5)
- [Spatiotemporal Skip Guidance](https://arxiv.org/pdf/2411.18664)
- [Classifier-free Guidance](https://arxiv.org/pdf/2207.12598)
- [Axial Attention](https://arxiv.org/pdf/1912.12180)