https://github.com/dyth/dyth

Host: GitHub
URL: https://github.com/dyth/dyth
Owner: dyth
Created: 2021-06-12T20:12:27.000Z (about 4 years ago)
Default Branch: main
Last Pushed: 2025-01-28T23:49:35.000Z (6 months ago)
Last Synced: 2025-01-29T00:29:56.637Z (6 months ago)
Size: 83 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

### David Yu-Tung Hui, 許宇同

I am an independent researcher interested in Deep Reinforcement Learning.

My research focuses on increasing the optimization stability of off-policy gradient-based $Q$-learning algorithms over a range of tasks and hyperparameters.

I'm especially interested in developing algorithms to solve continuous control tasks.

I've written two works along this research direction:

1. **Stabilizing Q-Learning for Continuous Control**  

David Yu-Tung Hui  

MSc Thesis, University of Montreal, 2022  

I showed that using LayerNorm in the critic of DDPG prevented divergence during training in MuJoCo and DeepMind Control continuous control environments, enabling non-trivial behaviors to be learned in the dog-run task of DeepMind Control.  

[[.pdf]](https://papyrus.bib.umontreal.ca/xmlui/bitstream/handle/1866/32085/Hui_David_Yu-Tung_2022_memoire.pdf)

[[Errata]](https://gist.github.com/dyth/0324b7a4c2ca4b0f3bab18583b5dc22b)

2. **Double Gumbel Q-Learning**  

David Yu-Tung Hui, Aaron Courville, Pierre-Luc Bacon  

Spotlight at NeurIPS 2023  

We modeled the noise introduced by a function approximator in $Q$-learning as a heteroscedastic Gumbel distribution and derived a loss function from this noise model that was effective in off-policy continuous control. The resulting algorithm achieved ~2x the aggregate performance of SAC after 1M training timesteps.  

[[.pdf]](https://proceedings.neurips.cc/paper_files/paper/2023/file/07956d40074d6523bad11112b3225c6e-Paper-Conference.pdf)

[[Reviews]](https://openreview.net/forum?id=UdaTyy0BNB)

[[Poster (.png)]](https://nips.cc/media/PosterPDFs/NeurIPS%202023/71497.png)

[[5-min talk]](https://slideslive.com/39009623/double-gumbel-qlearning)

[[1-hour seminar]](https://www.youtube.com/watch?v=GMNtHLA3bAE)

[[Code (GitHub)]](https://github.com/dyth/doublegum)

[[Errata]](https://gist.github.com/dyth/0abd5c5b87184144854a431437de7d44)
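To illustrate the LayerNorm idea from the first work above, here is a minimal sketch of a critic that normalizes its hidden activations before the nonlinearity. It is not the thesis implementation: the layer sizes, the initialization, and the helper names (`layer_norm`, `linear`, `init_layer`, `critic`) are all assumptions chosen for brevity, and the learnable scale and shift of a full LayerNorm are omitted.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance across its features."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Hypothetical initializer: uniform weights, zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def critic(state, action, params):
    """Q(s, a): dense layer -> LayerNorm -> ReLU -> scalar output."""
    (w1, b1), (w2, b2) = params
    hidden = linear(state + action, w1, b1)
    hidden = layer_norm(hidden)  # keeps hidden activations at a fixed scale
    hidden = [max(0.0, h) for h in hidden]
    return linear(hidden, w2, b2)[0]


rng = random.Random(0)
params = [init_layer(5, 16, rng), init_layer(16, 1, rng)]

# The same state-action pair at two very different input magnitudes.
q_small = critic([0.1, -0.2, 0.3], [0.5, -0.5], params)
q_large = critic([100.0, -200.0, 300.0], [500.0, -500.0], params)
```

In this toy setup the first-layer biases are zero, so scaling the input only rescales the pre-normalization activations, and `q_small` and `q_large` come out nearly equal. This shows how normalization bounds the scale of the critic's hidden features; it is an illustration of the mechanism, not a reproduction of the reported results.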

In 2023, I graduated with an MSc from Mila, University of Montreal.

I'm looking for opportunities where I can continue my research.

For more information about me, see my [Google Scholar](https://scholar.google.com/citations?user=pXHOdMwAAAAJ&hl=en).
