https://github.com/iandanforth/preference
Preference is an action selection method. It is an alternative to softmax, greedy, epsilon-greedy etc.
- Host: GitHub
- URL: https://github.com/iandanforth/preference
- Owner: iandanforth
- Created: 2018-08-06T21:20:04.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2018-08-25T21:28:24.000Z (almost 8 years ago)
- Last Synced: 2026-04-03T04:16:47.677Z (3 months ago)
- Language: Jupyter Notebook
- Size: 11.4 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Softmax Action Selection Visualization
[Visualization Demo](https://iandanforth.github.io/smaction/)
[Description of Softmax Action Selection](http://www.incompleteideas.net/book/ebook/node17.html)
How the temperature (tau) in the softmax equation changes the probability of each action being selected is not always obvious.
This visualization lets you adjust the temperature and the action values and watch the selection probabilities respond.
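For reference, the standard softmax (Boltzmann) selection rule with temperature can be written as below. The symbols are assumptions for illustration: $Q(a)$ is the estimated value of action $a$, $\tau > 0$ is the temperature, and the sum runs over all available actions $b$.

$$
P(a) = \frac{e^{Q(a)/\tau}}{\sum_{b} e^{Q(b)/\tau}}
$$

Dividing by a large $\tau$ shrinks the differences between the exponents, so the probabilities move toward uniform. A small $\tau$ magnifies those differences, so the highest-valued action dominates.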
### Things to try
- Set temperature = 1
- Set the value of 'a' near the value of 'b'. Notice how small changes in value in this regime have a large impact on the selection probabilities.
- Set temperature to 1000 and try again.
- Try to fully recover the equiprobable action selection policy.
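
The experiments above can also be reproduced numerically. The following is a minimal sketch, not the code behind the demo: the function name `softmax_probs` and the example values for 'a' and 'b' are hypothetical, chosen only to illustrate the effect of temperature.

```python
import math


def softmax_probs(values, tau):
    """Return softmax selection probabilities for a list of action values.

    `tau` is the temperature and must be positive.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    scaled = [v / tau for v in values]
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical action values for 'a' and 'b' (not taken from the demo).
action_values = [1.0, 1.5]

for tau in (0.1, 1.0, 1000.0):
    p_a, p_b = softmax_probs(action_values, tau)
    print(f"tau={tau:>7}: P(a)={p_a:.4f}  P(b)={p_b:.4f}")
```

With a low temperature the higher-valued action takes nearly all of the probability. With a very high temperature the two probabilities approach 0.5 each. The probabilities are exactly equal when the action values are equal, and otherwise only approach equality as the temperature grows without bound.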