https://github.com/bchao1/pong-policy-gradient
🏓 Train an AI player to play Pong using Policy Gradient.
- Host: GitHub
- URL: https://github.com/bchao1/pong-policy-gradient
- Owner: bchao1
- Created: 2019-01-05T10:13:58.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-01-20T12:12:26.000Z (over 6 years ago)
- Last Synced: 2025-06-25T10:52:17.311Z (3 months ago)
- Language: Python
- Homepage:
- Size: 10.5 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Policy Gradient: Pong
The green paddle is our actor; it achieves an average reward of 13.1 over 30 episodes.
## Settings
### Preprocessing
The frames (originally of size 210 * 160) are converted to grayscale and then directly resized to 80 * 80. The differential frame (current frame - previous frame) is flattened to a one-dimensional vector of length 6400 and fed into the actor network.
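The preprocessing described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the repository's code: the helper names, the channel-averaging grayscale conversion, the nearest-neighbour resize, and the all-zero vector returned for the first frame are all assumptions made for the example.

```python
import numpy as np

def to_grayscale(frame):
    """Average the RGB channels of an (H, W, 3) frame into an (H, W) float array.
    Channel averaging is an assumed conversion; the repo may weight channels."""
    return frame.astype(np.float32).mean(axis=2)

def resize_nearest(image, out_h=80, out_w=80):
    """Nearest-neighbour resize, a stand-in for whatever resize routine is used."""
    in_h, in_w = image.shape
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return image[rows[:, None], cols[None, :]]

def preprocess(frame):
    """210 x 160 RGB frame -> 80 x 80 grayscale image."""
    return resize_nearest(to_grayscale(frame))

def difference_vector(current, previous):
    """Flattened (6400,) differential frame: current - previous.
    Returning zeros when there is no previous frame is an assumption."""
    if previous is None:
        return np.zeros(80 * 80, dtype=np.float32)
    return (preprocess(current) - preprocess(previous)).ravel()
```

Each call to `difference_vector` yields the length-6400 input vector for the actor network; the caller keeps the previous raw frame around between steps.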
- Other attempts
  - Cropped frame (removed scoreboard), subsampled frame with factor of 2, then computed the differential frame.
### Model Architecture
- Baseline Model
  - Fully connected (6400, 256), no bias
  - ReLU
  - Fully connected (256, 256), no bias
  - ReLU
  - Fully connected (256, 1), no bias
  - Sigmoid

The gym Pong environment has three effective actions (up, down, no movement). We reduced the action space to 2 (up, down), so a single sigmoid output is sufficient.
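As an illustration of the baseline architecture, here is a framework-agnostic NumPy sketch of the forward pass. The function names, the `1/sqrt(fan_in)` weight scaling, the fixed random seed, and the choice to read the sigmoid output as the probability of "up" are assumptions for the example; the README does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only to make the sketch reproducible

def init_weights(sizes=(6400, 256, 256, 1)):
    """One weight matrix per layer and no bias terms.
    The 1/sqrt(fan_in) scaling is an assumed initialization."""
    return [rng.standard_normal((n_in, n_out)) / np.sqrt(n_in)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def prob_up(weights, x):
    """Forward pass: FC -> ReLU -> FC -> ReLU -> FC -> sigmoid.
    Returns a scalar, interpreted here as P(action = up)."""
    h = x
    for w in weights[:-1]:
        h = np.maximum(h @ w, 0.0)  # ReLU after each hidden layer
    logit = h @ weights[-1]         # shape (1,)
    return 1.0 / (1.0 + np.exp(-logit[0]))

def sample_action(p_up):
    """Sample one of the two remaining actions from the sigmoid output."""
    return "up" if rng.random() < p_up else "down"
```

With only two actions, one output unit is enough: the probability of "down" is simply `1 - prob_up(...)`.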
### Other settings
- Optimizer: Adam, betas = (0.9, 0.999), learning rate = 0.0001.
- Gradients are accumulated over 10 episodes and then used to update the network, which stabilizes the training process.
- Rewards are discounted with factor 0.99 and then normalized (the mean is subtracted and the result is divided by the standard deviation).
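The reward processing in the last bullet can be sketched as follows. This is a hedged example rather than the repository's implementation: the function names and the small `eps` term guarding against division by zero are assumptions, and the sketch discounts across the whole episode without resetting at point boundaries, which the README does not specify either way.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def normalize(values, eps=1e-8):
    """Subtract the mean and divide by the standard deviation.
    eps is an assumed safeguard for the case where all values are equal."""
    values = np.asarray(values, dtype=np.float64)
    return (values - values.mean()) / (values.std() + eps)
```

For example, `normalize(discounted_returns(episode_rewards))` gives the per-step weights applied to the log-probability gradients before they are accumulated.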
### Results
The model was trained for 46 hours, achieving an average reward of 13.1 over 30 episodes.
