https://github.com/swamikannan/cliffwalk

Cliffwalk to compare SARSA and Q-Learning

cliffwalking python3 q-learning q-learning-algorithm q-learning-vs-sarsa sarsa-learning

Host: GitHub
URL: https://github.com/swamikannan/cliffwalk
Owner: SwamiKannan
License: mit
Created: 2022-09-05T05:13:12.000Z (almost 4 years ago)
Default Branch: main
Last Pushed: 2022-10-25T14:40:40.000Z (over 3 years ago)
Last Synced: 2025-06-06T11:04:31.893Z (about 1 year ago)
Topics: cliffwalking, python3, q-learning, q-learning-algorithm, q-learning-vs-sarsa, sarsa-learning
Language: Jupyter Notebook
Homepage:
Size: 1.7 MB
Stars: 0
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: Readme.md
- License: LICENSE

README

This repository is part of a series of projects in which I solve RL environments by building RL algorithms from scratch using Python, PyTorch and TensorFlow.

# Exercise
Compare the SARSA and Q-learning algorithms using the GridWorld Cliff Walking environment.
# CliffWalk
![Cliff Walking representation](cliff_walking.png "Cliff Walking")
## Environment:
This is a simple implementation of the Gridworld Cliff reinforcement learning task.
Adapted from Example 6.6 (page 106) of Reinforcement Learning: An Introduction by Sutton and Barto: http://incompleteideas.net/book/bookdraft2018jan1.pdf

With inspiration from: https://github.com/dennybritz/reinforcement-learning/blob/master/lib/envs/cliff_walking.py

The board is a 4x12 matrix, with (using NumPy matrix indexing):

- `[3, 0]` as the start at bottom-left
- `[3, 11]` as the goal at bottom-right
- `[3, 1..10]` as the cliff at bottom-center

Each time step incurs a reward of -1. Stepping into the cliff incurs a reward of -100 and resets the agent to the start. An episode terminates when the agent reaches the goal.
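
The transition rules above can be sketched in a few lines of Python. This is only an illustration of the description, not the code in the notebook: the names `step`, `ACTIONS`, `START`, `GOAL` and `CLIFF` are hypothetical, the action ordering is arbitrary, and keeping the agent in place when a move would leave the board is an assumption.

```python
# Illustrative sketch of the Cliff Walking dynamics; all names are hypothetical.
ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
CLIFF = {(3, col) for col in range(1, 11)}  # cells [3, 1..10]

# Assumed action order: 0 = up, 1 = right, 2 = down, 3 = left.
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def step(state, action):
    """Return (next_state, reward, done) for one move on the 4x12 board."""
    d_row, d_col = ACTIONS[action]
    # Assumption: a move that would leave the board keeps the agent at the edge.
    row = min(max(state[0] + d_row, 0), ROWS - 1)
    col = min(max(state[1] + d_col, 0), COLS - 1)
    next_state = (row, col)
    if next_state in CLIFF:
        # Falling off the cliff: -100 reward and a reset to the start,
        # but the episode continues.
        return START, -100, False
    # Every other transition costs -1; the episode ends at the goal.
    return next_state, -1, next_state == GOAL
```

For example, `step(START, 1)` moves right from the start straight into the cliff, so it returns the start state with a reward of -100 and the episode continues.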

From Sutton and Barto's Reinforcement Learning: An Introduction textbook:

Example 6.6: Cliff Walking. This gridworld example compares Sarsa and Q-learning, highlighting the difference between on-policy (Sarsa) and off-policy (Q-learning) methods. Consider the gridworld shown below. This is a standard undiscounted, episodic task, with start and goal states, and the usual actions causing movement up, down, right, and left. Reward is -1 on all transitions except those into the region marked “The Cliff”. Stepping into this region incurs a reward of -100 and sends the agent instantly back to the start.
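
The on-policy versus off-policy difference comes down to how each method forms its bootstrap target. The sketch below is a minimal illustration, not the notebook's implementation: it assumes a tabular `Q` stored as a NumPy array indexed by integer state and action, a step size `alpha` of 0.5 (an arbitrary choice), and `gamma = 1.0` because the task is undiscounted. The function names are hypothetical.

```python
import numpy as np


def sarsa_target(Q, reward, next_state, next_action, gamma=1.0):
    """On-policy target: bootstrap from the action the agent actually takes next."""
    return reward + gamma * Q[next_state, next_action]


def q_learning_target(Q, reward, next_state, gamma=1.0):
    """Off-policy target: bootstrap from the greedy (highest-valued) next action."""
    return reward + gamma * np.max(Q[next_state])


def td_update(Q, state, action, target, alpha=0.5):
    """Move Q[state, action] a fraction alpha of the way towards the target."""
    Q[state, action] += alpha * (target - Q[state, action])


# Hypothetical example: next state 1 has one good action and one that
# leads into the cliff.
Q = np.zeros((2, 4))
Q[1] = [-1.0, -100.0, -2.0, -3.0]

# If exploration picks the bad action (index 1), Sarsa's target reflects it,
# while the Q-learning target ignores it and uses the best action instead.
sarsa = sarsa_target(Q, reward=-1, next_state=1, next_action=1)
q_learning = q_learning_target(Q, reward=-1, next_state=1)
```

Because Sarsa's target includes the cost of exploratory moves, states next to the cliff look worse to it than they do to Q-learning, which is the contrast the example is designed to highlight.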
