https://github.com/addy1997/rl-algorithms

This repository has RL algorithms implemented using python

Host: GitHub
URL: https://github.com/addy1997/rl-algorithms
Owner: addy1997
License: mit
Created: 2020-06-24T07:27:24.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2020-10-18T08:20:08.000Z (almost 5 years ago)
Last Synced: 2025-03-26T05:51:12.275Z (7 months ago)
Topics: double-expected-sarsa, double-sarsa, epsilon-greedy-exploration, expected-sarsa, gradient-bandits, hacktoberfest, hacktoberfest2020, monte-carlo-methods, q-learning, q-learning-vs-sarsa, reinforcement-learning, rl-algorithms, sarsa
Language: Jupyter Notebook
Homepage: http://Adwait1997.github.io/RL-Algorithms
Size: 1.3 MB
Stars: 7
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

[![Software License](https://img.shields.io/badge/license-MIT-brightgreen.svg)](LICENSE)  [![Build Status](https://ci.appveyor.com/api/projects/status/8e784doc5sye7c41?svg=true)](https://ci.appveyor.com/project/addy1997/RL-Algorithms)  [![Stars](https://img.shields.io/github/stars/addy1997/RL-Algorithms.svg?style=flat&label=Star&maxAge=86400)](STARS) [![Contributions](https://img.shields.io/github/commit-activity/m/addy1997/RL-Algorithms.svg?color=%09%2346c018)](https://github.com/addy1997/RL-Algorithms/graphs/commit-activity) [![Lines Of Code](https://tokei.rs/b1/github/addy1997/RL-Algorithms?category=code)](https://github.com/addy1997/RL-Algorithms)   [![Total alerts](https://img.shields.io/lgtm/alerts/g/addy1997/RL-Algorithms.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/addy1997/RL-Algorithms/alerts/) [![Code](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)  [![CodeFactor](https://www.codefactor.io/repository/github/addy1997/RL-Algorithms/badge)](https://www.codefactor.io/repository/github/addy1997/RL-Algorithms)

#### Table of Contents

* [Epsilon](#Epsilon)

* [SARSA](#SARSA)

* [Q_learning](#Q_learning)

* [Expected-SARSA](#Expected-SARSA)

* [Double-SARSA](#Double-SARSA)

* [Expected-Double-SARSA](#Expected-Double-SARSA)

* [References](#References)

## [Epsilon](#RL-Algorithms)

## [SARSA](#RL-Algorithms)

### Algorithm

![SARSA pseudocode](https://github.com/addy1997/RL-Algorithms/blob/master/assets/SARSA_psuedo.png)

### Theory

**SARSA** (_State-Action-Reward-State-Action_) is an **on-policy** **TD(0)** control method in reinforcement learning. Its name comes from the quintuple (S, A, R, S′, A′) used in every update.

It follows the **Generalised Policy Iteration** (GPI) pattern. Q is repeatedly updated toward the action values of the current policy π, and π is kept ε-greedy with respect to Q, so the policy improves as Q improves. Our aim is to estimate **qπ(s, a)** for the current policy π and all state-action (s, a) pairs.

* We learn the **state-action value** function **Q(s,a)** rather than **state-value** **V(s)**.

* Here, **Q(s,a)** is our estimate of **qπ(s,a)**, the action value under the current **behaviour policy π**, for all state-action pairs (s,a).

* Initialise the start state **S** (S must not be a terminal state).

* Choose an action **A** from S using a policy derived from Q, e.g. **ε-greedy** (an ε-soft policy).

* Take action A and observe the **reward R** and the **next state S′**.

* Choose the next action **A′** from S′ with the same ε-greedy policy.

* Update the estimate: _Q(S, A) ← Q(S, A) + α[R + γQ(S′, A′) − Q(S, A)]_

* Set S ← S′ and A ← A′, and repeat until S′ is terminal. For a terminal state, **Q(S′, ·)** is defined as 0.
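The steps above can be sketched as a small tabular SARSA loop. This is a minimal illustration, not the repository's notebook code. The `Corridor` environment, the function names and the hyperparameters (α = 0.5, γ = 1, ε = 0.1) are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict


class Corridor:
    """Hypothetical toy environment (not from this repository): states 0..n-1,
    start at 0, terminal at n-1. Action 0 moves left, action 1 moves right,
    and every step yields a reward of -1."""

    def __init__(self, n=5):
        self.n = n

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n - 1)
        done = self.state == self.n - 1
        return self.state, -1.0, done


def epsilon_greedy(Q, state, n_actions, epsilon, rng):
    """Random action with probability epsilon, otherwise a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [Q[(state, a)] for a in range(n_actions)]
    best = max(values)
    return rng.choice([a for a, v in enumerate(values) if v == best])


def sarsa(env, n_actions=2, episodes=200, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q(s, a) starts at 0 for every pair
    for _ in range(episodes):
        s = env.reset()                                    # non-terminal start state S
        a = epsilon_greedy(Q, s, n_actions, epsilon, rng)  # A chosen from S
        done = False
        while not done:
            s_next, r, done = env.step(a)                  # observe R and S'
            if done:
                target = r                                 # Q(terminal, .) = 0
            else:
                a_next = epsilon_greedy(Q, s_next, n_actions, epsilon, rng)  # A' from S'
                target = r + gamma * Q[(s_next, a_next)]
            Q[(s, a)] += alpha * (target - Q[(s, a)])      # SARSA update
            if not done:
                s, a = s_next, a_next                      # S <- S', A <- A'
    return Q


Q = sarsa(Corridor(n=5))
# Greedy action for each non-terminal state (0 = left, 1 = right)
print([max(range(2), key=lambda a: Q[(s, a)]) for s in range(4)])
```

Because every reward is −1 and Q starts at 0, all learned values stay non-positive. The last transition of every episode (state 3, move right) always has target −1, so its value settles at −1.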

### SARSA update rule

![SARSA update rule](https://github.com/addy1997/RL-Algorithms/blob/master/assets/sarsa2.png)

## [Q_learning](#RL-Algorithms)

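**Q-learning**, like **SARSA**, is a **TD(0)** control method, but it is **off-policy**. Both algorithms estimate action values for all the **state-action** pairs involved in the task. SARSA estimates **qπ(s, a)** for the policy it follows, whereas Q-learning directly estimates the optimal action values **q∗(s, a)**, independent of the behaviour policy.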

### Q-learning Algorithm

![Q-learning pseudocode](https://github.com/addy1997/RL-Algorithms/blob/master/assets/Q_learning2.png)

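### Q-learning vs SARSA

The key difference lies in the update target. In **SARSA**, the next action **a'** used in the target is the one actually selected in the **next state** by the same behaviour policy **π** (e.g. ε-greedy). SARSA therefore evaluates the policy it follows, exploratory actions included. In **Q-learning**, the target uses the **greedy** action in the next state, i.e. max over a of Q(s', a), regardless of which action the behaviour policy takes next. Both algorithms can explore in exactly the same way while acting. Q-learning's target simply ignores the exploration, so it learns the values of the greedy policy rather than those of the ε-greedy policy it actually follows.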
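To make the difference concrete, here is a minimal sketch computing both TD targets for the same transition, in the case where the ε-greedy policy happens to pick the non-greedy action A′. The function names and the Q-values are hypothetical.

```python
def sarsa_target(Q, r, s_next, a_next, gamma):
    # On-policy: bootstrap from the action A' the behaviour policy actually chose.
    return r + gamma * Q[s_next][a_next]


def q_learning_target(Q, r, s_next, gamma):
    # Off-policy: bootstrap from the greedy action in S', whatever A' turns out to be.
    return r + gamma * max(Q[s_next])


Q = {"s1": [2.0, 5.0]}  # hypothetical action values for the next state s1
r, gamma = 1.0, 0.9
a_next = 0              # suppose epsilon-greedy explored and picked non-greedy action 0

print(sarsa_target(Q, r, "s1", a_next, gamma))  # 1 + 0.9 * 2.0 = 2.8
print(q_learning_target(Q, r, "s1", gamma))     # 1 + 0.9 * 5.0 = 5.5
```

When the behaviour policy picks the greedy action (here action 1), the two targets coincide. They differ only on exploratory steps.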

### Q-learning update rule

![Q-learning update rule](https://github.com/addy1997/RL-Algorithms/blob/master/assets/Q_learning1.png)
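Written in the same notation as the SARSA update, the standard one-step tabular Q-learning update is shown below. The only change from SARSA is that γQ(S′, A′) becomes γ times the maximum of Q(S′, a) over all actions a.

```latex
Q(S, A) \leftarrow Q(S, A) + \alpha \left[ R + \gamma \max_{a} Q(S', a) - Q(S, A) \right]
```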

## [Expected-SARSA](#RL-Algorithms)

### Algorithm

![Expected SARSA pseudocode](https://github.com/addy1997/RL-Algorithms/blob/master/assets/expected-sarsa.png)
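Expected SARSA replaces the sampled Q(S′, A′) in the SARSA target with its expectation under the policy, Σₐ π(a|S′) Q(S′, a). Below is a minimal sketch, assuming tabular Q stored as a dict of lists and an ε-greedy target policy that breaks ties toward the first maximum. The function names are hypothetical and this is not the repository's code.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy (ties go to the first maximum)."""
    n = len(q_values)
    probs = [epsilon / n] * n
    greedy = q_values.index(max(q_values))
    probs[greedy] += 1.0 - epsilon
    return probs


def expected_sarsa_target(q_next, r, gamma, epsilon):
    """R + gamma * sum_a pi(a|S') * Q(S', a) for a non-terminal S'."""
    probs = epsilon_greedy_probs(q_next, epsilon)
    return r + gamma * sum(p * q for p, q in zip(probs, q_next))


def expected_sarsa_update(Q, s, a, r, s_next, alpha, gamma, epsilon, terminal=False):
    # Q(terminal, .) is taken as 0, so the target is just R at the end of an episode.
    target = r if terminal else expected_sarsa_target(Q[s_next], r, gamma, epsilon)
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]


# Hypothetical next-state values; probs = [0.05, 0.95], target = 1 + 0.9 * 4.85 ≈ 5.365
print(expected_sarsa_target([2.0, 5.0], r=1.0, gamma=0.9, epsilon=0.1))
```

With ε = 0 the policy is greedy, so the expectation collapses to the maximum and the target equals the Q-learning target. Averaging over A′ removes the sampling variance that SARSA's target has.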

## [Double-SARSA](#RL-Algorithms)

## [Expected-Double-SARSA](#RL-Algorithms)

## [References](#RL-Algorithms)

* [Vanilla DQN, Double DQN, and Dueling DQN in PyTorch](https://github.com/dxyang/DQN_pytorch)

* [Deep Reinforcement Learning for Keras](https://github.com/keras-rl/keras-rl)

* [Deep Reinforcement Learning with PyTorch & Visdom](https://github.com/jingweiz/pytorch-rl)

* [Minimal PyTorch DQN](https://github.com/econti/minimal_dqn)

* [deep-Q-networks: Implementation of algorithms from the Q-learning family](https://github.com/cyoon1729/deep-Q-networks)

* [keras-rl2](https://github.com/wau/keras-rl2)

* [RL-Adventure](https://github.com/higgsfield/RL-Adventure)
