https://github.com/studiolacosanostra/ml-double-q-learning

Library implementing the double-q-learning algorithm.
q-learning reinforcement-learning typescript

Last synced: 25 days ago

Host: GitHub
URL: https://github.com/studiolacosanostra/ml-double-q-learning
Owner: studioLaCosaNostra
Created: 2019-03-12T17:25:11.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2019-03-13T19:02:12.000Z (almost 6 years ago)
Last Synced: 2024-12-07T12:06:57.586Z (25 days ago)
Topics: q-learning, reinforcement-learning, typescript
Language: TypeScript
Size: 162 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md

# ml-double-q-learning

Library implementing the double-q-learning algorithm.

Paper: https://papers.nips.cc/paper/3964-double-q-learning.pdf

## Install

`npm install ml-double-q-learning`

## DoubleQLearningAgent

```typescript
// Signature overview; method bodies are omitted.
// Type arguments are inferred from context, on the assumption that the listing lost them.
export class DoubleQLearningAgent<TAction = any> implements IQLearningAgent {
  public replayMemory: [string, number, number][] = [];
  public episode: number = 0;
  public trained = false;

  constructor(
    public actions: TAction[],
    private pickActionStrategy: (actionsStats: number[], episode: number) => Promise<number> = greedyPickAction,
    public memory: IMemoryAdapter = new MapInMemory(),
    public learningRate = 0.1,
    public discountFactor = 0.99,
  ) {}

  public async play(state: IState): Promise<IStep<TAction>> {}
  public reward(step: IStep<TAction>, reward: number): void {}
  public async learn(): Promise<void> {}
}
```
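For orientation, the tabular double Q-learning update from the paper can be sketched as below. This is a minimal, self-contained illustration and not the library's actual code: `QTable`, `getRow`, `argMax`, `doubleQUpdate` and the injectable `coin` parameter are hypothetical names, and only the default `learningRate` (0.1) and `discountFactor` (0.99) are taken from the constructor above. Terminal-state handling is left out.

```typescript
// Hypothetical sketch of tabular double Q-learning; names are not from ml-double-q-learning.
type QTable = Map<string, number[]>;

// Return the action-value row for a state, creating a zero row on first access.
function getRow(table: QTable, state: string, actionCount: number): number[] {
  let row = table.get(state);
  if (row === undefined) {
    row = new Array<number>(actionCount).fill(0);
    table.set(state, row);
  }
  return row;
}

// Index of the largest value (first one wins on ties).
function argMax(values: number[]): number {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

function doubleQUpdate(
  qA: QTable,
  qB: QTable,
  state: string,
  action: number,
  reward: number,
  nextState: string,
  actionCount: number,
  learningRate = 0.1,
  discountFactor = 0.99,
  coin: () => number = Math.random, // injectable for deterministic tests
): void {
  // With probability 1/2 update A (evaluated by B), otherwise update B (evaluated by A).
  const [selector, evaluator] = coin() < 0.5 ? [qA, qB] : [qB, qA];
  const current = getRow(selector, state, actionCount);
  // The updated table chooses the best next action...
  const bestNext = argMax(getRow(selector, nextState, actionCount));
  // ...and the other table supplies the value estimate for that action.
  const nextValue = getRow(evaluator, nextState, actionCount)[bestNext];
  current[action] += learningRate * (reward + discountFactor * nextValue - current[action]);
}
```

Decoupling action selection from value estimation in this way is what reduces the overestimation bias of plain Q-learning, which uses a single table for both.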

## Memory (from ml-q-learning)

- [`MapInMemory`](https://github.com/studioLaCosaNostra/ml-q-learning/blob/master/src/memory/map-in-memory.ts#L4)

- [`IndexedDBMemory`](https://github.com/studioLaCosaNostra/ml-q-learning/blob/master/src/memory/indexeddb-memory.ts#L23)

## Pick action strategy (from ml-q-learning)

- [`randomPickAction`](https://github.com/studioLaCosaNostra/ml-q-learning/blob/master/src/pick-action-strategy/index.ts#L13)

- [`greedyPickAction`](https://github.com/studioLaCosaNostra/ml-q-learning/blob/master/src/pick-action-strategy/index.ts#L17)

- [`epsilonGreedyPickAction`](https://github.com/studioLaCosaNostra/ml-q-learning/blob/master/src/pick-action-strategy/index.ts#L22)

- [`decayingEpsilonGreedyPickAction`](https://github.com/studioLaCosaNostra/ml-q-learning/blob/master/src/pick-action-strategy/index.ts#L32)

- [`softmaxPickAction`](https://github.com/studioLaCosaNostra/ml-q-learning/blob/master/src/pick-action-strategy/index.ts#L39)

- [`epsilonSoftmaxGreedyPickAction`](https://github.com/studioLaCosaNostra/ml-q-learning/blob/master/src/pick-action-strategy/index.ts#L51)

- [`decayingEpsilonSoftmaxGreedyPickAction`](https://github.com/studioLaCosaNostra/ml-q-learning/blob/master/src/pick-action-strategy/index.ts#L61)
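To illustrate what a strategy of this shape does, here is a rough sketch of an epsilon-greedy and a softmax picker. It assumes the strategy signature `(actionsStats: number[], episode: number) => Promise<number>`, where the resolved number is an index into `actionsStats`. The names `PickAction`, `epsilonGreedyIndex`, `softmaxIndex`, `makeEpsilonGreedy` and `makeSoftmax`, as well as the `epsilon`, `temperature` and `random` parameters, are hypothetical and are not the implementations linked above.

```typescript
// Assumed strategy shape: resolves to the index of the chosen action.
type PickAction = (actionsStats: number[], episode: number) => Promise<number>;

// Index of the largest value (first one wins on ties).
function argMax(values: number[]): number {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

// Explore uniformly with probability epsilon, otherwise exploit the best estimate.
function epsilonGreedyIndex(
  actionsStats: number[],
  epsilon: number,
  random: () => number = Math.random,
): number {
  if (random() < epsilon) {
    return Math.floor(random() * actionsStats.length);
  }
  return argMax(actionsStats);
}

// Sample an action with probability proportional to exp(value / temperature).
function softmaxIndex(
  actionsStats: number[],
  temperature = 1,
  random: () => number = Math.random,
): number {
  const max = Math.max(...actionsStats); // subtract the max for numerical stability
  const weights = actionsStats.map((v) => Math.exp((v - max) / temperature));
  const total = weights.reduce((sum, w) => sum + w, 0);
  let threshold = random() * total;
  for (let i = 0; i < weights.length; i++) {
    threshold -= weights[i];
    if (threshold < 0) return i;
  }
  return weights.length - 1; // guard against floating-point rounding
}

// Async wrappers matching the assumed strategy signature.
const makeEpsilonGreedy = (epsilon: number): PickAction =>
  async (actionsStats) => epsilonGreedyIndex(actionsStats, epsilon);

const makeSoftmax = (temperature = 1): PickAction =>
  async (actionsStats) => softmaxIndex(actionsStats, temperature);
```

The `episode` argument is unused in this sketch; a decaying variant would presumably derive `epsilon` from it so that exploration shrinks as training progresses.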

## Example use

`Maze escape`

[src/example/maze-escape.ts](https://github.com/studioLaCosaNostra/ml-double-q-learning/blob/master/src/example/maze-escape.ts)

```
P - Player
# - Wall
. - Nothing
X - Trap = -200
R - Treasure = 200
F - Finish = 1000
```
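The legend above could be encoded as a simple lookup table. The sketch below is illustrative only: `Tile`, `TILE_REWARD`, `rewardFor` and `totalReward` are hypothetical names, and the reward of 0 for player, wall and empty tiles is an assumption, since the legend gives values only for traps, treasures and the finish. The linked example may score moves differently.

```typescript
// Tile symbols from the legend.
type Tile = "P" | "#" | "." | "X" | "R" | "F";

// Rewards stated in the legend; 0 for the remaining tiles is an assumption.
const TILE_REWARD: Record<Tile, number> = {
  "P": 0,
  "#": 0,
  ".": 0,
  "X": -200, // trap
  "R": 200,  // treasure
  "F": 1000, // finish
};

// Reward for stepping onto a single tile.
function rewardFor(tile: Tile): number {
  return TILE_REWARD[tile];
}

// Sum of rewards along a sequence of visited tiles.
function totalReward(visited: Tile[]): number {
  return visited.reduce((sum, tile) => sum + rewardFor(tile), 0);
}
```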

```
Start maze
[ [ 'P', '.', '.', '#', '.', '.', '.', '#', 'R' ],
  [ '.', '#', '.', '#', '.', '.', '.', '#', '.' ],
  [ '.', '#', '.', '#', '.', '#', '.', '#', '.' ],
  [ '.', '#', 'X', '#', '.', '#', '.', '.', '.' ],
  [ '.', '#', '#', '#', 'F', '#', '.', '.', '.' ],
  [ '.', '#', '.', '#', '#', '#', '.', '#', 'X' ],
  [ '.', '.', 'X', '.', '.', '.', '.', '#', '.' ],
  [ '.', '.', '.', '.', '#', '.', '.', '#', 'R' ] ]

...many plays...

-------------------------------
  numberOfPlay: 35702,
  score: 1168
  episode: 3322672
  memorySize: 968
-------------------------------
[ [ '.', '.', '.', '#', '.', '.', '.', '#', '.' ],
  [ '.', '#', '.', '#', '.', '.', '.', '#', '.' ],
  [ '.', '#', '.', '#', '.', '#', '.', '#', '.' ],
  [ '.', '#', 'X', '#', '.', '#', '.', '.', '.' ],
  [ '.', '#', '#', '#', 'P', '#', '.', '.', '.' ],
  [ '.', '#', '.', '#', '#', '#', '.', '#', 'X' ],
  [ '.', '.', 'X', '.', '.', '.', '.', '#', '.' ],
  [ '.', '.', '.', '.', '#', '.', '.', '#', 'R' ] ]
```

## Sources

- https://papers.nips.cc/paper/3964-double-q-learning.pdf

- https://towardsdatascience.com/double-q-learning-the-easy-way-a924c4085ec3