https://github.com/pockerman/py_cube_ai

Reinforcement learning algorithms with Python

openai-gym python pytorch reinforcement-learning reinforcement-learning-algorithms ros2

## PyCubeAI

[![Documentation Status](https://readthedocs.org/projects/pockerman-py-cubeai/badge/?version=latest)](https://pockerman-py-cubeai.readthedocs.io/en/latest/?badge=latest) [![Python application](https://github.com/pockerman/rl_python/actions/workflows/python-app.yml/badge.svg?branch=master)](https://github.com/pockerman/rl_python/actions/workflows/python-app.yml)

PyCubeAI is an effort to create an environment for the design, development, simulation, and deployment of reinforcement learning algorithms
that target robotic platforms.

The project documentation can be found at CubeAI documentation.
The C++ flavor of the project can be found at CubeAI.

The repository contains implementations of reinforcement learning algorithms. The algorithms have been refactored or reimplemented
from various resources such as:

- Udacity DRL repository
- Reinforcement learning in motion
- Deep Reinforcement Learning in Action

## Dependencies

- OpenAI Gym
- PyTorch
- NumPy
- Webots

## Installation
TODO
### Installing Webots and getting started
Check out the instructions here on how to install and get started with Webots.

## Documentation
TODO

## Examples

### Reinforcement learning basic algorithms

- Dummy agent on ```MountainCar-v0```
- Armed-bandit with epsilon greedy policy
- Armed-bandit with softmax policy
- Contextual bandits
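
To make the epsilon-greedy bandit entry above concrete, here is a minimal sketch that uses only the standard library. It is not the repository's implementation: the function name `epsilon_greedy_bandit`, the Gaussian reward model, and the default parameters are all assumptions made for this illustration.

```python
import random


def epsilon_greedy_bandit(true_means, n_steps=5000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Gaussian multi-armed bandit.

    true_means is a hypothetical list of per-arm mean rewards.
    Returns the estimated mean of each arm and the pull counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms

    for _ in range(n_steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Assumed reward model: unit-variance Gaussian around the arm's mean.
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental sample-average update of the estimate.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.1, 0.5, 2.0])
```

With a clear gap between the arms, most pulls should end up on the best arm, while the `epsilon` fraction of random pulls keeps the other estimates from going stale.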

#### Dynamic programming

- Iterative policy evaluation on ```FrozenLake-v0```
- Policy improvement on ```FrozenLake-v0```
- Policy iteration on ```FrozenLake-v0```
- Value iteration on ```FrozenLake-v0```
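
The dynamic programming examples above all operate on a known transition model. The sketch below shows value iteration on a small hand-built chain MDP rather than on ```FrozenLake-v0``` itself, so that it runs without Gym installed. The layout `P[state][action] = [(probability, next_state, reward, done), ...]` is modeled on the transition table that Gym's toy-text environments expose; the function name `value_iteration` and the chain MDP are assumptions for this illustration, not code from the repository.

```python
def value_iteration(P, gamma=0.9, tol=1e-8):
    """Return (V, policy) for a tabular MDP given as P[state][action]."""
    n_states = len(P)
    V = [0.0] * n_states

    def action_value(state, action):
        # Expected one-step return; terminal successors add no future value.
        return sum(
            prob * (reward + (0.0 if done else gamma * V[next_state]))
            for prob, next_state, reward, done in P[state][action]
        )

    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup, applied in place.
            best = max(action_value(s, a) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = [max(P[s], key=lambda a: action_value(s, a)) for s in range(n_states)]
    return V, policy


# Hypothetical 4-state chain: action 0 moves left, action 1 moves right,
# and entering the last state yields reward 1 and ends the episode.
GOAL = 3
chain = {}
for s in range(GOAL):
    left, right = max(s - 1, 0), s + 1
    chain[s] = {
        0: [(1.0, left, 0.0, False)],
        1: [(1.0, right, 1.0 if right == GOAL else 0.0, right == GOAL)],
    }
chain[GOAL] = {0: [(1.0, GOAL, 0.0, True)], 1: [(1.0, GOAL, 0.0, True)]}

V, policy = value_iteration(chain)
```

Policy iteration reuses the same `action_value` backup but alternates full policy evaluation with greedy improvement instead of taking the maximum in every sweep.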

#### Monte Carlo

- Monte Carlo prediction on ```Blackjack-v0```
- Approximate Monte Carlo on ```MountainCar-v0```
- Monte Carlo tree search on ```Taxi-v3```
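
As a rough illustration of Monte Carlo prediction, the following sketch estimates state values from complete episodes with the first-visit rule. It works on plain lists of `(state, reward)` pairs instead of ```Blackjack-v0```, so it has no Gym dependency. The function name `first_visit_mc_prediction` and the episode format are assumptions for this example.

```python
from collections import defaultdict


def first_visit_mc_prediction(episodes, gamma=1.0):
    """Estimate V(s) as the average first-visit return over the episodes.

    Each episode is a list of (state, reward) pairs, where reward is the
    reward received after leaving that state.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)

    for episode in episodes:
        G = 0.0
        first_visit_return = {}
        # Walk backwards so G is the discounted return from each step.
        # Later overwrites come from earlier time steps, so the value that
        # survives for each state is the return from its first visit.
        for state, reward in reversed(episode):
            G = reward + gamma * G
            first_visit_return[state] = G

        for state, g in first_visit_return.items():
            returns_sum[state] += g
            returns_count[state] += 1

    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 0.0), ("A", 0.0), ("B", 3.0)],
]
values = first_visit_mc_prediction(episodes, gamma=0.5)
```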

#### Temporal difference

- TD(0) on ```CartPole-v0```
- SARSA on ```CliffWalking-v0```
- SARSA on ```CartPole-v0```
- Q-learning on ```CliffWalking-v0```
- Q-learning on ```CartPole-v0```
- Expected SARSA (TODO)
- SARSA lambda (TODO)
- TD(0) semi-gradient on ```MountainCar-v0```
- SARSA semi-gradient on ```MountainCar-v0```
- Q-learning on ```MountainCar-v0```
- Double Q-learning on ```CartPole-v0```
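
SARSA and Q-learning differ only in the bootstrap target they use. The sketch below puts the two tabular targets side by side; the helper names `sarsa_target`, `q_learning_target`, and `td_update`, and the list-of-lists Q-table, are assumptions for this illustration and not the repository's API.

```python
def sarsa_target(Q, reward, next_state, next_action, done, gamma):
    """On-policy target: bootstrap from the action actually taken next."""
    if done:
        return reward
    return reward + gamma * Q[next_state][next_action]


def q_learning_target(Q, reward, next_state, done, gamma):
    """Off-policy target: bootstrap from the greedy action in the next state."""
    if done:
        return reward
    return reward + gamma * max(Q[next_state])


def td_update(Q, state, action, target, alpha):
    """Move Q[state][action] a fraction alpha towards the target, in place."""
    Q[state][action] += alpha * (target - Q[state][action])


# Tiny hand-made Q-table: 2 states x 2 actions.
Q = [[0.0, 0.0], [1.0, 3.0]]

# Transition: state 0, action 1, reward -1, next state 1, not terminal.
sarsa = sarsa_target(Q, reward=-1.0, next_state=1, next_action=0, done=False, gamma=0.9)
qlearn = q_learning_target(Q, reward=-1.0, next_state=1, done=False, gamma=0.9)

td_update(Q, state=0, action=1, target=qlearn, alpha=0.5)
```

Here the SARSA target uses `Q[1][0]` because the next action sampled was 0, while the Q-learning target uses the larger `Q[1][1]`. Expected SARSA would replace that single value with the policy-weighted average over `Q[1]`.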

#### DQN

- Vanilla DQN on ```Gridworld```
- DQN with experience replay on ```Gridworld```
- DQN with target network on ```Gridworld```
- Vanilla DQN on ```CartPole-v0```
- Vanilla DQN on ```LunarLander-v2```
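
Experience replay stores past transitions and trains on random mini-batches of them, which breaks the correlation between consecutive samples. A minimal standard-library sketch of such a buffer follows. The names `Transition` and `ReplayBuffer` are assumptions for this example and may differ from what the repository uses.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-size FIFO buffer of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # deque with maxlen drops the oldest transition once full.
        self._memory = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self._memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement.
        return self._rng.sample(list(self._memory), batch_size)

    def __len__(self):
        return len(self._memory)


buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.push(state=step, action=0, reward=0.0, next_state=step + 1, done=False)
batch = buffer.sample(2)
```

In a DQN training loop the sampled batch would typically be converted to tensors before computing the loss; that step is omitted here to keep the sketch free of a PyTorch dependency.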

#### Approximate methods

- Simple gradient descent solver
- REINFORCE on ```CartPole-v0```
- A2C on ```CartPole-v1```

## Robotics simulations

- Q-learning with the e-puck robot

## References

- Deep Reinforcement Learning in Action
- tinyML Talks: Deploying AI to Embedded Systems
- tinyML Talks: Exploring techniques to build efficient and robust TinyML deployments