https://github.com/gregwar/mjlab_cartpole

Cartpole minimal example using mjlab as external dependency

# Mjlab Cartpole example



This is a simple CartPole environment trained with [mjlab](https://github.com/mujocolab/mjlab/). The repository serves both as a pedagogical example of mjlab and as an example of how to organize a project that uses mjlab as an external dependency.

## Installing

Simply run:

```
uv sync
```

## Running the training

To run the training, use:

```
uv run train Mjlab-Cartpole
```

## Playing the environment

Run:

```
uv run play Mjlab-Cartpole --checkpoint-file [path-to-checkpoint]
```

The checkpoint is typically written to `logs/rsl_rl/exp1/[date]/model_499.pt`.
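
To avoid typing the checkpoint path by hand, a small helper can look it up. The sketch below is not part of this repository: `latest_checkpoint` is a hypothetical name, and it assumes that run directories under `logs/rsl_rl/exp1/` have timestamp names that sort chronologically and that checkpoints follow the `model_<N>.pt` pattern shown above.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(experiment_dir: str) -> Optional[Path]:
    """Return the highest-numbered model_<N>.pt from the newest run directory."""
    root = Path(experiment_dir)
    if not root.is_dir():
        return None
    # Assumption: run directory names are timestamps that sort chronologically.
    run_dirs = sorted((p for p in root.iterdir() if p.is_dir()), reverse=True)
    for run_dir in run_dirs:
        numbered = []
        for ckpt in run_dir.glob("model_*.pt"):
            match = re.fullmatch(r"model_(\d+)\.pt", ckpt.name)
            if match:
                numbered.append((int(match.group(1)), ckpt))
        if numbered:
            # Compare by iteration number, not by file name ("50" < "499").
            return max(numbered, key=lambda item: item[0])[1]
    return None


# Prints None when no checkpoint has been written yet.
print(latest_checkpoint("logs/rsl_rl/exp1"))
```

The returned path can then be passed to `--checkpoint-file`.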

## Organization

The structure is as follows:

* `src/mjlab_cartpole/`
  * `robot/`: Robot model
    * `xmls/cartpole.xml`: MuJoCo MJCF spec
    * `cartpole_constants.py`: Describing how to load the robot entity (mostly loading the XML spec here)
  * `tasks/`: Task definition
    * `__init__.py`: Environments are registered here
    * `cartpole_env_cfg.py`: Environment configurations (actions, rewards, termination, reset etc.)

Here, mjlab is used as an external dependency.

## Environment

### Action

The action is the force $f$ applied to the cart, in newtons.

### Observation

The agent observes the pole angle and angular velocity, and the cart position and velocity.

### Reward

The goal is to keep the pole from falling while steering the cart toward the center. The reward is:

$$
r = 5 \cos(\theta) + \exp\left(-\frac{x^2}{\sigma^2}\right) - 10^{-2} \left(\frac{f}{20}\right)^2
$$

Where:

* $\theta$ is the pole angle (0 is upright)
* $x$ is the cart position
* $\sigma = 0.3$ is the length scale of the tolerated deviation from the center
* $f$ is the applied cart force (command)
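
As a quick sanity check, the formula can be evaluated in plain Python. This is a standalone sketch, not the mjlab reward terms defined in `cartpole_env_cfg.py`: the function name `cartpole_reward` is hypothetical, and $\theta$ is assumed to be in radians.

```python
import math

SIGMA = 0.3         # length scale of the centering term
FORCE_SCALE = 20.0  # divisor applied to the force in the penalty term


def cartpole_reward(theta: float, x: float, f: float) -> float:
    """Reward for pole angle theta (rad, 0 = upright), cart position x, force f."""
    upright = 5.0 * math.cos(theta)              # largest when the pole is upright
    centered = math.exp(-(x ** 2) / SIGMA ** 2)  # 1 at the center, decays with |x|
    effort = 1e-2 * (f / FORCE_SCALE) ** 2       # small penalty on the applied force
    return upright + centered - effort


# Upright pole, centered cart, no force: the maximum reward.
print(cartpole_reward(0.0, 0.0, 0.0))  # 6.0
```

The upright term dominates (up to 5), the centering term adds at most 1, and the force penalty is only $10^{-2}$ at $f = 20$.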

### Termination & truncation

The episode terminates when $|\theta| > 30^\circ$ and is truncated by a timeout after 10 s.
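
The two conditions can be sketched as a single check. This is an illustration rather than the mjlab termination configuration: `episode_status` is a hypothetical name, the angle is assumed to be in radians, and treating the timeout as reached at exactly 10 s (`>=`) is an assumption.

```python
import math

MAX_ANGLE_RAD = math.radians(30.0)  # termination threshold on |theta|
TIMEOUT_S = 10.0                    # episode time limit in seconds


def episode_status(theta: float, elapsed_s: float) -> tuple[bool, bool]:
    """Return (terminated, truncated) for pole angle theta (rad) and elapsed time (s)."""
    terminated = abs(theta) > MAX_ANGLE_RAD  # pole has fallen too far: failure
    truncated = elapsed_s >= TIMEOUT_S       # time limit reached: not a failure
    return terminated, truncated


print(episode_status(math.radians(31.0), 2.0))  # (True, False)
```

Keeping the two flags separate matters for training: a termination marks a failure state, while a truncation only cuts the episode short.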

## References

* This example is inspired by the [Creating a New Task](https://github.com/mujocolab/mjlab/blob/main/docs/create_new_task.md) guide from mjlab