# RL_triPendulum_Swingup

![License](https://img.shields.io/github/license/KRproject-tech/RL_triPendulum_Swingup)

Reinforcement learning for the swing-up problem of the tri-pendulum.

SAC (Soft Actor-Critic)[^1] is employed as the RL algorithm, via [Stable Baselines3](https://stable-baselines3.readthedocs.io/en/master/) [^2].
Reinforcement learning (RL) is conducted in the [MuJoCo environment](https://mujoco.readthedocs.io/en/stable/overview.html).
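The actual training script is `exe.py` (see below). For orientation, a minimal sketch of how SAC training with Stable Baselines3 on a custom gymnasium/MuJoCo environment typically looks is given here; the class name `InvertedTriPendulumSwingupEnv` and the save path are assumptions for illustration, not the exact contents of `exe.py`.

````python
# Minimal sketch (not the exact exe.py): SAC training with Stable Baselines3
# on a custom gymnasium/MuJoCo environment.
from stable_baselines3 import SAC

# Hypothetical import: the environment class defined in
# inverted_tri_pendulum_swingup.py (name assumed for illustration).
from inverted_tri_pendulum_swingup import InvertedTriPendulumSwingupEnv

env = InvertedTriPendulumSwingupEnv()

model = SAC(
    "MlpPolicy",            # standard MLP actor/critic
    env,
    verbose=1,
    tensorboard_log="logs", # SB3 writes runs to logs/SAC_1, logs/SAC_2, ...
)

model.learn(total_timesteps=5_000_000)  # 5*1e+6 steps, as noted below
model.save("sac_tri_pendulum")          # hypothetical save name
````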

![rl-video-episode-10000](https://github.com/user-attachments/assets/33ba400b-c4fd-4b84-9645-57075a8a7324)

## Preparation before analysis


Requirements:

* __python=3.11__

* For the reinforcement learning algorithm: __Stable Baselines3=2.3.2__

* For the simulation environment: __gymnasium=0.29.1__

* For plotting: __matplotlib=3.9.2__

* For matrix calculations: __numpy=1.26.0__

* For MATLAB file (`*.mat`) generation: __scipy=1.13.1__

* For the simulation environment: __mujoco=3.2.4__

* For data logging: __tensorboard=2.18.0__

* For GPGPU: __PyTorch__

* others

```batch
conda create -n py311mujoco python=3.11
conda activate py311mujoco
conda install numpy=1.26.0

# install PyTorch

pip install stable-baselines3==2.3.2
pip install gymnasium==0.29.1
pip install mujoco==3.2.4
pip install tensorboard==2.18.0

# install others

git clone https://github.com/KRproject-tech/RL_triPendulum_Swingup
cd RL_triPendulum_Swingup
python exe.py

# training runs for 5*1e+6 steps; wait until it finishes
```
To monitor the learning progress, run the following commands in another terminal:

```batch
cd RL_triPendulum_Swingup
tensorboard --logdir logs/SAC_1
```

The learning progress can then be viewed in a web browser at `http://localhost:6006/`.


The convergence of reinforcement learning depends on the initial values.
Therefore, __several training runs may be necessary to reach the inverted configuration__ (a seed-loop sketch is shown below).
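A simple way to run several trials is to repeat the training with different random seeds. A minimal sketch, under the same assumptions as above (the actual `exe.py` may handle this differently):

````python
# Sketch: several independent trials with different seeds, since not every
# run converges to the inverted configuration.
from stable_baselines3 import SAC
from inverted_tri_pendulum_swingup import InvertedTriPendulumSwingupEnv  # assumed name

for seed in (0, 1, 2):
    env = InvertedTriPendulumSwingupEnv()
    model = SAC("MlpPolicy", env, seed=seed, tensorboard_log="logs", verbose=0)
    model.learn(total_timesteps=5_000_000)
    model.save(f"sac_tri_pendulum_seed{seed}")  # hypothetical save names
````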

## Definitions for the dynamics of the pendulum

The dynamics of the pendulum are defined in the MJCF (MuJoCo Format) file named `inverted_tri_pendulum.xml`.

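For reference, the model can be loaded and inspected with the official `mujoco` Python bindings; a minimal sketch (the degree-of-freedom counts in the comments follow the observation definition below: one cart slider plus three link hinges):

````python
# Sketch: inspecting the MJCF model with the mujoco Python bindings.
import mujoco

model = mujoco.MjModel.from_xml_path("inverted_tri_pendulum.xml")
data = mujoco.MjData(model)

print(model.nq, model.nv)  # expected 4 position / 4 velocity DOFs
                           # (cart slider + three link hinges)
print(model.nu)            # number of actuators (the force on the cart)
````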

## Actions

The action is the force applied to the moving base of the tri-pendulum along its sliding direction, limited to the range from -2 N to 2 N, namely:

$$
a := f_x.
$$
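In gymnasium terms this is a one-dimensional continuous action space. A minimal sketch of the corresponding declaration (the actual one lives in `inverted_tri_pendulum_swingup.py` and may differ in detail):

````python
# Sketch: 1-D continuous action = horizontal force on the cart, limited to +/-2 N.
import numpy as np
from gymnasium import spaces

action_space = spaces.Box(low=-2.0, high=2.0, shape=(1,), dtype=np.float64)
````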

## Observations

Observations are defined in `inverted_tri_pendulum_swingup.py`.

The observed rotational angles between the links should be bounded to achieve fast convergence of the reinforcement learning; they are therefore encoded as sine/cosine pairs, and the velocities are clipped.
The following definition of the observations is employed:

````python
def _get_obs(self):
    return np.concatenate(
        [
            self.data.qpos[:1],                                # cart x position [m]
            np.sin(self.data.qpos[1:]),                        # sine of link angles [rad]
            np.cos(self.data.qpos[1:]),                        # cosine of link angles [rad]
            np.clip(self.data.qvel[:1], -10, 10),              # cart velocity [m/s]
            np.clip(self.data.qvel[1:], -10*np.pi, 10*np.pi),  # link angular velocities [rad/s]
        ]
    ).ravel()
````
Namely,

$$
\mathbf{o} :=
\left[
\begin{array}{c}
x\\
\sin(\theta_1)\\
\sin(\theta_2)\\
\sin(\theta_3)\\
\cos(\theta_1)\\
\cos(\theta_2)\\
\cos(\theta_3)\\
\mathrm{clip}(v, -10, 10)\\
\mathrm{clip}(\omega_1, -10\pi, 10\pi)\\
\mathrm{clip}(\omega_2, -10\pi, 10\pi)\\
\mathrm{clip}(\omega_3, -10\pi, 10\pi)\\
\end{array}
\right]
$$
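The resulting observation is an 11-dimensional vector. A sketch of a matching observation-space declaration, with bounds taken from the clipping above (the exact bounds used in the repository are an assumption):

````python
# Sketch: 11-D observation space
# [x, sin(theta1..3), cos(theta1..3), clip(v), clip(omega1..3)]
import numpy as np
from gymnasium import spaces

low  = np.array([-np.inf] + [-1.0]*6 + [-10.0] + [-10*np.pi]*3)
high = np.array([ np.inf] + [ 1.0]*6 + [ 10.0] + [ 10*np.pi]*3)
observation_space = spaces.Box(low=low, high=high, dtype=np.float64)
````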

## Rewards[^3]

Rewards are defined in `inverted_tri_pendulum_swingup.py`.

````python
# x_pos: cart position [m]; theta1..theta3: link angles [rad];
# omega1..omega3: link angular velocities [rad/s]; PI = np.pi
J_xpos = np.exp( -(x_pos/2.0)**2 )

# theta1 = 0 -> hanging-down (below) position, so theta1 = PI is upright
J_theta1 = ( 1 + np.cos( theta1 - PI ) )/2.0
J_theta2 = ( 1 + np.cos( theta2 ) )/2.0
J_theta3 = ( 1 + np.cos( theta3 ) )/2.0

J_omega1 = np.exp( -5.0*( omega1/(2*PI) )**2 )
J_omega2 = np.exp( -1.0*( omega2/(2*PI) )**2 )
J_omega3 = np.exp( -1.0*( omega3/(2*PI) )**2 )

r = J_xpos*J_theta1*J_theta2*J_theta3*np.amin([ J_omega1, J_omega2, J_omega3 ])
````

Namely,

$$
r := J_x \cdot J_{\theta_1} \cdot J_{\theta_2} \cdot J_{\theta_3} \cdot \min \\{ J_{\omega_1}, J_{\omega_2}, J_{\omega_3} \\},
$$

where,

```math
\begin{eqnarray}
J_x &:=& \exp \left( -\left(\frac{x}{2}\right)^2 \right), \\
J_{\theta_1} &:=& \frac{1 + \cos(\theta_1 - \pi)}{2},\\
J_{\theta_2} &:=& \frac{1 + \cos(\theta_2)}{2},\\
J_{\theta_3} &:=& \frac{1 + \cos(\theta_3)}{2},\\
J_{\omega_1} &:=& \exp \left( -5\left(\frac{\omega_1}{2\pi}\right)^2 \right),\\
J_{\omega_2} &:=& \exp \left( -\left(\frac{\omega_2}{2\pi}\right)^2 \right),\\
J_{\omega_3} &:=& \exp \left( -\left(\frac{\omega_3}{2\pi}\right)^2 \right).
\end{eqnarray}
```
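As a sanity check, each factor lies in $[0, 1]$, so $r \in [0, 1]$, and $r = 1$ is reached only in the upright rest configuration ($x = 0$, $\theta_1 = \pi$, $\theta_2 = \theta_3 = 0$, all angular velocities zero). A self-contained version of the same formula (variable names chosen here only for illustration):

````python
import numpy as np

def swingup_reward(x, theta1, theta2, theta3, omega1, omega2, omega3):
    """Stand-alone copy of the reward above, for quick sanity checks."""
    J_x  = np.exp(-(x/2.0)**2)
    J_t1 = (1 + np.cos(theta1 - np.pi))/2.0  # theta1 = pi -> first link upright
    J_t2 = (1 + np.cos(theta2))/2.0
    J_t3 = (1 + np.cos(theta3))/2.0
    J_w1 = np.exp(-5.0*(omega1/(2*np.pi))**2)
    J_w2 = np.exp(-1.0*(omega2/(2*np.pi))**2)
    J_w3 = np.exp(-1.0*(omega3/(2*np.pi))**2)
    return J_x*J_t1*J_t2*J_t3*min(J_w1, J_w2, J_w3)

print(swingup_reward(0.0, np.pi, 0.0, 0.0, 0.0, 0.0, 0.0))  # upright at rest -> 1.0
print(swingup_reward(0.0, 0.0,   0.0, 0.0, 0.0, 0.0, 0.0))  # hanging down   -> 0.0
````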

## Gallery


### References

[^1]: T. Haarnoja, A. Zhou, P. Abbeel, S. Levine, Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor, ArXiv abs/1801.01290 (2018).
[^2]: A. Raffin, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, N. Dormann, Stable-baselines3: Reliable reinforcement learning implementations, Journal of Machine Learning Research 22 (268) (2021) 1–8.
[^3]: J. Baek, C. Lee, Y. S. Lee, S. Jeon, S. Han, Reinforcement learning to achieve real-time control of triple inverted pendulum, Engineering Applications of Artificial Intelligence 128 (2024).