https://github.com/krproject-tech/rl_tripendulum_swingup
Reinforcement learning for the swing-up problem of the tri-pendulum.
- Host: GitHub
- URL: https://github.com/krproject-tech/rl_tripendulum_swingup
- Owner: KRproject-tech
- License: MIT
- Created: 2025-05-09T07:07:42.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-05-11T03:24:09.000Z (5 months ago)
- Last Synced: 2025-05-30T13:48:00.911Z (4 months ago)
- Topics: anaconda, control, mujoco, python, reinforcement-learning, stablebaselines3
- Language: Python
- Homepage:
- Size: 2.71 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# RL_triPendulum_Swingup

Reinforcement learning for the swing-up problem of the tri-pendulum.
SAC (Soft Actor-Critic)[^1] is employed as the RL algorithm ([Stable Baselines3](https://stable-baselines3.readthedocs.io/en/master/) [^2]).
Reinforcement learning (RL) is conducted in the [MuJoCo environment](https://mujoco.readthedocs.io/en/stable/overview.html).
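As a rough sketch of this setup (not the repository's `exe.py`; the environment id and checkpoint name below are hypothetical placeholders for the custom swing-up environment), training with SAC from Stable Baselines3 looks like this:

```python
# Minimal sketch, assuming a registered custom Gymnasium environment.
# "InvertedTriPendulumSwingup-v0" is an illustrative id, not taken from the repo.
import gymnasium as gym
from stable_baselines3 import SAC

env = gym.make("InvertedTriPendulumSwingup-v0")

# SAC with the default MLP policy; TensorBoard logs go to ./logs (see below).
model = SAC("MlpPolicy", env, verbose=1, tensorboard_log="logs")
model.learn(total_timesteps=5_000_000)   # the README trains for 5*1e+6 steps
model.save("sac_tripendulum")            # hypothetical checkpoint name
```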
## Preparation before analysis
Requirements:
* __python=3.11__
* For the reinforcement learning algorithm: __Stable Baselines3=2.3.2__
* For simulation environment: __gymnasium=0.29.1__
* For plotting: __matplotlib=3.9.2__
* For matrix calculation: __numpy=1.26.0__
* For MATLAB file (*.mat) generation: __scipy=1.13.1__
* For simulation environment: __mujoco=3.2.4__
* For data logging: __tensorboard=2.18.0__
* For GPGPU: __PyTorch__
* others
```batch
conda create -n py311mujoco python=3.11
conda activate py311mujoco
conda install numpy=1.26.0

# install PyTorch
pip install stable-baselines3
pip install gymnasium
pip install mujoco
pip install tensorboard

# install others
git clone https://github.com/KRproject-tech/RL_triPendulum_Swingup
cd RL_triPendulum_Swingup
python exe.py    # wait until 5*1e+6 steps
```
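After the installation, a quick sanity check such as the following (an illustrative snippet, not part of the repository) confirms that the packages listed above are importable and reports their versions:

```python
# Illustrative sanity check of the environment set up above (not repository code).
import gymnasium, matplotlib, mujoco, numpy, scipy, torch
import stable_baselines3 as sb3

for name, mod in [("gymnasium", gymnasium), ("matplotlib", matplotlib),
                  ("mujoco", mujoco), ("numpy", numpy), ("scipy", scipy),
                  ("torch", torch), ("stable-baselines3", sb3)]:
    print(f"{name}: {mod.__version__}")
```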
To monitor the learning process, run the following command in another terminal:

```batch
cd RL_triPendulum_Swingup
tensorboard --logdir logs/SAC_1
```

After that, the learning curves can be inspected in a web browser at `http://localhost:6006/`.
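Once training has finished, the learned policy can be replayed roughly as follows (a sketch under assumptions: `sac_tripendulum` is a hypothetical checkpoint name, `InvertedTriPendulumSwingup-v0` a hypothetical environment id, and the environment is assumed to support human rendering):

```python
# Sketch of loading a trained agent and replaying it with rendering.
# Checkpoint and environment names are illustrative placeholders.
import gymnasium as gym
from stable_baselines3 import SAC

model = SAC.load("sac_tripendulum")
env = gym.make("InvertedTriPendulumSwingup-v0", render_mode="human")

obs, info = env.reset()
for _ in range(2000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```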
The convergence of reinforcement learning depends on the initial values.
Therefore, __some trials of reinforcement learning may be necessary to achieve the inverted configuration.__

## Definitions for the dynamics of the pendulum
The dynamics of the pendulum are defined in the MJCF (MuJoCo Format) file named `inverted_tri_pendulum.xml`.
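For orientation, the model can be loaded and inspected with the official `mujoco` Python bindings, for example (an illustrative snippet, not repository code):

```python
# Illustrative: load the MJCF model and report its degrees of freedom.
import mujoco

model = mujoco.MjModel.from_xml_path("inverted_tri_pendulum.xml")
data = mujoco.MjData(model)

print("generalized coordinates (nq):", model.nq)  # expected: 1 slider + 3 hinges
print("generalized velocities (nv):", model.nv)
print("actuators (nu):", model.nu)                # expected: single force on the base
```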
## actions
A force applied to the moving base of the tri-pendulum along the sliding direction, limited to the range from -2 N to 2 N, namely;
$$
a := f_x.
$$
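Expressed as a Gymnasium space (a sketch; the attribute name is illustrative), this corresponds to a one-dimensional box bounded at ±2 N:

```python
# Illustrative action space matching the bounded force above.
import numpy as np
from gymnasium import spaces

action_space = spaces.Box(low=-2.0, high=2.0, shape=(1,), dtype=np.float64)  # f_x in [-2 N, 2 N]
```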
## observations

Observations are defined in `inverted_tri_pendulum_swingup.py`.
The observations of the rotational angles between links should be bounded to achieve fast convergence of the reinforcement learning.
Therefore, the following definition of the observations is employed:

````python
def _get_obs(self):
    return np.concatenate(
        [
            self.data.qpos[:1],                                  # cart x pos [m]
            np.sin( self.data.qpos[1:] ),                        # link angles [rad]
            np.cos( self.data.qpos[1:] ),                        # link angles [rad]
            np.clip( self.data.qvel[:1], -10, 10 ),              # cart x pos vel [m/s]
            np.clip( self.data.qvel[1:], -10*np.pi, 10*np.pi ),  # link angles vel [rad/s]
        ]
    ).ravel()
````
Namely,

$$
\bf{o} :=
\left[
\begin{array}{c}
x\\
\sin(\theta_1)\\
\sin(\theta_2)\\
\sin(\theta_3)\\
\cos(\theta_1)\\
\cos(\theta_2)\\
\cos(\theta_3)\\
\rm{clip}(v, -10, 10)\\
\rm{clip}(\omega_1, -10\pi, 10\pi)\\
\rm{clip}(\omega_2, -10\pi, 10\pi)\\
\rm{clip}(\omega_3, -10\pi, 10\pi)\\
\end{array}
\right]
$$
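The observation vector therefore has 11 components; a matching Gymnasium observation space could be declared as follows (a sketch, assuming unbounded box limits as is common for MuJoCo-style environments):

```python
# Illustrative observation space for the 11-dimensional vector defined above.
import numpy as np
from gymnasium import spaces

observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(11,), dtype=np.float64)
```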
## Rewards[^3]

Rewards are defined in `inverted_tri_pendulum_swingup.py`.
````python
J_xpos = np.exp( -(x_pos/2.0)**2 )

# theta1 = 0 -> below position
J_theta1 = ( 1 + np.cos( theta1 - PI ) )/2.0
J_theta2 = ( 1 + np.cos( theta2 ) )/2.0
J_theta3 = ( 1 + np.cos( theta3 ) )/2.0

J_omega1 = np.exp( -5.0*( omega1/(2*PI) )**2)
J_omega2 = np.exp( -1.0*( omega2/(2*PI) )**2)
J_omega3 = np.exp( -1.0*( omega3/(2*PI) )**2)

r = J_xpos*J_theta1*J_theta2*J_theta3*np.amin([ J_omega1, J_omega2, J_omega3 ])
````

Namely,
$$
r := J_x \cdot J_{\theta_1} \cdot J_{\theta_2} \cdot J_{\theta_3} \min \\{ J_{\omega_1}, J_{\omega_2}, J_{\omega_3} \\},
$$

where,
```math
\begin{eqnarray}
J_x &:=& \exp \left( -\left(\frac{x}{2}\right)^2 \right), \\
J_{\theta_1} &:=& \frac{1 + \cos(\theta_1 - \pi)}{2},\\
J_{\theta_2} &:=& \frac{1 + \cos(\theta_2)}{2},\\
J_{\theta_3} &:=& \frac{1 + \cos(\theta_3)}{2},\\
J_{\omega_1} &:=& \exp \left( -5\left(\frac{\omega_1}{2\pi}\right)^2 \right),\\
J_{\omega_2} &:=& \exp \left( -\left(\frac{\omega_2}{2\pi}\right)^2 \right),\\
J_{\omega_3} &:=& \exp \left( -\left(\frac{\omega_3}{2\pi}\right)^2 \right).
\end{eqnarray}
```
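As a quick consistency check (an illustrative re-implementation, not repository code), the reward equals 1 only in the upright, centred, at-rest state x = 0, theta_1 = pi, theta_2 = theta_3 = 0 with all angular velocities zero, and vanishes in the hanging-down state:

```python
# Illustrative re-implementation of the reward above, used only to sanity-check
# its maximum at the upright configuration.
import numpy as np

PI = np.pi

def reward(x_pos, theta1, theta2, theta3, omega1, omega2, omega3):
    J_xpos   = np.exp( -(x_pos/2.0)**2 )
    J_theta1 = ( 1 + np.cos( theta1 - PI ) )/2.0   # theta1 = 0 -> below position
    J_theta2 = ( 1 + np.cos( theta2 ) )/2.0
    J_theta3 = ( 1 + np.cos( theta3 ) )/2.0
    J_omega1 = np.exp( -5.0*( omega1/(2*PI) )**2 )
    J_omega2 = np.exp( -1.0*( omega2/(2*PI) )**2 )
    J_omega3 = np.exp( -1.0*( omega3/(2*PI) )**2 )
    return J_xpos*J_theta1*J_theta2*J_theta3*np.amin([ J_omega1, J_omega2, J_omega3 ])

print(reward(0.0, PI, 0.0, 0.0, 0.0, 0.0, 0.0))   # -> 1.0 (upright, at rest)
print(reward(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0))  # -> 0.0 (hanging straight down)
```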
## Gallery

### References
[^1]: T. Haarnoja, A. Zhou, P. Abbeel, S. Levine, Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor, ArXiv abs/1801.01290 (2018).
[^2]: A. Raffin, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, N. Dormann, Stable-baselines3: Reliable reinforcement learning implementations, Journal of Machine Learning Research 22 (268) (2021) 1–8.
[^3]: J. Baek, C. Lee, Y. S. Lee, S. Jeon, S. Han, Reinforcement learning to achieve real-time control of triple inverted pendulum, Engineering Applications of Artificial Intelligence, Vol. 128, 2024.