https://github.com/lucidrains/anymal-belief-state-encoder-decoder-pytorch
Implementation of the Belief State Encoder / Decoder in the new breakthrough robotics paper from ETH Zürich
- Host: GitHub
- URL: https://github.com/lucidrains/anymal-belief-state-encoder-decoder-pytorch
- Owner: lucidrains
- License: MIT
- Created: 2022-01-26T16:39:20.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2022-05-06T16:48:07.000Z (over 2 years ago)
- Last Synced: 2024-10-15T00:16:51.059Z (2 months ago)
- Topics: artificial-intelligence, deep-learning, locomotion-control, robotics
- Language: Python
- Size: 1.08 MB
- Stars: 62
- Watchers: 6
- Forks: 9
- Open Issues: 2
- Metadata Files:
- Readme: README.md
- License: LICENSE
## Belief State Encoder / Decoder (Anymal) - Pytorch
Implementation of the Belief State Encoder / Decoder in the new breakthrough robotics paper from ETH Zürich.
This paper is important because their learned approach produces a policy that rivals Boston Dynamics' handcrafted algorithms for the quadrupedal Spot.
The results speak for themselves in their video demonstration.
## Install
```bash
$ pip install anymal-belief-state-encoder-decoder-pytorch
```

## Usage

Teacher
```python
import torch
from anymal_belief_state_encoder_decoder_pytorch import Teacher

teacher = Teacher(
    num_actions = 10,
    num_legs = 4,
    extero_dim = 52,
    proprio_dim = 133,
    privileged_dim = 50
)

proprio = torch.randn(1, 133)
extero = torch.randn(1, 4, 52)
privileged = torch.randn(1, 50)

action_logits, values = teacher(proprio, extero, privileged, return_values = True) # (1, 10)
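# plain PyTorch follow-up (not library-specific): turn the (1, 10) policy
# logits above into a sampled discrete action
action_dist = torch.distributions.Categorical(logits = action_logits)
action = action_dist.sample() # (1,)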
```

Student
```python
import torch
from anymal_belief_state_encoder_decoder_pytorch import Student

student = Student(
    num_actions = 10,
    num_legs = 4,
    extero_dim = 52,
    proprio_dim = 133,
    gru_num_layers = 2,
    gru_hidden_size = 50
)

proprio = torch.randn(1, 133)
extero = torch.randn(1, 4, 52)

action_logits, hiddens = student(proprio, extero) # (1, 10), (2, 1, 50)
action_logits, hiddens = student(proprio, extero, hiddens) # (1, 10), (2, 1, 50)
action_logits, hiddens = student(proprio, extero, hiddens) # (1, 10), (2, 1, 50)

# hiddens are in the shape (num gru layers, batch size, gru hidden dimension)
# train with truncated bptt
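# truncated-BPTT sketch: after backpropagating the loss for a window,
# detach the recurrent state so gradients stop at the window boundary
hiddens = hiddens.detach()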
```

Full Anymal (which contains both Teacher and Student)
```python
import torch
from anymal_belief_state_encoder_decoder_pytorch import Anymal

anymal = Anymal(
    num_actions = 10,
    num_legs = 4,
    extero_dim = 52,
    proprio_dim = 133,
    privileged_dim = 50,
    recon_loss_weight = 0.5
)

# mock data
proprio = torch.randn(1, 133)
extero = torch.randn(1, 4, 52)
privileged = torch.randn(1, 50)

# first train teacher
teacher_action_logits = anymal.forward_teacher(proprio, extero, privileged)
# teacher is trained with privileged information in simulation with domain randomization
# after teacher has satisfactory performance, init the student with the teacher weights, excluding the privilege information encoder from the teacher (which student does not have)
anymal.init_student_with_teacher()
# then train the student on proprioception and noised exteroception, forcing it to reconstruct the privileged information the teacher had access to (as well as learning to denoise the exteroception) - there is also a behavior loss between the policy logits of the teacher and those of the student
loss, hiddens = anymal(proprio, extero, privileged)
loss.backward()

# finally, you can deploy the student to the real world, zero-shot
anymal.eval()
dist, hiddens = anymal.forward_student(proprio, extero, return_action_categorical_dist = True)
action = dist.sample()
```

PPO training of the Teacher (using a mock environment; this needs to be substituted with an environment wrapper around the simulator)
```python
import torch
from anymal_belief_state_encoder_decoder_pytorch import Anymal, PPO
from anymal_belief_state_encoder_decoder_pytorch.ppo import MockEnv

anymal = Anymal(
    num_actions = 10,
    num_legs = 4,
    extero_dim = 52,
    proprio_dim = 133,
    privileged_dim = 50,
    recon_loss_weight = 0.5
)

mock_env = MockEnv(
    proprio_dim = 133,
    extero_dim = 52,
    privileged_dim = 50
)

ppo = PPO(
    env = mock_env,
    anymal = anymal,
    epochs = 10,
    lr = 3e-4,
    eps_clip = 0.2,
    beta_s = 0.01,
    value_clip = 0.4,
    max_timesteps = 10000,
    update_timesteps = 5000,
)

# train for 10 episodes
for _ in range(10):
    ppo()

# save the weights of the teacher for student training
torch.save(anymal.state_dict(), './anymal-with-trained-teacher.pt')
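# to resume in a later session, the trained teacher can be reloaded before
# distillation (standard torch serialization, not library-specific):
# anymal.load_state_dict(torch.load('./anymal-with-trained-teacher.pt'))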
```

To train the student
```python
import torch
from anymal_belief_state_encoder_decoder_pytorch import Anymal
from anymal_belief_state_encoder_decoder_pytorch.trainer import StudentTrainer
from anymal_belief_state_encoder_decoder_pytorch.ppo import MockEnv

anymal = Anymal(
    num_actions = 10,
    num_legs = 4,
    extero_dim = 52,
    proprio_dim = 133,
    privileged_dim = 50,
    recon_loss_weight = 0.5
)

# first init the student with the teacher weights, at the very beginning
# (skip if resuming training)
anymal.init_student_with_teacher()

mock_env = MockEnv(
    proprio_dim = 133,
    extero_dim = 52,
    privileged_dim = 50
)

trainer = StudentTrainer(
    anymal = anymal,
    env = mock_env
)

# for 100 episodes
for _ in range(100):
    trainer()
```
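The exteroception noise curriculum mentioned in the Todo below (a factor ramping from 0 to 1 over epochs 1 to 100) can be sketched generically; `curriculum_factor` is a hypothetical helper, not part of this library:

```python
def curriculum_factor(epoch, start = 1, end = 100):
    """Linear ramp: 0 at `start`, 1 at `end`, clamped outside that range."""
    return min(max((epoch - start) / (end - start), 0.0), 1.0)

# the std of the noise added to the student's exteroception
# would be scaled by this factor as training progresses
```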
... You've beaten Boston Dynamics and its team of highly paid control engineers!
But you probably haven't beaten a real quadrupedal "anymal" just yet :)
## Todo
- [x] finish belief state decoder
- [x] wrapper class that instantiates both teacher and student, handle student forward pass with reconstruction loss + behavioral loss
- [x] handle noising of exteroception for student
- [x] add basic PPO logic for teacher
- [x] add basic student training loop with mock environment
- [x] make sure all hyperparameters for teacher PPO training + teacher / student distillation are in accordance with the appendix
- [ ] noise scheduler for student (curriculum factor that goes from 0 to 1 from epochs 1 to 100)
- [ ] fix student training, it does not look correct
- [ ] make sure tbptt is setup correctly
- [ ] add reward crafting as in paper
- [ ] play around with deepmind's mujoco

## Diagrams
## Citations
```bibtex
@article{2022,
title = {Learning robust perceptive locomotion for quadrupedal robots in the wild},
url = {http://dx.doi.org/10.1126/scirobotics.abk2822},
journal = {Science Robotics},
publisher = {American Association for the Advancement of Science (AAAS)},
author = {Miki, Takahiro and Lee, Joonho and Hwangbo, Jemin and Wellhausen, Lorenz and Koltun, Vladlen and Hutter, Marco},
year = {2022},
month = {Jan}
}
```