https://github.com/charmve/puppygo
Vision language model and large language model powered embodied robot
- Host: GitHub
- URL: https://github.com/charmve/puppygo
- Owner: Charmve
- Created: 2023-09-05T02:05:57.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-09-18T03:45:23.000Z (over 1 year ago)
- Last Synced: 2024-04-15T09:05:25.589Z (about 1 year ago)
- Topics: chatgpt, deepmind, embodied-agent, palm2, rt-2, voxposer
- Homepage:
- Size: 11.7 MB
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# PuppyGo
Vision language model and large language model powered embodied agent.
## Here's what I did:
- Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents
- Extracts affordances and constraints from large language models and vision-language models to compose 3D value maps, which motion planners use to zero-shot synthesize trajectories for everyday manipulation tasks (a minimal sketch of this idea follows the list).
- Combine with an end-to-end large-model training framework, such as UniAD.
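
The value-map composition can be illustrated with a small, self-contained sketch. Everything below is an assumption for illustration (the voxel resolution, the target and obstacle coordinates, the weights); it is not the VoxPoser implementation:

```python
# Minimal sketch of composing a 3D value map from an affordance term
# and a constraint term (illustrative only, not the VoxPoser code).
import numpy as np

GRID = 32  # hypothetical voxel resolution per axis

def distance_field(shape, point):
    """Euclidean distance from every voxel to a given voxel coordinate."""
    idx = np.indices(shape).astype(float)  # shape (3, GRID, GRID, GRID)
    return np.sqrt(sum((idx[i] - point[i]) ** 2 for i in range(3)))

# Affordance map: higher value closer to the task-relevant target,
# e.g. a point the LLM/VLM grounded from "the blue tray".
target = (8, 20, 5)          # assumed voxel coordinate of the target
affordance = -distance_field((GRID,) * 3, target)

# Constraint map: penalize voxels near an obstacle the VLM detected.
obstacle = (16, 16, 5)       # assumed voxel coordinate of the obstacle
avoidance = np.clip(distance_field((GRID,) * 3, obstacle), 0, 6) - 6

# Compose the value map; a motion planner would then optimize a
# trajectory that ascends this map toward the target.
value_map = affordance + 2.0 * avoidance

# Greedy illustration: the best next waypoint is the highest-value voxel.
best = np.unravel_index(np.argmax(value_map), value_map.shape)
print("highest-value voxel:", best)
```
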
## This Package Is Sponsorware 💰💰💰

[Sponsor this project on GitHub](https://github.com/sponsors/Charmve?frequency=one-time&sponsor=Charmve)
This repo was only available to my sponsors on GitHub Sponsors until I reached 15 sponsors.
Learn more about **Sponsorware** at [github.com/sponsorware/docs](https://github.com/sponsorware/docs) 💰.

## Execution under Disturbances
Because the language model output stays the same throughout the task, we can cache it and re-evaluate the generated code against closed-loop visual feedback, which enables fast replanning via MPC. This makes VoxPoser robust to online disturbances (a sketch of the loop follows the examples below).
"Sort the paper trash into the blue tray."
"Close the top drawer."