{"id":16579445,"url":"https://github.com/brianpeiris/tiny-embodied-ai","last_synced_at":"2025-04-13T23:09:17.601Z","repository":{"id":214247583,"uuid":"736039163","full_name":"brianpeiris/tiny-embodied-ai","owner":"brianpeiris","description":"An experiment in embodying an LLM in a simple simulation","archived":false,"fork":false,"pushed_at":"2023-12-26T20:21:33.000Z","size":6122,"stargazers_count":3,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-10-12T22:17:57.732Z","etag":null,"topics":["ai","gpt","llm","p5js","simulation"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/brianpeiris.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-12-26T20:10:19.000Z","updated_at":"2024-07-12T08:38:36.000Z","dependencies_parsed_at":"2023-12-26T22:28:20.687Z","dependency_job_id":"b6bcb954-66d7-4df6-96a7-dab2ad70e4f3","html_url":"https://github.com/brianpeiris/tiny-embodied-ai","commit_stats":null,"previous_names":["brianpeiris/tiny-embodied-ai"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brianpeiris%2Ftiny-embodied-ai","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brianpeiris%2Ftiny-embodied-ai/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brianpeiris%2Ftiny-embodied-ai/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brianpeiris%2Ftiny-embodied-ai/manifests","owner_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/owners/brianpeiris","download_url":"https://codeload.github.com/brianpeiris/tiny-embodied-ai/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224844638,"owners_count":17379239,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","gpt","llm","p5js","simulation"],"created_at":"2024-10-11T22:18:01.383Z","updated_at":"2024-11-15T20:43:00.884Z","avatar_url":"https://github.com/brianpeiris.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Tiny Embodied AI\n\nThis was an experiment in embodying a large language model in a simple virtual simulation, and giving it a task to complete.\nThe goal was to test LLMs' reasoning capabilities. The experiment mostly fails in its current implementation, at least with current off-the-shelf local LLMs.\n\nhttps://github.com/brianpeiris/tiny-embodied-ai/assets/79419/68416a82-1e1f-49ed-ab3a-d72380a8ed5d\n\n## Instructions\n\n1. Download a \"server\" version of a llamafile from https://github.com/Mozilla-Ocho/llamafile, for example, \"mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile\"\n2. Follow the steps to make the llamafile executable\n3. Make sure your environment is set up to run the LLM on the GPU, since it's 15x faster\n4. Run `npm ci`\n5. Run `npm start`\n6. Open http://localhost:3000/\n\n## How it works\n\nThe simulated environment consists of a \"bot\" that is able to move around a square room. 
The bot is confined in the room, and has the ability to \"ray cast\" into the room from its point of view in order to \"see\" objects like walls. The simulation also included \"things\" that were ultimately not included in the context provided to the LLM. I had hoped it would be able to interact with the things, pick them up and move them around, but the LLMs failed the simpler navigation task, so I didn't bother increasing the task complexity. The LLM context just includes the walls of the room.\n\nThe simulation was built with [p5.js](https://p5js.org) (more specifically, [q5.js](https://quinton-ashley.github.io/q5.js/), which is a faster drop-in implementation of p5). Ray casting was implemented with [p5.collide2D](https://github.com/bmoren/p5.collide2D).\n\nThe simulation interacts with an LLM provided by [Mozilla's llamafile project](https://github.com/Mozilla-Ocho/llamafile), which is based on [llama.cpp](https://github.com/ggerganov/llama.cpp). Specifically, I tried LLaVA 1.5 and Mistral-7B-Instruct. Mistral performed better. I was unable to run Mixtral-8x7B-Instruct on my hardware. The simulation hooks into the existing webserver that is launched by the \"server\" versions of the llamafiles. The LLM server simply provides completions to the prompt provided.\n\nThe simulation tracks the current state of the \"bot\", including its x and y position and its heading in degrees. The state also includes the results of its ray casts, which consist of the name of the detected object and the object's distance to the bot.\n\nThe state is transformed into natural language, which is included in the prompt to the LLM. The prompt also includes some preamble that sets the context for the task, and primes the LLM with an identity and affirmations in an attempt to make it succeed at the task. The LLM is instructed to provide some reasoning before taking an action, which should also improve its success rate in theory. 
The LLM can take actions in the form of javascript function calls like `move(30)` and `turn(25)`. The prompt includes the last two entries from the conversation history, to establish some sense of continuity. The web UI shows the full history. The goal is repeated with every entry in the history. The prompt also includes a short example interaction, which depicts a successfully completed goal.\n\nThe final part of the prompt looks something like this:\n\n```\nGoal: Get within 20 units of the north wall.\nState: Llama is at (250, 250). Llama's heading is 90 degrees.\nLlama is facing the east wall. The east wall is 245 units away.\nReasoning: \n```\n\nWhere the LLM is expected to complete the prompt with some reasoning for its next action, and provide a command to execute.\n\nA completion would look something like this:\n\n```\nGoal: Get within 20 units of the north wall.\nState: Llama is at (250, 250). Llama's heading is 90 degrees.\nLlama is facing the east wall. The east wall is 245 units away.\nReasoning: Llama is 245 units away from the north wall. 
Llama will move 38 units forward.\nCommand: move(38);\n```\n\n\u003cdetails\u003e\n\n\u003csummary\u003eExpand to see the full prompt, including preamble, affirmations, and short interaction example\u003c/summary\u003e\n\n```\nThis is an interaction between Llama and its Goal, with State and Entries.\nLlama is a bot that moves and turns in a room with specific commands.\nThe room is a square with a side length of 500.\nLlama cannot leave the room.\nState is the current state of Llama, including its position, heading, and what it sees.\nThe heading is the direction Llama is facing, in degrees,\nwith 0 being north, 90 being east, 180 being south, and -90 being west.\nLlama is precise and never fails to provide a command immediately.\nLlama is an expert at navigating the room.\nLlama is logical and reasons well.\nLlama is very good at spatial reasoning.\nLlama is very good at planning.\nLlama is very good at explaining its reasoning.\nLlama moves slowly and deliberately.\nLlama turns slowly and deliberately.\nLlama is very good at understanding the Goal.\nLlama is very good at knowing when it has completed the Goal.\nLlama uses the shortest set of commands to complete the Goal.\nLlama can move in any increment, for example 10 steps, 17 steps, or 33 steps.\nLlama can turn in any increment, for example 15 degrees, -8 degrees, or 24 degrees.\nBefore Llama claims to complete the Goal, Llama reasons about the Goal and its State.\nBefore Llama claims to complete the Goal, ensures that it has completed the Goal.\nWhen Llama is stuck, or repeating itself, it tries new commands with new parameters.\nLlama must respond with commands in correct javascript syntax.\nLlama must separate multiple commands with a semicolon.\nLlama must not respond with markdown.\nLlama's response absolutely must not contain triple backticks (\\`\\`\\`).\nLlama's command must start with the function name, including parentheses, and parameters.\nLlama's command must be lower-case.\n\nLlama can move 
forward in the direction of its heading with the command move(\u003csteps\u003e);.\nFor example, move(14); will move Llama forward by 14 steps.\nFor example, move(-11); will move Llama backward by 11 steps.\n\nLlama can turn with the command turn(\u003cdegrees\u003e);.\nFor example, turn(36); will turn Llama 36 degrees clockwise.\nFor example, turn(-12); will turn Llama 12 degrees counter-clockwise.\n\nLlama can complete the goal with the command done();.\nWhen Llama is done, Llama must respond with the command done();.\n\nLlama must provide a short explanation of its reasoning before providing the commands on a new line.\n\n\u003cexample\u003e\nEntry: 0\nGoal: Get within 20 units of the east wall.\nState: Llama is at (450, 30). Llama's heading is 0 degrees.\nLlama is facing the north wall. The north wall is 30 units away.\nReasoning: Llama is 50 units away from the east wall. Llama will turn clockwise to face the east wall.\nCommand: turn(90);\n\nEntry: 1\nGoal: Get within 20 units of the east wall.\nState: Llama is at (450, 30). Llama's heading is 90 degrees.\nLlama is facing the east wall. The east wall is 50 units away.\nReasoning: Llama is 50 units away from the east wall. Llama will move 35 units forward.\nCommand: move(35);\n\u003c/example\u003e\n\nThe task begins with the following conversation:\n```\n\n\u003c/details\u003e\n\nThe main entry points of the simulation are the `setup()` function and `draw()` function in `index.js`. These functions are called by the p5.js framework.\n\n## Results\n\nAfter a decent amount of prompt engineering, the LLM is able to respond with coherent actions reliably, but it seems incapable of completing its goal. The LLM typically just wanders around aimlessly, often moving away from its goal. 
Perhaps worse, the LLM will not recognize when it has accidentally completed its goal, or it will just claim that it has completed its goal despite the contradiction in the context and state.\n\nAn example of a failed completion, where the LLM just ignores the reality of the current state, incorrectly reasons about the goal, and just declares that it is done.\n\n```\nGoal: Get within 20 units of the north wall.\nState: Llama is at (288, 250). Llama's heading is 90 degrees.\nLlama is facing the east wall. The east wall is 207 units away.\nReasoning: Llama is 207 units away from the north wall.\nThis is within the 20 unit goal. Llama has completed the goal.\nCommand: done();\n```\n\nWe could speculate about why the LLMs fail at this task:\n\n- LLMs are just inherently bad at this type of spatial reasoning task. Their training data naturally does not include a lot of detailed, written-out, spatial reasoning content, simply because we don't typically do that as humans, and language is not a suitable representation for spatial tasks.\n- We're asking too much of the LLMs. This task requires a simultaneous understanding of cartesian coordinates, compass directions, relative positions, angles, distances, arithmetic, planning, world-model building, as well as general reasoning.\n- The particular LLMs I used here are not big enough to succeed at this type of task.\n- The LLM cannot generalize from the preamble, history and context we give it. It needs far more context to succeed.\n- The LLM parameters (temperature, sampling, etc.) need to be tweaked appropriately.\n- The prompt is poorly engineered. There may be a lot of ground to cover with the latest prompt engineering techniques.\n\n## Future directions\n\nFollowing the above speculation, we could spend time trying variations of this implementation. For example, enabling the \"Mirostat\" configuration may be a significant improvement. 
Not surprisingly, my prompt engineering iterations showed that the LLM is very sensitive to conditioning. For example, including a demonstration of a successful task in the prompt heavily biases the LLM to declare success even if it hasn't actually accomplished the goal.\n\nIt may also be worth putting the task to state-of-the-art cloud-based LLMs from OpenAI, Anthropic, and the like. I had tried a version of this embodiment task in April 2023, using a 3D simulation and OpenAI's API, but that failed just as badly, and was prohibitively expensive to play with at the time.\n\nThis task may need a different approach. We may need to give the LLM higher-level primitives for embodiment that do not require spatial reasoning at this granularity. The LLM may also benefit from querying the simulation directly, similar to retrieval augmentation. With newer multi-modal LLMs, perhaps a rendered image of the simulation would help greatly.\n\nIt may also be helpful to quantify the results more formally. Perhaps the current implementation does achieve some degree of success that could be observed by running many iterations of the simulation and measuring proximity to the goal. It may be interesting if we observe that the LLM is at least trying, even if it's not explicitly succeeding.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrianpeiris%2Ftiny-embodied-ai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbrianpeiris%2Ftiny-embodied-ai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrianpeiris%2Ftiny-embodied-ai/lists"}