{"id":14964813,"url":"https://github.com/mgonzs13/llama_ros","last_synced_at":"2025-04-04T20:12:30.060Z","repository":{"id":150132144,"uuid":"622137360","full_name":"mgonzs13/llama_ros","owner":"mgonzs13","description":"llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2","archived":false,"fork":false,"pushed_at":"2024-10-29T11:43:12.000Z","size":2231,"stargazers_count":153,"open_issues_count":2,"forks_count":26,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-10-29T13:26:37.506Z","etag":null,"topics":["cpp","embeddings","ggml","gguf","gpt","langchain","llama","llamacpp","llava","llavacpp","llm","rerank","reranking","ros2","vlm"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mgonzs13.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-01T08:25:02.000Z","updated_at":"2024-10-29T11:43:16.000Z","dependencies_parsed_at":"2024-04-14T17:24:54.685Z","dependency_job_id":"cb04983f-5615-43bb-a07c-a53102ff80bf","html_url":"https://github.com/mgonzs13/llama_ros","commit_stats":{"total_commits":627,"total_committers":5,"mean_commits":125.4,"dds":0.00797448165869219,"last_synced_commit":"f256ef32c655a989b7c3c7eaf817db09d23794cb"},"previous_names":[],"tags_count":74,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mgonzs13%2Fllama_ros","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mgonzs13%2Fllama_ros/tags","releases_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/mgonzs13%2Fllama_ros/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mgonzs13%2Fllama_ros/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mgonzs13","download_url":"https://codeload.github.com/mgonzs13/llama_ros/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247242680,"owners_count":20907134,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cpp","embeddings","ggml","gguf","gpt","langchain","llama","llamacpp","llava","llavacpp","llm","rerank","reranking","ros2","vlm"],"created_at":"2024-09-24T13:33:49.239Z","updated_at":"2025-04-04T20:12:30.028Z","avatar_url":"https://github.com/mgonzs13.png","language":"C++","funding_links":[],"categories":["Research-Grade Frameworks"],"sub_categories":[],"readme":"# llama_ros\n\nThis repository provides a set of ROS 2 packages to integrate [llama.cpp](https://github.com/ggerganov/llama.cpp) into ROS 2. Using the llama_ros packages, you can easily incorporate the powerful optimization capabilities of [llama.cpp](https://github.com/ggerganov/llama.cpp) into your ROS 2 projects by running [GGUF](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md)-based [LLMs](https://huggingface.co/models?sort=trending\u0026search=gguf+7b) and [VLMs](https://huggingface.co/models?sort=trending\u0026search=gguf+llava). 
You can also use features from llama.cpp such as [GBNF grammars](https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md) and modify LoRAs in real-time.\n\n\u003cdiv align=\"center\"\u003e\n\n[![License: MIT](https://img.shields.io/badge/GitHub-MIT-informational)](https://opensource.org/license/mit) [![GitHub release](https://img.shields.io/github/release/mgonzs13/llama_ros.svg)](https://github.com/mgonzs13/llama_ros/releases) [![Code Size](https://img.shields.io/github/languages/code-size/mgonzs13/llama_ros.svg?branch=main)](https://github.com/mgonzs13/llama_ros?branch=main) [![Last Commit](https://img.shields.io/github/last-commit/mgonzs13/llama_ros.svg)](https://github.com/mgonzs13/llama_ros/commits/main) [![GitHub issues](https://img.shields.io/github/issues/mgonzs13/llama_ros)](https://github.com/mgonzs13/llama_ros/issues) [![GitHub pull requests](https://img.shields.io/github/issues-pr/mgonzs13/llama_ros)](https://github.com/mgonzs13/llama_ros/pulls) [![Contributors](https://img.shields.io/github/contributors/mgonzs13/llama_ros.svg)](https://github.com/mgonzs13/llama_ros/graphs/contributors) [![Python Formatter Check](https://github.com/mgonzs13/llama_ros/actions/workflows/python-formatter.yml/badge.svg?branch=main)](https://github.com/mgonzs13/llama_ros/actions/workflows/python-formatter.yml?branch=main) [![C++ Formatter Check](https://github.com/mgonzs13/llama_ros/actions/workflows/cpp-formatter.yml/badge.svg?branch=main)](https://github.com/mgonzs13/llama_ros/actions/workflows/cpp-formatter.yml?branch=main)\n\n| ROS 2 Distro |                          Branch                           |                                                                                                     Build status                                                                                                      |                                                               Docker Image                                                               | 
Documentation                                                                                                                                            |\n| :----------: | :-------------------------------------------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------: | -------------------------------------------------------------------------------------------------------------------------------------------------------- |\n|  **Humble**  | [`main`](https://github.com/mgonzs13/llama_ros/tree/main) | [![Humble Build](https://github.com/mgonzs13/llama_ros/actions/workflows/humble-docker-build.yml/badge.svg?branch=main)](https://github.com/mgonzs13/llama_ros/actions/workflows/humble-docker-build.yml?branch=main) | [![Docker Image](https://img.shields.io/badge/Docker%20Image%20-humble-blue)](https://hub.docker.com/r/mgons/llama_ros/tags?name=humble) | [![Doxygen Deployment](https://github.com/mgonzs13/llama_ros/actions/workflows/doxygen-deployment.yml/badge.svg)](https://mgonzs13.github.io/llama_ros/) |\n|  **Jazzy**   | [`main`](https://github.com/mgonzs13/llama_ros/tree/main) |  [![Jazzy Build](https://github.com/mgonzs13/llama_ros/actions/workflows/jazzy-docker-build.yml/badge.svg?branch=main)](https://github.com/mgonzs13/llama_ros/actions/workflows/jazzy-docker-build.yml?branch=main)   |  [![Docker Image](https://img.shields.io/badge/Docker%20Image%20-jazzy-blue)](https://hub.docker.com/r/mgons/llama_ros/tags?name=jazzy)  | [![Doxygen Deployment](https://github.com/mgonzs13/llama_ros/actions/workflows/doxygen-deployment.yml/badge.svg)](https://mgonzs13.github.io/llama_ros/) |\n\n\u003c/div\u003e\n\n## Table of Contents\n\n1. 
[Related Projects](#related-projects)\n2. [Installation](#installation)\n3. [Docker](#docker)\n4. [Usage](#usage)\n   - [llama_cli](#llama_cli)\n   - [Launch Files](#launch-files)\n   - [LoRA Adapters](#lora-adapters)\n   - [ROS 2 Clients](#ros-2-clients)\n   - [LangChain](#langchain)\n5. [Demos](#demos)\n\n## Related Projects\n\n- [chatbot_ros](https://github.com/mgonzs13/chatbot_ros) \u0026rarr; This chatbot, integrated into ROS 2, uses [whisper_ros](https://github.com/mgonzs13/whisper_ros/tree/main) to listen to people's speech and llama_ros to generate responses. The chatbot is controlled by a state machine created with [YASMIN](https://github.com/uleroboticsgroup/yasmin).\n- [explainable_ros](https://github.com/Dsobh/explainable_ROS) \u0026rarr; A ROS 2 tool to explain the behavior of a robot. Using the LangChain integration, logs are stored in a vector database. Then, RAG is applied to retrieve the logs relevant to the user's questions, which are answered with llama_ros.\n\n## Installation\n\nTo run llama_ros with CUDA, you must first install the [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit). Then, you can compile llama_ros with `--cmake-args -DGGML_CUDA=ON` to enable CUDA support.\n\n```shell\ncd ~/ros2_ws/src\ngit clone https://github.com/mgonzs13/llama_ros.git\npip3 install -r llama_ros/requirements.txt\ncd ~/ros2_ws\nrosdep install --from-paths src --ignore-src -r -y\ncolcon build --cmake-args -DGGML_CUDA=ON # add this for CUDA\n```\n\n## Docker\n\nBuild the llama_ros Docker image or download one from [DockerHub](https://hub.docker.com/repository/docker/mgons/llama_ros). You can choose to build llama_ros with CUDA (`USE_CUDA`) and select the CUDA version (`CUDA_VERSION`). 
Remember that you have to use `DOCKER_BUILDKIT=0` to compile llama_ros with CUDA when building the image.\n\n\u003c!-- To build using CUDA you have to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) and [configure the default runtime to NVIDIA](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.12.1/user-guide.html#daemon-configuration-file). --\u003e\n\n```shell\nDOCKER_BUILDKIT=0 docker build -t llama_ros --build-arg USE_CUDA=1 --build-arg CUDA_VERSION=12-6 .\n```\n\nRun the Docker container. If you want to use CUDA, you have to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) and add `--gpus all`.\n\n```shell\ndocker run -it --rm --gpus all llama_ros\n```\n\n## Usage\n\n### llama_cli\n\nllama_ros includes commands to speed up testing GGUF-based LLMs within the ROS 2 ecosystem. The following commands are integrated into the `ros2` CLI:\n\n#### launch\n\nUse this command to launch an LLM from a YAML file. The YAML configuration is used to launch the LLM in the same way as a regular launch file. Here is an example of how to use it:\n\n```shell\nros2 llama launch ~/ros2_ws/src/llama_ros/llama_bringup/models/StableLM-Zephyr.yaml\n```\n\n#### prompt\n\nUse this command to send a prompt to a launched LLM. The command takes the prompt as a string and accepts the following arguments:\n\n- (`-r`, `--reset`): Whether to reset the LLM before prompting\n- (`-t`, `--temp`): The temperature value\n- (`--image-url`): Image URL to send to a VLM\n\nHere is an example of how to use it:\n\n```shell\nros2 llama prompt \"Do you know ROS 2?\" -t 0.0\n```\n\n### Launch Files\n\nFirst of all, you need to create a launch file to use llama_ros or llava_ros. This launch file will contain the main parameters to download the model from Hugging Face and configure it. 
Take a look at the following examples and the [predefined launch files](llama_bringup/launch).\n\n#### llama_ros (Python Launch)\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n```python\nfrom launch import LaunchDescription\nfrom llama_bringup.utils import create_llama_launch\n\n\ndef generate_launch_description():\n\n    return LaunchDescription([\n        create_llama_launch(\n            n_ctx=2048, # context of the LLM in tokens\n            n_batch=8, # batch size in tokens\n            n_gpu_layers=0, # layers to load in GPU\n            n_threads=1, # threads\n            n_predict=2048, # max tokens, -1 == inf\n\n            model_repo=\"TheBloke/Marcoroni-7B-v3-GGUF\", # Hugging Face repo\n            model_filename=\"marcoroni-7b-v3.Q4_K_M.gguf\", # model file in repo\n\n            system_prompt_type=\"Alpaca\" # system prompt type\n        )\n    ])\n```\n\n```shell\nros2 launch llama_bringup marcoroni.launch.py\n```\n\n\u003c/details\u003e\n\n#### llama_ros (YAML Config)\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n```yaml\nn_ctx: 2048 # context of the LLM in tokens\nn_batch: 8 # batch size in tokens\nn_gpu_layers: 0 # layers to load in GPU\nn_threads: 1 # threads\nn_predict: 2048 # max tokens, -1 == inf\n\nmodel_repo: \"cstr/Spaetzle-v60-7b-GGUF\" # Hugging Face repo\nmodel_filename: \"Spaetzle-v60-7b-q4-k-m.gguf\" # model file in repo\n\nsystem_prompt_type: \"Alpaca\" # system prompt type\n```\n\n```python\nimport os\nfrom launch import LaunchDescription\nfrom llama_bringup.utils import create_llama_launch_from_yaml\nfrom ament_index_python.packages import get_package_share_directory\n\n\ndef generate_launch_description():\n    return LaunchDescription([\n        create_llama_launch_from_yaml(os.path.join(\n            get_package_share_directory(\"llama_bringup\"), \"models\", \"Spaetzle.yaml\"))\n    ])\n```\n\n```shell\nros2 launch llama_bringup 
spaetzle.launch.py\n```\n\n\u003c/details\u003e\n\n#### llama_ros (YAML Config + model shards)\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n```yaml\nn_ctx: 2048 # context of the LLM in tokens\nn_batch: 8 # batch size in tokens\nn_gpu_layers: 0 # layers to load in GPU\nn_threads: 1 # threads\nn_predict: 2048 # max tokens, -1 == inf\n\nmodel_repo: \"Qwen/Qwen2.5-Coder-7B-Instruct-GGUF\" # Hugging Face repo\nmodel_filename: \"qwen2.5-coder-7b-instruct-q4_k_m-00001-of-00002.gguf\" # model shard file in repo\n\nsystem_prompt_type: \"ChatML\" # system prompt type\n```\n\n```shell\nros2 llama launch Qwen2.yaml\n```\n\n\u003c/details\u003e\n\n#### llava_ros (Python Launch)\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n```python\nfrom launch import LaunchDescription\nfrom llama_bringup.utils import create_llama_launch\n\ndef generate_launch_description():\n\n    return LaunchDescription([\n        create_llama_launch(\n            use_llava=True, # enable llava\n\n            n_ctx=8192, # context of the LLM in tokens, use a huge context size to load images\n            n_batch=512, # batch size in tokens\n            n_gpu_layers=33, # layers to load in GPU\n            n_threads=1, # threads\n            n_predict=8192, # max tokens, -1 == inf\n\n            model_repo=\"cjpais/llava-1.6-mistral-7b-gguf\", # Hugging Face repo\n            model_filename=\"llava-v1.6-mistral-7b.Q4_K_M.gguf\", # model file in repo\n\n            mmproj_repo=\"cjpais/llava-1.6-mistral-7b-gguf\", # Hugging Face repo\n            mmproj_filename=\"mmproj-model-f16.gguf\", # mmproj file in repo\n\n            system_prompt_type=\"Mistral\" # system prompt type\n        )\n    ])\n```\n\n```shell\nros2 launch llama_bringup llava.launch.py\n```\n\n\u003c/details\u003e\n\n#### llava_ros (YAML Config)\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n```yaml\nuse_llava: True # enable 
llava\n\nn_ctx: 8192 # context of the LLM in tokens, use a huge context size to load images\nn_batch: 512 # batch size in tokens\nn_gpu_layers: 33 # layers to load in GPU\nn_threads: 1 # threads\nn_predict: 8192 # max tokens, -1 == inf\n\nmodel_repo: \"cjpais/llava-1.6-mistral-7b-gguf\" # Hugging Face repo\nmodel_filename: \"llava-v1.6-mistral-7b.Q4_K_M.gguf\" # model file in repo\n\nmmproj_repo: \"cjpais/llava-1.6-mistral-7b-gguf\" # Hugging Face repo\nmmproj_filename: \"mmproj-model-f16.gguf\" # mmproj file in repo\n\nsystem_prompt_type: \"mistral\" # system prompt type\n```\n\n```python\nimport os\nfrom launch import LaunchDescription\nfrom llama_bringup.utils import create_llama_launch_from_yaml\nfrom ament_index_python.packages import get_package_share_directory\n\n\ndef generate_launch_description():\n    return LaunchDescription([\n        create_llama_launch_from_yaml(os.path.join(\n            get_package_share_directory(\"llama_bringup\"),\n            \"models\", \"llava-1.6-mistral-7b-gguf.yaml\"))\n    ])\n```\n\n```shell\nros2 launch llama_bringup llava.launch.py\n```\n\n\u003c/details\u003e\n\n### LoRA Adapters\n\nYou can use LoRA adapters when launching LLMs. Using llama.cpp features, you can load multiple adapters, choosing the scale to apply to each one. Here is an example of using LoRA adapters with Phi-3. You can list the LoRAs using the `/llama/list_loras` service and modify their scale values using the `/llama/update_loras` service. 
A scale value of 0.0 means not using that LoRA.\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n```yaml\nn_ctx: 2048\nn_batch: 8\nn_gpu_layers: 0\nn_threads: 1\nn_predict: 2048\n\nmodel_repo: \"bartowski/Phi-3.5-mini-instruct-GGUF\"\nmodel_filename: \"Phi-3.5-mini-instruct-Q4_K_M.gguf\"\n\nlora_adapters:\n  - repo: \"zhhan/adapter-Phi-3-mini-4k-instruct_code_writing\"\n    filename: \"Phi-3-mini-4k-instruct-adaptor-f16-code_writer.gguf\"\n    scale: 0.5\n  - repo: \"zhhan/adapter-Phi-3-mini-4k-instruct_summarization\"\n    filename: \"Phi-3-mini-4k-instruct-adaptor-f16-summarization.gguf\"\n    scale: 0.5\n\nsystem_prompt_type: \"Phi-3\"\n```\n\n\u003c/details\u003e\n\n### ROS 2 Clients\n\nBoth llama_ros and llava_ros provide ROS 2 interfaces to access the main functionalities of the models. Here are some examples of how to use them inside ROS 2 nodes. Also, take a look at the [llama_demo_node.py](llama_demos/llama_demos/llama_demo_node.py) and [llava_demo_node.py](llama_demos/llama_demos/llava_demo_node.py) demos.\n\n#### Tokenize\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n```python\nimport rclpy\nfrom rclpy.node import Node\nfrom llama_msgs.srv import Tokenize\n\n\nclass ExampleNode(Node):\n    def __init__(self) -\u003e None:\n        super().__init__(\"example_node\")\n\n        # create the client\n        self.srv_client = self.create_client(Tokenize, \"/llama/tokenize\")\n\n        # create the request\n        req = Tokenize.Request()\n        req.text = \"Example text\"\n\n        # call the tokenize service and spin until the response arrives\n        self.srv_client.wait_for_service()\n        future = self.srv_client.call_async(req)\n        rclpy.spin_until_future_complete(self, future)\n        tokens = future.result().tokens\n```\n\n\u003c/details\u003e\n\n#### Detokenize\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n```python\nimport rclpy\nfrom rclpy.node import Node\nfrom llama_msgs.srv import Detokenize\n\n\nclass ExampleNode(Node):\n    def __init__(self) -\u003e None:\n        super().__init__(\"example_node\")\n\n        # create the client\n        self.srv_client = self.create_client(Detokenize, \"/llama/detokenize\")\n\n        # create the request\n        req = Detokenize.Request()\n        req.tokens = [123, 123]\n\n        # call the detokenize service and spin until the response arrives\n        self.srv_client.wait_for_service()\n        future = self.srv_client.call_async(req)\n        rclpy.spin_until_future_complete(self, future)\n        text = future.result().text\n```\n\n\u003c/details\u003e\n\n#### Embeddings\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n_Remember to launch llama_ros with `embedding` set to true to be able to generate embeddings with your LLM._\n\n```python\nimport rclpy\nfrom rclpy.node import Node\nfrom llama_msgs.srv import Embeddings\n\n\nclass ExampleNode(Node):\n    def __init__(self) -\u003e None:\n        super().__init__(\"example_node\")\n\n        # create the client\n        self.srv_client = self.create_client(Embeddings, \"/llama/generate_embeddings\")\n\n        # create the request\n        req = Embeddings.Request()\n        req.prompt = \"Example text\"\n        req.normalize = True\n\n        # call the embedding service and spin until the response arrives\n        self.srv_client.wait_for_service()\n        future = self.srv_client.call_async(req)\n        rclpy.spin_until_future_complete(self, future)\n        embeddings = future.result().embeddings\n```\n\n\u003c/details\u003e\n\n#### Generate Response\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n```python\nimport rclpy\nfrom rclpy.node import Node\nfrom rclpy.action import ActionClient\nfrom llama_msgs.action import GenerateResponse\n\n\nclass ExampleNode(Node):\n    def __init__(self) -\u003e None:\n        super().__init__(\"example_node\")\n\n        # create the client\n        self.action_client = ActionClient(\n            self, GenerateResponse, \"/llama/generate_response\")\n\n        # create the goal and set the sampling config\n        goal = GenerateResponse.Goal()\n        goal.prompt = \"Example prompt\"\n        goal.sampling_config.temp = 0.2\n\n        # wait for the server and send the goal\n        self.action_client.wait_for_server()\n        send_goal_future = self.action_client.send_goal_async(\n            goal)\n\n        # wait until the goal is accepted\n        rclpy.spin_until_future_complete(self, send_goal_future)\n        get_result_future = send_goal_future.result().get_result_async()\n\n        # wait again and take the result\n        rclpy.spin_until_future_complete(self, get_result_future)\n        result: GenerateResponse.Result = get_result_future.result().result\n```\n\n\u003c/details\u003e\n\n#### Generate Response (llava)\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n```python\nimport cv2\nfrom cv_bridge import CvBridge\n\nimport rclpy\nfrom rclpy.node import Node\nfrom rclpy.action import ActionClient\nfrom llama_msgs.action import GenerateResponse\n\n\nclass ExampleNode(Node):\n    def __init__(self) -\u003e None:\n        super().__init__(\"example_node\")\n\n        # create a cv bridge for the image\n        self.cv_bridge = CvBridge()\n\n        # create the client\n        self.action_client = ActionClient(\n            self, GenerateResponse, \"/llama/generate_response\")\n\n        # create the goal and set the sampling config\n        goal = GenerateResponse.Goal()\n        goal.prompt = \"Describe the image\"\n        goal.sampling_config.temp = 0.2\n\n        # add your image to the goal\n        image = cv2.imread(\"/path/to/your/image\", cv2.IMREAD_COLOR)\n        goal.image = self.cv_bridge.cv2_to_imgmsg(image)\n\n        # wait for the server and send the goal\n        self.action_client.wait_for_server()\n        send_goal_future = self.action_client.send_goal_async(\n            goal)\n\n        # wait until the goal is accepted\n        rclpy.spin_until_future_complete(self, send_goal_future)\n        get_result_future = send_goal_future.result().get_result_async()\n\n        # wait again and take the result\n        rclpy.spin_until_future_complete(self, get_result_future)\n        result: GenerateResponse.Result = 
get_result_future.result().result\n```\n\n\u003c/details\u003e\n\n### LangChain\n\nThere is a [llama_ros integration for LangChain](llama_ros/llama_ros/langchain/), so prompt engineering techniques can be applied. Here are some examples of how to use it.\n\n#### llama_ros (Chain)\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n```python\nimport rclpy\nfrom llama_ros.langchain import LlamaROS\nfrom langchain.prompts import PromptTemplate\nfrom langchain_core.output_parsers import StrOutputParser\n\n\nrclpy.init()\n\n# create the llama_ros llm for langchain\nllm = LlamaROS()\n\n# create a prompt template\nprompt_template = \"tell me a joke about {topic}\"\nprompt = PromptTemplate(\n    input_variables=[\"topic\"],\n    template=prompt_template\n)\n\n# create a chain with the llm and the prompt template\nchain = prompt | llm | StrOutputParser()\n\n# run the chain\ntext = chain.invoke({\"topic\": \"bears\"})\nprint(text)\n\nrclpy.shutdown()\n```\n\n\u003c/details\u003e\n\n#### llama_ros (Stream)\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n```python\nimport rclpy\nfrom llama_ros.langchain import LlamaROS\nfrom langchain.prompts import PromptTemplate\nfrom langchain_core.output_parsers import StrOutputParser\n\n\nrclpy.init()\n\n# create the llama_ros llm for langchain\nllm = LlamaROS()\n\n# create a prompt template\nprompt_template = \"tell me a joke about {topic}\"\nprompt = PromptTemplate(\n    input_variables=[\"topic\"],\n    template=prompt_template\n)\n\n# create a chain with the llm and the prompt template\nchain = prompt | llm | StrOutputParser()\n\n# run the chain\nfor c in chain.stream({\"topic\": \"bears\"}):\n    print(c, flush=True, end=\"\")\n\nrclpy.shutdown()\n```\n\n\u003c/details\u003e\n\n#### llava_ros\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n```python\nimport rclpy\nfrom llama_ros.langchain import LlamaROS\n\nrclpy.init()\n\n# create the 
llama_ros llm for langchain\nllm = LlamaROS()\n\n# bind the image URL and stream the response\nimage_url = \"https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg\"\nstream = llm.bind(image_url=image_url).stream(\"Describe the image\")\n\n# print the streamed chunks\nfor c in stream:\n    print(c, flush=True, end=\"\")\n\nrclpy.shutdown()\n```\n\n\u003c/details\u003e\n\n#### llama_ros_embeddings (RAG)\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n```python\nimport rclpy\nfrom langchain_chroma import Chroma\nfrom llama_ros.langchain import LlamaROSEmbeddings\n\n\nrclpy.init()\n\n# create the llama_ros embeddings for langchain\nembeddings = LlamaROSEmbeddings()\n\n# create a vector database and assign it\ndb = Chroma(embedding_function=embeddings)\n\n# create the retriever\nretriever = db.as_retriever(search_kwargs={\"k\": 5})\n\n# add your texts\ndb.add_texts(texts=[\"your_texts\"])\n\n# retrieve documents\ndocuments = retriever.invoke(\"your_query\")\nprint(documents)\n\nrclpy.shutdown()\n```\n\n\u003c/details\u003e\n\n#### llama_ros (Reranker)\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n```python\nimport rclpy\nfrom llama_ros.langchain import LlamaROSReranker\nfrom llama_ros.langchain import LlamaROSEmbeddings\n\nfrom langchain_community.vectorstores import FAISS\nfrom langchain_community.document_loaders import TextLoader\nfrom langchain_text_splitters import RecursiveCharacterTextSplitter\nfrom langchain.retrievers import ContextualCompressionRetriever\n\n\nrclpy.init()\n\n# load the documents\ndocuments = TextLoader(\"../state_of_the_union.txt\",).load()\ntext_splitter = RecursiveCharacterTextSplitter(\n    chunk_size=500, chunk_overlap=100)\ntexts = text_splitter.split_documents(documents)\n\n# create the llama_ros embeddings\nembeddings = LlamaROSEmbeddings()\n\n# create the vector database and the retriever\nretriever = 
FAISS.from_documents(\n    texts, embeddings).as_retriever(search_kwargs={\"k\": 20})\n\n# create the compressor using the llama_ros reranker\ncompressor = LlamaROSReranker()\ncompression_retriever = ContextualCompressionRetriever(\n    base_compressor=compressor, base_retriever=retriever\n)\n\n# retrieve the documents\ncompressed_docs = compression_retriever.invoke(\n    \"What did the president say about Ketanji Jackson Brown\"\n)\n\nfor doc in compressed_docs:\n    print(\"-\" * 50)\n    print(doc.page_content)\n    print(\"\\n\")\n\nrclpy.shutdown()\n```\n\n\u003c/details\u003e\n\n#### llama_ros (LLM + RAG + Reranker)\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n```python\nimport bs4\nimport rclpy\n\nfrom langchain_chroma import Chroma\nfrom langchain_community.document_loaders import WebBaseLoader\nfrom langchain_core.output_parsers import StrOutputParser\nfrom langchain_core.runnables import RunnablePassthrough\nfrom langchain_core.messages import SystemMessage\nfrom langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate\nfrom langchain_text_splitters import RecursiveCharacterTextSplitter\nfrom langchain.retrievers import ContextualCompressionRetriever\n\nfrom llama_ros.langchain import ChatLlamaROS, LlamaROSEmbeddings, LlamaROSReranker\n\n\nrclpy.init()\n\n# load, chunk and index the contents of the blog\nloader = WebBaseLoader(\n    web_paths=(\"https://lilianweng.github.io/posts/2023-06-23-agent/\",),\n    bs_kwargs=dict(\n        parse_only=bs4.SoupStrainer(class_=(\"post-content\", \"post-title\", \"post-header\"))\n    ),\n)\ndocs = loader.load()\n\ntext_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)\nsplits = text_splitter.split_documents(docs)\nvectorstore = Chroma.from_documents(documents=splits, embedding=LlamaROSEmbeddings())\n\n# retrieve and generate using the relevant snippets of the blog\nretriever = vectorstore.as_retriever(search_kwargs={\"k\": 20})\n\n# 
create prompt\nprompt = ChatPromptTemplate.from_messages(\n    [\n        SystemMessage(\"You are an AI assistant that answers questions briefly.\"),\n        HumanMessagePromptTemplate.from_template(\n            \"Taking into account the following information:{context}\\n\\n{question}\"\n        ),\n    ]\n)\n\n# create rerank compression retriever\ncompressor = LlamaROSReranker(top_n=3)\ncompression_retriever = ContextualCompressionRetriever(\n    base_compressor=compressor, base_retriever=retriever\n)\n\n\ndef format_docs(docs):\n    formatted_docs = \"\"\n\n    for d in docs:\n        formatted_docs += f\"\\n\\n\\t- {d.page_content}\"\n\n    return formatted_docs\n\n\n# create and use the chain\nrag_chain = (\n    {\"context\": compression_retriever | format_docs, \"question\": RunnablePassthrough()}\n    | prompt\n    | ChatLlamaROS(temp=0.0)\n    | StrOutputParser()\n)\n\nfor c in rag_chain.stream(\"What is Task Decomposition?\"):\n    print(c, flush=True, end=\"\")\n\nrclpy.shutdown()\n```\n\n\u003c/details\u003e\n\n#### chat_llama_ros (Chat + VLM)\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n```python\nimport rclpy\nfrom llama_ros.langchain import ChatLlamaROS\nfrom langchain_core.messages import SystemMessage\nfrom langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate\nfrom langchain_core.output_parsers import StrOutputParser\n\n\nrclpy.init()\n\n# create chat\nchat = ChatLlamaROS(\n    temp=0.2,\n    penalty_last_n=8\n)\n\n# create prompt template with messages\nprompt = ChatPromptTemplate.from_messages([\n    SystemMessage(\"You are an AI that just answers with a single word.\"),\n    HumanMessagePromptTemplate.from_template(template=[\n        {\"type\": \"text\", \"text\": \"\u003cimage\u003eWho is the character in the middle of the image?\"},\n        {\"type\": \"image_url\", \"image_url\": \"{image_url}\"}\n    ])\n])\n\n# create the chain\nchain = prompt | chat | StrOutputParser()\n\n# stream 
and print the LLM output\nfor text in chain.stream({\"image_url\": \"https://pics.filmaffinity.com/Dragon_Ball_Bola_de_Dragaon_Serie_de_TV-973171538-large.jpg\"}):\n    print(text, end=\"\", flush=True)\n\nprint(\"\", end=\"\\n\", flush=True)\n\nrclpy.shutdown()\n```\n\n\u003c/details\u003e\n\n#### chat_llama_ros (Structured output)\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n```python\nfrom typing import Optional\n\nimport rclpy\n\nfrom langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate\nfrom llama_ros.langchain import ChatLlamaROS\nfrom pydantic import BaseModel, Field\n\nrclpy.init()\n\nclass Joke(BaseModel):\n    \"\"\"Joke to tell user.\"\"\"\n\n    setup: str = Field(description=\"The setup of the joke\")\n    punchline: str = Field(description=\"The punchline to the joke\")\n    rating: Optional[int] = Field(\n        default=None, description=\"How funny the joke is, from 1 to 10\"\n    )\n\nchat = ChatLlamaROS(temp=0.6, penalty_last_n=8)\n\nstructured_chat = chat.with_structured_output(\n    Joke, method=\"function_calling\"\n)\n\nprompt = ChatPromptTemplate.from_messages(\n    [\n        HumanMessagePromptTemplate.from_template(\n            template=[\n                {\"type\": \"text\", \"text\": \"{prompt}\"},\n            ]\n        ),\n    ]\n)\n\nchain = prompt | structured_chat\n\nres = chain.invoke({\"prompt\": \"Tell me a joke about cats\"})\n\nprint(f\"Response: {res}\")\n\nrclpy.shutdown()\n```\n\n\u003c/details\u003e\n\n#### chat_llama_ros (Tools)\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\nThe current implementation of Tools allows executing tools without requiring a model trained for that task.\n\n```python\nfrom random import randint\n\nimport rclpy\n\nfrom langchain.tools import tool\nfrom langchain_core.messages import HumanMessage\nfrom llama_ros.langchain import ChatLlamaROS\n\nrclpy.init()\n\n@tool\ndef get_inhabitants(city: str) -\u003e int:\n    \"\"\"Get the current number of inhabitants of a city\"\"\"\n    return randint(4_000_000, 8_000_000)\n\n\n@tool\ndef get_curr_temperature(city: str) -\u003e int:\n    \"\"\"Get the current temperature of a city\"\"\"\n    return randint(20, 30)\n\nchat = ChatLlamaROS(temp=0.6, penalty_last_n=8)\n\nmessages = [\n    HumanMessage(\n        \"What is the current temperature in Madrid? And its inhabitants?\"\n    )\n]\n\nllm_tools = chat.bind_tools(\n    [get_inhabitants, get_curr_temperature], tool_choice='any'\n)\n\nall_tools_res = llm_tools.invoke(messages)\nmessages.append(all_tools_res)\n\nfor tool_call in all_tools_res.tool_calls:\n    selected_tool = {\n        \"get_inhabitants\": get_inhabitants, \"get_curr_temperature\": get_curr_temperature\n    }[tool_call['name']]\n\n    tool_msg = selected_tool.invoke(tool_call)\n\n    formatted_output = f\"{tool_call['name']}({''.join(tool_call['args'].values())}) = {tool_msg.content}\"\n\n    tool_msg.additional_kwargs = {'args': tool_call['args']}\n    messages.append(tool_msg)\n\nres = llm_tools.invoke(messages)\n\nprint(f\"Response: {res.content}\")\n\nrclpy.shutdown()\n```\n\n\u003c/details\u003e\n\n#### chat_llama_ros (Reasoning)\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\nA reasoning model, such as DeepSeek-R1, is required.\n\n```python\nimport rclpy\n\nfrom langchain_core.messages import HumanMessage\nfrom llama_ros.langchain import ChatLlamaROS\n\nrclpy.init()\n\nchat = ChatLlamaROS(temp=0.6, penalty_last_n=8)\n\nmessages = [\n    HumanMessage(\n        \"Here we have a book, a laptop, 9 eggs and a nail. 
Please tell me how to stack them onto each other in a stable manner.\"\n    )\n]\n\nres = chat.invoke(messages)\n\nprint(f\"Response: {res.content.strip()}\")\nprint(f\"Reasoning: {res.additional_kwargs['reasoning_content']}\")\n\nrclpy.shutdown()\n```\n\n\u003c/details\u003e\n\n#### chat_llama_ros (langgraph)\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand\u003c/summary\u003e\n\n```python\nfrom random import randint\n\nimport rclpy\n\nfrom langchain.tools import tool\nfrom langchain_core.messages import HumanMessage\nfrom langgraph.prebuilt import create_react_agent\nfrom llama_ros.langchain import ChatLlamaROS\n\nrclpy.init()\n\n@tool\ndef get_inhabitants(city: str) -\u003e int:\n    \"\"\"Get the current number of inhabitants of a city\"\"\"\n    return randint(4_000_000, 8_000_000)\n\n\n@tool\ndef get_curr_temperature(city: str) -\u003e int:\n    \"\"\"Get the current temperature of a city\"\"\"\n    return randint(20, 30)\n\nchat = ChatLlamaROS(temp=0.0)\n\n# build a ReAct agent that can call both tools\nagent_executor = create_react_agent(\n    chat, [get_inhabitants, get_curr_temperature]\n)\n\nresponse = agent_executor.invoke(\n    {\n        \"messages\": [\n            HumanMessage(\n                content=\"What is the current temperature in Madrid? 
And its inhabitants?\"\n            )\n        ]\n    }\n)\n\nprint(f\"Response: {response['messages'][-1].content}\")\n\nrclpy.shutdown()\n```\n\n\u003c/details\u003e\n\n## Demos\n\n### LLM Demo\n\n```shell\nros2 launch llama_bringup spaetzle.launch.py\n```\n\n```shell\nros2 run llama_demos llama_demo_node\n```\n\n\u003c!-- https://user-images.githubusercontent.com/25979134/229344687-9dda3446-9f1f-40ab-9723-9929597a042c.mp4 --\u003e\n\nhttps://github.com/mgonzs13/llama_ros/assets/25979134/9311761b-d900-4e58-b9f8-11c8efefdac4\n\n### Embeddings Generation Demo\n\n```shell\nros2 llama launch ~/ros2_ws/src/llama_ros/llama_bringup/models/bge-base-en-v1.5.yaml\n```\n\n```shell\nros2 run llama_demos llama_embeddings_demo_node\n```\n\nhttps://github.com/user-attachments/assets/7d722017-27dc-417c-ace7-bf6b747e4ced\n\n### Reranking Demo\n\n```shell\nros2 llama launch ~/ros2_ws/src/llama_ros/llama_bringup/models/jina-reranker.yaml\n```\n\n```shell\nros2 run llama_demos llama_rerank_demo_node\n```\n\nhttps://github.com/user-attachments/assets/4b4adb4d-7c70-43ea-a2c1-9be57d211484\n\n### VLM Demo\n\n```shell\nros2 launch llama_bringup minicpm-2.6.launch.py\n```\n\n```shell\nros2 run llama_demos llava_demo_node --ros-args -p prompt:=\"your prompt\" -p image_url:=\"url of the image\" -p use_image:=\"whether to send the image\"\n```\n\nhttps://github.com/mgonzs13/llama_ros/assets/25979134/4a9ef92f-9099-41b4-8350-765336e3503c\n\n### Chat Template Demo\n\n```shell\nros2 llama launch MiniCPM-2.6.yaml\n```\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand MiniCPM-2.6.yaml\u003c/summary\u003e\n\n```yaml\nuse_llava: True\n\nn_ctx: 8192\nn_batch: 512\nn_gpu_layers: 20\nn_threads: -1\nn_predict: 8192\n\nimage_prefix: \"\u003cimage\u003e\"\nimage_suffix: \"\u003c/image\u003e\"\n\nmodel_repo: \"openbmb/MiniCPM-V-2_6-gguf\"\nmodel_filename: \"ggml-model-Q4_K_M.gguf\"\n\nmmproj_repo: \"openbmb/MiniCPM-V-2_6-gguf\"\nmmproj_filename: \"mmproj-model-f16.gguf\"\n```\n\n\u003c/details\u003e\n\n```shell\nros2 run llama_demos chatllama_demo_node\n```\n\n[ChatLlamaROS demo](https://github-production-user-asset-6210df.s3.amazonaws.com/55236157/363094669-c6de124a-4e91-4479-99b6-685fecb0ac20.webm?X-Amz-Algorithm=AWS4-HMAC-SHA256\u0026X-Amz-Credential=AKIAVCODYLSA53PQK4ZA%2F20240830%2Fus-east-1%2Fs3%2Faws4_request\u0026X-Amz-Date=20240830T081232Z\u0026X-Amz-Expires=300\u0026X-Amz-Signature=f937758f4bcbaec7683e46ddb057fb642dc86a33cc8c736fca3b5ce2bf06ddac\u0026X-Amz-SignedHeaders=host\u0026actor_id=55236157\u0026key_id=0\u0026repo_id=622137360)\n\n### Chat Structured Output Demo\n\n```shell\nros2 llama launch Qwen2.yaml\n```\n\n```shell\nros2 run llama_demos chatllama_structured_demo_node\n```\n\n[Structured Output ChatLlama](https://github.com/user-attachments/assets/e0bf4031-50c0-4790-94a0-1f6aed5734ec)\n\n### Chat Tools Demo\n\n```shell\nros2 llama launch Qwen2.yaml\n```\n\n```shell\nros2 run llama_demos chatllama_tools_demo_node\n```\n\n[Tools ChatLlama](https://github.com/user-attachments/assets/b912ee29-1466-4d6a-888b-9a2d9c16ae1d)\n\n### Chat Reasoning Demo (DeepSeek-R1)\n\n```shell\nros2 llama launch DeepSeek-R1.yaml\n```\n\n```shell\nros2 run llama_demos chatllama_reasoning_demo_node\n```\n\n[DeepSeekR1 ChatLlama](https://github.com/user-attachments/assets/3f268614-eabc-4499-b50f-a76d76908d9d)\n\n### Langgraph Demo\n\n```shell\nros2 llama launch Qwen2.yaml\n```\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand Qwen2.yaml\u003c/summary\u003e\n\n```yaml\nn_ctx: 4096\nn_batch: 256\nn_gpu_layers: 29\nn_threads: -1\nn_predict: -1\n\nmodel_repo: \"Qwen/Qwen2.5-Coder-7B-Instruct-GGUF\"\nmodel_filename: \"qwen2.5-coder-7b-instruct-q4_k_m-00001-of-00002.gguf\"\n```\n\n\u003c/details\u003e\n\n```shell\nros2 run llama_demos chatllama_langgraph_demo_node\n```\n\n[Langgraph ChatLlama](https://github.com/user-attachments/assets/a0991cb4-f7f4-43d5-b629-3b1819aead0d)\n\n### RAG Demo (LLM + chat template + 
RAG + Reranking + Stream)\n\n```shell\nros2 llama launch ~/ros2_ws/src/llama_ros/llama_bringup/models/bge-base-en-v1.5.yaml\n```\n\n```shell\nros2 llama launch ~/ros2_ws/src/llama_ros/llama_bringup/models/jina-reranker.yaml\n```\n\n```shell\nros2 llama launch Qwen2.yaml\n```\n\n\u003cdetails\u003e\n\u003csummary\u003eClick to expand Qwen2.yaml\u003c/summary\u003e\n\n```yaml\nn_ctx: 4096\nn_batch: 256\nn_gpu_layers: 29\nn_threads: -1\nn_predict: -1\n\nmodel_repo: \"Qwen/Qwen2.5-Coder-3B-Instruct-GGUF\"\nmodel_filename: \"qwen2.5-coder-3b-instruct-q4_k_m.gguf\"\n```\n\n\u003c/details\u003e\n\n```shell\nros2 run llama_demos llama_rag_demo_node\n```\n\nhttps://github.com/user-attachments/assets/b4e3957d-1f92-427b-a1a8-cfc76737c0d6\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmgonzs13%2Fllama_ros","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmgonzs13%2Fllama_ros","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmgonzs13%2Fllama_ros/lists"}