{"id":21166877,"url":"https://github.com/agora-lab-ai/srt","last_synced_at":"2025-04-13T14:24:40.699Z","repository":{"id":263332229,"uuid":"890056287","full_name":"Agora-Lab-AI/SRT","owner":"Agora-Lab-AI","description":"An open-source non-official community implementation of the model from the paper: Surgical Robot Transformer (SRT): Imitation Learning for Surgical Tasks: https://surgical-robot-transformer.github.io/","archived":false,"fork":false,"pushed_at":"2025-03-24T00:17:10.000Z","size":37,"stargazers_count":8,"open_issues_count":4,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-27T05:25:13.531Z","etag":null,"topics":["ai","health-ai","health-robot-ai","healthai","medical-ai","ml","srt","surgical-robots","surgical-robots-tools","surgical-tools"],"latest_commit_sha":null,"homepage":"https://agoralab.xyz","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Agora-Lab-AI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["kyegomez"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"lfx_crowdfunding":null,"custom":null}},"created_at":"2024-11-17T22:24:31.000Z","updated_at":"2025-02-28T06:26:11.000Z","dependencies_parsed_at":"2024-11-17T22:47:56.920Z","dependency_job_id":"224f724d-334b-46f6-b2c3-1a743a361671","html_url":"https://github.com/Agora-Lab-AI/SRT","commit_stats":null,"previous_names":["agora-lab-ai/srt"],"tags_count":0,"template":false,"template_full_name":"
kyegomez/Python-Package-Template","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Agora-Lab-AI%2FSRT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Agora-Lab-AI%2FSRT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Agora-Lab-AI%2FSRT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Agora-Lab-AI%2FSRT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Agora-Lab-AI","download_url":"https://codeload.github.com/Agora-Lab-AI/SRT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248726092,"owners_count":21151848,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","health-ai","health-robot-ai","healthai","medical-ai","ml","srt","surgical-robots","surgical-robots-tools","surgical-tools"],"created_at":"2024-11-20T14:53:30.412Z","updated_at":"2025-04-13T14:24:40.652Z","avatar_url":"https://github.com/Agora-Lab-AI.png","language":"Python","funding_links":["https://github.com/sponsors/kyegomez"],"categories":[],"sub_categories":[],"readme":"# Surgical Robot Transformer (SRT)\n\n[![Join our Discord](https://img.shields.io/badge/Discord-Join%20our%20server-5865F2?style=for-the-badge\u0026logo=discord\u0026logoColor=white)](https://discord.gg/agora-999382051935506503) [![Subscribe on YouTube](https://img.shields.io/badge/YouTube-Subscribe-red?style=for-the-badge\u0026logo=youtube\u0026logoColor=white)](https://www.youtube.com/@kyegomez3242) [![Connect on 
LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-blue?style=for-the-badge\u0026logo=linkedin\u0026logoColor=white)](https://www.linkedin.com/in/kye-g-38759a207/) [![Follow on X.com](https://img.shields.io/badge/X.com-Follow-1DA1F2?style=for-the-badge\u0026logo=x\u0026logoColor=white)](https://x.com/kyegomezb)\n\n\nAn open-source non-official community implementation of the model from the paper: Surgical Robot Transformer (SRT): Imitation Learning for Surgical Tasks: https://surgical-robot-transformer.github.io/\n\n\n## Installation\n\n```bash\npip3 install srt-torch\n```\n\n\n## Usage\n\n```python\n\nimport torch\nfrom loguru import logger\nfrom srt_torch.main import (\n    SurgicalRobotTransformer,\n    ModelConfig,\n    RobotObservation,\n)\n\n\ndef run_forward_pass():\n    # Initialize model and config\n    config = ModelConfig()\n    model = SurgicalRobotTransformer(config)\n    model.eval()  # Set to evaluation mode\n\n    # Create sample camera images (simulating robot observations)\n    # Normally these would come from your robot's cameras\n    sample_image = torch.zeros((3, 224, 224))  # [C, H, W] format\n\n    # Create observation object containing all camera views\n    observation = RobotObservation(\n        stereo_left=sample_image,\n        stereo_right=sample_image,\n        wrist_left=sample_image,\n        wrist_right=sample_image,\n    )\n\n    # Perform forward pass\n    with torch.no_grad():\n        try:\n            action = model(observation)\n\n            # Extract predicted actions\n            left_pos = action.left_pos.numpy()  # [3] - xyz position\n            left_rot = action.left_rot.numpy()  # [6] - 6D rotation\n            left_grip = (\n                action.left_gripper.numpy()\n            )  # [1] - gripper angle\n\n            right_pos = action.right_pos.numpy()  # [3]\n            right_rot = action.right_rot.numpy()  # [6]\n            right_grip = action.right_gripper.numpy()  # [1]\n\n            
logger.info(f\"Left arm position: {left_pos}\")\n            logger.info(f\"Left arm rotation: {left_rot}\")\n            logger.info(f\"Left gripper angle: {left_grip}\")\n\n            logger.info(f\"Right arm position: {right_pos}\")\n            logger.info(f\"Right arm rotation: {right_rot}\")\n            logger.info(f\"Right gripper angle: {right_grip}\")\n\n            return action\n\n        except Exception as e:\n            logger.error(f\"Error during forward pass: {str(e)}\")\n            raise\n\n\nif __name__ == \"__main__\":\n    # Set up logging\n    logger.add(\"srt_inference.log\")\n    logger.info(\"Starting SRT forward pass example\")\n\n    action = run_forward_pass()\n\n    logger.info(\"Forward pass completed successfully\")\n\n\n```\n\n## Model Architecture\n```mermaid\nflowchart TB\n    subgraph Inputs[\"Input Observations\"]\n        SL[Stereo Left Image]\n        SR[Stereo Right Image]\n        WL[Wrist Left Image]\n        WR[Wrist Right Image]\n    end\n\n    subgraph ImageEncoder[\"Image Encoder\"]\n        direction TB\n        CNN[\"CNN Backbone\n        Conv2d layers\n        ReLU + MaxPool\"]\n        Proj[\"Projection Layer\n        Linear(256, hidden_dim)\"]\n        CNN --\u003e Proj\n    end\n\n    subgraph TransformerEncoder[\"Transformer Encoder (x4 layers)\"]\n        direction TB\n        SA[\"Self Attention\"]\n        FF[\"Feed Forward\"]\n        N1[\"LayerNorm\"]\n        N2[\"LayerNorm\"]\n        SA --\u003e N1\n        N1 --\u003e FF\n        FF --\u003e N2\n    end\n\n    subgraph TransformerDecoder[\"Transformer Decoder (x7 layers)\"]\n        direction TB\n        CA[\"Cross Attention\"]\n        FFD[\"Feed Forward\"]\n        N3[\"LayerNorm\"]\n        N4[\"LayerNorm\"]\n        CA --\u003e N3\n        N3 --\u003e FFD\n        FFD --\u003e N4\n    end\n\n    subgraph ActionPredictor[\"Action Predictor\"]\n        direction TB\n        MLP[\"MLP Layers\"]\n        Out[\"Output Layer \n        20-dim vector\"]\n 
       MLP --\u003e Out\n    end\n\n    subgraph Outputs[\"Action Outputs\"]\n        LP[\"Left Position (3)\"]\n        LR[\"Left Rotation (6)\"]\n        LG[\"Left Gripper (1)\"]\n        RP[\"Right Position (3)\"]\n        RR[\"Right Rotation (6)\"]\n        RG[\"Right Gripper (1)\"]\n    end\n\n    SL \u0026 SR \u0026 WL \u0026 WR --\u003e ImageEncoder\n    ImageEncoder --\u003e |\"[B, 4, D]\"| TransformerEncoder\n    TransformerEncoder --\u003e |\"Memory\"| TransformerDecoder\n    TransformerDecoder --\u003e |\"[B, D]\"| ActionPredictor\n    ActionPredictor --\u003e LP \u0026 LR \u0026 LG \u0026 RP \u0026 RR \u0026 RG\n\n```\n\n\n## Training Example\n**In progress**\n\n## Datasets\n\nSection 5 (Experiment Setup) of the paper describes the datasets the authors collected:\n\n1. Tissue Lift Dataset:\n- 224 trials\n- Single user\n- Collected across multiple days\n- Task: Grabbing the corner of a rubber pad and lifting it upwards\n- Training constraint: Corner kept within the marked red box area\n\n2. Needle Pickup and Handover Dataset:\n- 250 trials\n- Single user\n- Collected across multiple days\n- Task: Picking up a needle and transferring it between arms\n- Training constraint: Needle placed randomly inside the red box area\n- Test setup: Center hump of the needle placed at nine predefined locations\n\n3. Knot Tying Dataset:\n- 500 trials\n- Single user\n- Collected across multiple days\n- Task: Creating a loop with the left string, grabbing the terminal end through the loop, and pulling the grippers apart\n- Training constraint: String origins randomly placed inside the red box\n- Test setup: Strings centered in the red box\n\nAdditional Test Datasets (Generalization):\n1. Pork Tissue Background\n- Used for needle pickup and handover task evaluation\n- Success rate: 9/9 on pickup, 9/9 on handover\n\n2. Chicken Tissue Background\n- Used for qualitative evaluation\n- No specific trial numbers mentioned\n\n3. 
3D Suture Pad\n- Used for qualitative evaluation\n- No specific trial numbers mentioned\n\nImportant Dataset Collection Details:\n- All data collected on da Vinci Research Kit (dVRK)\n- Used stereo endoscope and wrist cameras\n- Collected in reference configuration shown in Fig. 5\n- Used simulated abdomen dome for tool placement\n- Approximate placement through larger holes than tool shaft size\n- Manual placement using setup joints\n\nThe key point about their dataset is from Section 1:\n\u003e \"...as of 2021, over 10 million surgeries have been performed using 6,500 da Vinci systems in 67 countries, with 55,000 surgeons trained on the system [2]. Often, the video and kinematics data are recorded for post-operative analysis, resulting in a large repository of demonstration data.\"\n\nHowever, they did not use this larger dataset, instead collecting their own controlled dataset for the study.\n\nThe paper does not mention if they plan to release their datasets publicly.\n\n\n## Implementation Details from the paper:\n\n```txt\n4 Implementation Details\nTo train our policies, we use action chunking with transformers (ACT) [23] and diffusion policy\n[64]. The policies were trained using the endoscope and wrist cameras images as input, which are all\ndownsized to image size of 224 × 224 × 3. The original input size of the surgical endoscope images\nwere 1024 × 1280 × 3 and the wrist images were 480 × 640 × 3. Kinematics data is not provided as\ninput as commonly done in other imitation learning approaches because it is generally inconsistent\ndue to the design limitations of the dVRK. The policy outputs include the end-effector (delta) position,\n(delta) orientation, and jaw angle for both arms. We leave further specific implementation details in\nAppendix A.\n```\n\n### Appendix A\n\n```txt\nmain modifications include changing the input layers to accept four images, which include left/right surgical endoscope views and left/right wrist camera views. 
The output dimensions are also\nrevised to generate end-effector poses, which amounts to a 10-dim vector for each arm (position [3] + orientation [6] + jaw angle [1] = 10), thus amounting to a 20-dim vector total for both arms. The\norientation was modeled using a 6D rotation representation following [21], where the 6 elements correspond to the first two columns of the rotation matrix. Since the network predictions may not\ngenerate orthonormal vectors, Gram-Schmidt process is performed to convert them to orthonormal vectors, and a cross product of the two vectors are performed to generate the remaining third column\nof the rotation matrix. For diffusion policy, similar modifications are made such as changing the input and the output dimensions of the network appropriately. The specific hyperparameters for training\nare shown in Table 3 and 4.\n```\n\n# Todo\n\n- [ ] Add training logic (in progress)\n- [ ] Start testing\n- [ ] Make a list of the datasets used in the paper\n\n\n## Citation\n\n\n```bibtex\n@misc{kim2024surgicalrobottransformersrt,\n    title={Surgical Robot Transformer (SRT): Imitation Learning for Surgical Tasks}, \n    author={Ji Woong Kim and Tony Z. Zhao and Samuel Schmidgall and Anton Deguet and Marin Kobilarov and Chelsea Finn and Axel Krieger},\n    year={2024},\n    eprint={2407.12998},\n    archivePrefix={arXiv},\n    primaryClass={cs.RO},\n    url={https://arxiv.org/abs/2407.12998}, \n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagora-lab-ai%2Fsrt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fagora-lab-ai%2Fsrt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fagora-lab-ai%2Fsrt/lists"}