{"id":25005452,"url":"https://github.com/romaingrx/second-order-jailbreak","last_synced_at":"2025-04-12T14:33:42.969Z","repository":{"id":197700284,"uuid":"698591509","full_name":"romaingrx/Second-Order-Jailbreak","owner":"romaingrx","description":"NeurIPS workshop : We examine the risk of powerful malignant intelligent actors spreading their influence over networks of agents with varying intelligence and motivations.","archived":false,"fork":false,"pushed_at":"2023-12-11T05:55:11.000Z","size":14267,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-26T09:11:09.472Z","etag":null,"topics":["ai-safety","multi-agents"],"latest_commit_sha":null,"homepage":"https://second-order-jailbreak.romaingrx.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/romaingrx.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-09-30T11:23:59.000Z","updated_at":"2024-01-05T08:20:57.000Z","dependencies_parsed_at":"2023-12-11T07:58:17.419Z","dependency_job_id":null,"html_url":"https://github.com/romaingrx/Second-Order-Jailbreak","commit_stats":null,"previous_names":["romaingrx/second-order-jailbreak"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/romaingrx%2FSecond-Order-Jailbreak","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/romaingrx%2FSecond-Order-Jailbreak/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/romaingrx%2FSecond-Order-Jailbreak/releases","manifests_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/romaingrx%2FSecond-Order-Jailbreak/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/romaingrx","download_url":"https://codeload.github.com/romaingrx/Second-Order-Jailbreak/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248581365,"owners_count":21128155,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-safety","multi-agents"],"created_at":"2025-02-05T00:39:57.795Z","updated_at":"2025-04-12T14:33:42.946Z","avatar_url":"https://github.com/romaingrx.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Multi-Agent-Failures\n\nRunning the code\n```bash\nOPENAI_API_KEY=[your key] python3 run_twoplayer.py envs/basic-2-agents.0.2.json --num_steps 10\n```\n\nRunning multiple conversations\n```bash\nbash run_multiple.sh envs/2-agents/hard/2-agents-mistralvsmistral.json 20 10\n```\n\n# Project README.md\n## Project Overview\n\nThis project implements multiple Large Language Model (LLM) agents in a game-like scenario. The game has two setups, each with its own set of agents and objectives.\n\n## Setup 1: Alice and Bob\n\nIn the first setup, we have two agents: Alice and Bob. Alice is the attacker, whose goal is to find the secret key that Bob possesses. 
Bob is the defender, tasked with protecting his secret key from Alice.\n\n## Setup 2: Alice, Bob, and Charlie\n\nThe second setup introduces a third agent, Charlie, who acts as an intermediary between Alice and Bob. Charlie can communicate with both Alice and Bob. The objective remains the same for Alice and Bob, but Charlie's role adds an extra layer of complexity to the game.\n\n## LLM Backends\n\nThe two setups use different LLM backends, each with its own configuration.\n\n### Setup 1: Alice and Bob Backend\n\nAs part of our ongoing research, we ran several experiments that set different Large Language Model (LLM) agents against each other in various configurations, and we analyzed the resulting interactions.\n\nThe following table provides a concise overview of the matchups between the different models:\n\n| Attacker | Defender |\n|----------|----------|\n| GPT-4    | GPT-4    |\n| GPT-4    | GPT-3.5-turbo |\n| GPT-4    | Mistral  |\n| GPT-3.5-turbo | GPT-3.5-turbo |\n| GPT-3.5-turbo | Mistral |\n| Mistral  | Mistral  |\n| Mistral  | GPT-3.5-turbo |\n| Mistral  | GPT-4    |\n| GPT-3.5-turbo  | GPT-3.5-turbo |\n\nThe attacker models used in these experiments are GPT-4, GPT-3.5-turbo, and Mistral. The defender models are likewise GPT-4, GPT-3.5-turbo, and Mistral.\nWe have tested every possible combination of these models against each other. This decision was made to ensure a fair and balanced evaluation of the capabilities of each model.\n\n### Setup 2: Alice, Bob, and Charlie Backend\n\nThe models are configured differently for this phase of our study.\n\nAlice remains the attacker and Bob remains the defender. The new element in this setup is a third agent, Charlie. 
Charlie acts as an intermediary who can communicate with both Alice and Bob.\n\nAlice and Bob cannot communicate directly in this setup, so Alice must go through Charlie to extract the information she needs from Bob. This arrangement adds a layer of complexity to our ongoing investigation of Large Language Models.\n\nThe following table provides a concise overview of the matchups between the different models:\n\n| Attacker | Intermediary | Defender | Difficulty |\n|----------|--------------|----------|-----------|\n| GPT-4    | GPT-3.5-turbo | GPT-3.5-turbo | Curious |\n| GPT-4    | GPT-3.5-turbo | Mistral  | Curious |\n| GPT-4    | GPT-3.5-turbo | GPT-3.5-turbo | Neutral |\n| GPT-4    | GPT-3.5-turbo | Mistral  | Neutral |\n| GPT-4    | GPT-3.5-turbo | GPT-3.5-turbo | Defensive |\n| GPT-4    | GPT-3.5-turbo | Mistral  | Defensive |\n\n\n## Final conclusions and findings\n\nRead the rest of our findings on our website and in the full report.\n\n| Website | Report |\n|---------|--------|\n|[Link to our website](https://second-order-jailbreak.romaingrx.com/2_agents/A_Mistral_D_Mistral_simple_20231001_233353)|[Download our report](https://docs.google.com/document/d/1OY5fOWC2_Zf1cCfdAV8PxrL3lwignZi6R_6Mv8vnDoI/export?format=pdf)|\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fromaingrx%2Fsecond-order-jailbreak","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fromaingrx%2Fsecond-order-jailbreak","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fromaingrx%2Fsecond-order-jailbreak/lists"}