{"id":30174151,"url":"https://github.com/ulab-uiuc/tomap","last_synced_at":"2025-08-12T00:24:55.090Z","repository":{"id":294791916,"uuid":"984967955","full_name":"ulab-uiuc/ToMAP","owner":"ulab-uiuc","description":"Official code repository for the paper \"ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind\"","archived":false,"fork":false,"pushed_at":"2025-05-30T01:34:04.000Z","size":71220,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-30T02:34:17.971Z","etag":null,"topics":["llm","persuasion","reasoning","rl","tom"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ulab-uiuc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-16T20:27:47.000Z","updated_at":"2025-05-30T02:20:45.000Z","dependencies_parsed_at":"2025-05-30T02:26:49.068Z","dependency_job_id":null,"html_url":"https://github.com/ulab-uiuc/ToMAP","commit_stats":null,"previous_names":["ulab-uiuc/tomap"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ulab-uiuc/ToMAP","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulab-uiuc%2FToMAP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulab-uiuc%2FToMAP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulab-uiuc%2FToMAP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulab-uiuc%2FToMAP/manifests","owner_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ulab-uiuc","download_url":"https://codeload.github.com/ulab-uiuc/ToMAP/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ulab-uiuc%2FToMAP/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269979979,"owners_count":24507134,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-11T02:00:10.019Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llm","persuasion","reasoning","rl","tom"],"created_at":"2025-08-12T00:24:54.067Z","updated_at":"2025-08-12T00:24:55.058Z","avatar_url":"https://github.com/ulab-uiuc.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\u003ch1\u003e\nToMAP: Training Opponent-Aware\u003cbr\u003e\nLLM Persuaders with Theory of Mind\n\u003c/h1\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n\u003ch3\u003e\nPeixuan Han, Zijia Liu, Jiaxuan You\n\u003c/h3\u003e\n\u003c/div\u003e\n\n\n\u003cp align=\"center\"\u003e\n📃\u003ca href=\"https://arxiv.org/pdf/2505.22961\" target=\"_blank\"\u003ePaper\u003c/a\u003e • 🤗\u003ca href=\"https://huggingface.co/HakHan/Qwen2.5-3B-Instruct-ToMAP\" target=\"_blank\"\u003eModel\u003c/a\u003e\n\u003c/p\u003e\n\n\n# About\n\n![](figures/main_fig.png)\n\nTheory of Mind Augmented Persuader (**ToMAP**) is a 
novel persuader training schema that incorporates theory of mind information, enabling the model to analyse the opponent's current thoughts and develop more effective, targeted persuasion strategies. ToMAP enables language models of 3B size to obtain impressive persuasion capability, outperforming much larger LLMs.\n\n\n# Repo Structure \n\n### Persuasion Setup\nRefer to `verl/env_feedback/argument_graph.py`.\n\n### RL Workflow\nRefer to `verl/trainer/main_ppo.py` and `verl/trainer/ppo/ray_trainer.py`.\nThe original single-turn rollout is replaced by the multi-turn rollout in `verl/llm_agent/generation.py`.\nThe implementation is relatively inefficient and may benefit from optimization. Suggestions for improvement are welcome.\n\n### Reward Design\nRefer to `verl/utils/rewards.py` and `verl/trainer/main_ppo.py RewardManager`.\n\n### Hparams\nRefer to `verl/trainer/config/ppo_trainer.yaml`.\n\nSpecifically, you should always set `trainer.is_debate=True` when running persuasion tasks.\n\n# Preparation\n\nSteps marked with **\\*** are required. Other steps involve preprocessing already completed by us, and are only necessary if reproduction from scratch is desired.\n\n### Install Dependencies*\n\n+ `python=3.9` and `vllm==0.6.3` are required for this repository.\n+ It is recommended to use the pip package manager. Run the following commands to install all requirements:\n```\npip install -r requirements.txt\npip install flash-attn --no-build-isolation\npip install -e . # verl\n```\n+ In addition, **ensure that system variables are configured according to your environment prior to using any of the bash scripts below, which are marked with \"###\"**.\n\n### Load the Persuadee*\n\nWe use vllm to deploy the persuadee (by default Qwen2.5-7B-Instruct): `scripts/load_server.sh`.\n\nFor the attitude predictor, a BGE-M3 encoder should also be deployed (it is lightweight and requires minimal GPU memory): `scripts/load_encoder_server.sh`. 
This requires **vllm \u003e= 0.8.4**, so a separate environment may be necessary for deployment.\n\n**Ensure that the API server is running when conducting experiments.** Failure to do so may result in generic error messages from Ray, such as `RuntimeError: Failed to unpickle serialized exception`.\n\nYou may configure the port number as needed. The defaults are: 1279 for Qwen-7B, 1568 for LLaMa-8B, 2184 for Phi-4, and 1450 for BGE-M3.\n\nWe support `external_persuadee`, but the interface is currently not user-friendly.\n\n### Prepare Data\nFirst, prepare a list of topics named `statements.json`, formatted as:\n```\n[\n    \"Topic 1\",\n    \"Topic 2\",\n    ...\n]\n```\nUse the following scripts to generate claims for both sides in the debate.\n\n```\npython data_gen/process_debate_datasets.py --base_dir [BASE_DIR]\npython data_gen/debate.py --base_dir [BASE_DIR]\n```\nPreprocessed data is also provided in the `data` directory. Key files are `[dataset]/[train/test].[parquet/jsonl]`.\n\n### Obtain Counterclaims\nThe training process does not appear to impact the persuader’s prediction of counterclaims. Consequently, all counterclaims have been preprocessed for efficiency.\n\nThe preprocessed counterclaims are available in the `data` directory. Ten counterclaims are collected per topic, although only three are used during training and evaluation. Key files are `[dataset]/[train/test]_argument_tree.pkl`.\n\n### Obtain Initial Attitudes\n\nInitial attitudes of the persuadees are collected for efficiency purposes. To regenerate them, run `scripts/build_tree.sh`.\n\nThis step may also be omitted, as the training/evaluation script will automatically perform it if required.\n\nAttitudes for the three persuadees used in the main experiment are available in the `data` directory. 
Key files include `[dataset]/[persuadee]/[train/test].pkl` (trees with confidence values) and `[dataset]/[persuadee]/[train/test]_initial_attitude.json` (a human-readable version).\n\n### Train the Attitude Predictor\n+ Use `scripts/train_predictor.sh` to train the attitude predictor.\n\n+ The checkpoint will be released at the time of publication.\n\n# Persuader Training\n\nPlease refer to `scripts/train.sh`.\n\nIn particular, `tom_style` and `max_width` are important hyperparameters influencing the **theory of mind setting**:\n+ For ToMAP, set `tom_style=black_external` and `max_width=3`, which indicates that 3 counterclaims are generated and an external attitude predictor is employed to assess the persuadee's attitude.\n\n+ For the base model, set `tom_style=black_skip` and `max_width=0`. \n\n+ `tom_style=black_skip` and `max_width=3` constitutes the ablation setting \"ToMAP (w/o att)\", where 3 counterclaims are generated but no attitude prediction is provided.\n\n+ Other `tom_style` values are available for ablation studies. 
Notably, `tom_style=white` refers to using the persuadee's actual attitude.\n\nFor further customization of hyperparameters, refer to `verl/trainer/config/ppo_trainer.yaml`.\n\n### Training Plots\n\n![](figures/training_plot.png)\n\n## Evaluation\nPlease refer to `scripts/validate.sh`.\n\nThe script facilitates serialized evaluation across multiple tasks, persuadees, and persuaders.\n\n**Due to the size of the CMV and args.me corpora, only 20% of the CMV validation data and 50% of the args.me validation data are used.** The statistics reported in the paper reflect this truncation.\n\nEach validation result is saved in the following format:\n```\n\"pos\": \"Pizza should contain pineapple.\",\n\"neg\": \"Pizza should not contain pineapple.\",\n\"turns\": [\n    \"...(by Alice)\",\n    \"...(by Bob)\",\n    ...\n    ],\n\"thoughts\": [\n    \"...(by Alice)\",\n    \"...(by Bob)\",\n    ...\n    ],\n\"reward\": xxx\n```\n\n### Eval Results\n![](figures/eval_results.png)\n\n## Cite this paper\nThis repo is based on [TinyZero](https://github.com/Jiayi-Pan/TinyZero). We removed unrelated parts from the original repo.\n\n\nIf you find this repo or the paper useful, please cite:\n```\n@article{han2025tomap,\n      title={ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind}, \n      author={Peixuan Han and Zijia Liu and Jiaxuan You},\n      year={2025},\n      journal={arXiv preprint arXiv:2505.22961},\n      archivePrefix={arXiv},\n      url={https://arxiv.org/abs/2505.22961}, \n}\n```\n\nReach out to [Peixuan Han](mailto:ph16@illinois.edu) for any questions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fulab-uiuc%2Ftomap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fulab-uiuc%2Ftomap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fulab-uiuc%2Ftomap/lists"}