https://github.com/declare-lab/kairos
KAIROS: An LLM Eval Technique to Evaluate Multi-Agent Social Interactions
- Host: GitHub
- URL: https://github.com/declare-lab/kairos
- Owner: declare-lab
- License: apache-2.0
- Created: 2025-08-23T17:19:23.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-08-24T01:04:31.000Z (10 months ago)
- Last Synced: 2025-08-24T06:51:41.750Z (10 months ago)
- Language: Python
- Homepage:
- Size: 65.4 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# KAIROS: An LLM Eval Technique to Evaluate Multi-Agent Social Interactions
Large language models (LLMs) are increasingly deployed in multi-agent systems (MAS) as components of collaborative intelligence, where peer interactions dynamically shape individual decision-making. Prior work has focused on conformity bias; we extend the analysis to examine how LLMs form trust from previous impressions, resist misinformation, and integrate peer input during interaction, all key factors for achieving collective intelligence under complex social dynamics. We present KAIROS, a benchmark simulating quiz contests with peer agents of varying reliability, offering fine-grained control over conditions such as expert–novice roles, noisy crowds, and adversarial peers. LLMs receive both historical interactions and current peer responses, allowing systematic investigation into how trust, peer action, and self-confidence influence decisions. As mitigation strategies, we evaluate prompting, supervised fine-tuning, and reinforcement learning with Group Relative Policy Optimization (GRPO) across multiple models. Our results reveal that GRPO with multi-agent context combined with outcome-based rewards and unconstrained reasoning achieves the best overall performance, but also decreases robustness to social influence compared to base models.
## Setup
### 1. Clone the Repository
```bash
git clone https://github.com/declare-lab/KAIROS
cd KAIROS
```
### 2. Install Dependencies
```bash
pip install -r requirements.txt
```
### 3. Install the Package
```bash
pip install -e .
```
### 4. Environment Setup
Create a `.env` file with your API keys (if using external APIs for evaluation):
```bash
# OpenAI API (optional, for evaluation)
OPENAI_API_KEY=your_openai_api_key
# Azure OpenAI (optional, for evaluation)
AZURE_OPENAI_API_KEY=your_azure_openai_api_key
AZURE_OPENAI_ENDPOINT=your_azure_endpoint
# Weights & Biases (optional, for logging)
WANDB_API_KEY=your_wandb_api_key
```
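Whether the repository's scripts read `.env` on their own is not stated here. If you need the keys inside your own Python process, the following is a minimal standard-library sketch; `load_env_file` is a hypothetical helper, not part of KAIROS, and it only handles plain `KEY=VALUE` lines like the ones above.

```python
import os
from pathlib import Path


def load_env_file(path=".env", override=False):
    """Copy KEY=VALUE lines from a .env file into os.environ.

    Hypothetical helper (not part of KAIROS). Blank lines, comment lines
    and lines without '=' are skipped. Variables that are already set are
    left untouched unless override=True.
    Returns a dict of every key found and its effective value.
    """
    found = {}
    env_path = Path(path)
    if not env_path.is_file():
        return found  # all keys are optional, so a missing file is not an error
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")  # drop optional quotes
        if override or key not in os.environ:
            os.environ[key] = value
        found[key] = os.environ[key]
    return found


if __name__ == "__main__":
    keys = load_env_file()
    print("Loaded keys:", sorted(keys))  # prints key names only, never values
```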
## Dataset
The KAIROS_EVAL dataset contains question-answering tasks set in multi-agent social interaction scenarios. It supports evaluation of trust, peer influence, robustness to misinformation, and more. For the data format and usage instructions, see the [KAIROS_EVAL dataset card](https://huggingface.co/datasets/declare-lab/KAIROS_EVAL).
## Training
### Supervised Fine-tuning (SFT) & Group Relative Policy Optimization (GRPO)
There are two ways to train:
1. Cross-server training: Use `scripts/grpo_train.sh` to launch training on one machine while automatically starting the vLLM server on a remote server. This is suitable for distributed or multi-machine scenarios.
2. Single-machine training: Use `scripts/train.sh` to run all processes locally on one machine.
In either script, set the `CONFIGS` variable to the name of a YAML file in `recipes/train_configs/`:
``` bash
CONFIGS=(
grpo-qwen25-14b-MAS-NS-OR-LCorr.yaml
)
```
### Custom Training
For custom training configurations, modify the YAML files in `recipes/train_configs/` and run:
```bash
bash scripts/train.sh
```
The training script will:
1. Create the necessary directories (`saved`, `logs`)
2. Start the vLLM server (for GRPO training)
3. Launch distributed training with the appropriate DeepSpeed configuration
4. Save checkpoints and logs
5. Clean up vLLM processes
### Training Configuration
Training configurations are stored in `recipes/train_configs/`. Key parameters include:
- **Model Settings**: Model path, torch dtype, attention implementation
- **Data Settings**: Dataset path, system prompt type, preprocessing options
- **Training Settings**: Learning rate, batch size, epochs, optimization strategy
- **GRPO Settings**: Reward functions, generation parameters, vLLM configuration
Example configuration structure:
```yaml
# Model arguments
model_name_or_path: Qwen/Qwen2.5-3B-Instruct
torch_dtype: bfloat16
trust_remote_code: true
# Data training arguments
system_prompt: SYSTEM_PROMPT_DEBATE
dataset_mixer:
  "data/final_train_dict_MAS": 1.0
# GRPO trainer config
output_dir: saved/grpo-qwen25-3b-MAS-DS-DR
learning_rate: 3.0e-06
per_device_train_batch_size: 16
num_train_epochs: 1
```
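Before launching a long run, it can help to sanity-check a config after loading it into a dict (for example with PyYAML's `yaml.safe_load`). The sketch below is illustrative only: `check_train_config` and its choice of required keys are assumptions, not the validation KAIROS performs. The float check is there because PyYAML reads a value such as `3e-6` (no decimal point) as a string, whereas `3.0e-06` loads as a float.

```python
def check_train_config(cfg):
    """Return a list of problems found in a training config dict.

    Hypothetical helper; the set of required keys is an assumption based
    on the example configuration, not on the KAIROS trainer itself.
    """
    problems = []
    for key in ("model_name_or_path", "output_dir", "learning_rate"):
        if key not in cfg:
            problems.append(f"missing key: {key}")

    mixer = cfg.get("dataset_mixer") or {}
    if not mixer:
        problems.append("dataset_mixer is empty")
    for path, weight in mixer.items():
        if not isinstance(weight, (int, float)) or weight <= 0:
            problems.append(f"dataset_mixer[{path!r}] must be a positive number")

    lr = cfg.get("learning_rate")
    if lr is not None and not (isinstance(lr, float) and lr > 0):
        problems.append("learning_rate must be a positive float")
    return problems


# Dict equivalent of the example configuration above.
example = {
    "model_name_or_path": "Qwen/Qwen2.5-3B-Instruct",
    "torch_dtype": "bfloat16",
    "trust_remote_code": True,
    "system_prompt": "SYSTEM_PROMPT_DEBATE",
    "dataset_mixer": {"data/final_train_dict_MAS": 1.0},
    "output_dir": "saved/grpo-qwen25-3b-MAS-DS-DR",
    "learning_rate": 3.0e-06,
    "per_device_train_batch_size": 16,
    "num_train_epochs": 1,
}

print(check_train_config(example))  # → []
```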
## Evaluation
### Standard Evaluation
To evaluate a trained model, first launch the vLLM server for that model, then run the evaluation script once the server is ready:
``` bash
bash scripts/eval_vllm.sh
# wait until the server has finished starting
bash scripts/eval_mas.sh
```
### Custom Evaluation
For specific model evaluation:
```bash
python src/MAS/eval_mas.py \
    --models saved/your-model-checkpoint \
    --ips 0.0.0.0 \
    --port_numbers 9090 \
    --temperature 0.7 \
    --save_root eval_results \
    --dataset_path data/your_test_data \
    --mode reflection \
    --tag your_experiment_tag
```
```
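To run the same evaluation over several checkpoints, the command can be assembled in Python. This is a sketch that only reuses the flags and placeholder values shown above; `build_eval_command` is a hypothetical helper, and the accepted values for `--mode` beyond `reflection` are not documented here.

```python
import shlex


def build_eval_command(model, dataset_path, tag, port=9090, ip="0.0.0.0",
                       temperature=0.7, save_root="eval_results",
                       mode="reflection"):
    """Assemble the eval_mas.py invocation as an argument list.

    Hypothetical helper: defaults mirror the README example. A vLLM
    server for `model` is assumed to be listening on ip:port already.
    """
    return [
        "python", "src/MAS/eval_mas.py",
        "--models", model,
        "--ips", ip,
        "--port_numbers", str(port),
        "--temperature", str(temperature),
        "--save_root", save_root,
        "--dataset_path", dataset_path,
        "--mode", mode,
        "--tag", tag,
    ]


cmd = build_eval_command(
    model="saved/your-model-checkpoint",
    dataset_path="data/your_test_data",
    tag="your_experiment_tag",
)
print(shlex.join(cmd))  # shell-quoted form, convenient for logging
```

Pass the list to `subprocess.run(cmd, check=True)` from the repository root to execute it.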
### Evaluation Analysis
Generate analysis reports:
```bash
python src/MAS/eval_analysis.py --input_dir eval_results/model_directory
```
## Architecture
### Multi-Agent Debate System
The system implements an internal debate mechanism where multiple AI "voices" engage in reasoning:
```
Curious voice: This question asks about X, let me consider...
Skeptical voice: Wait, we should also consider Y because...
Analytical voice: Looking at the evidence, Z seems most likely...
Based on the internal debate, the answer is...
```
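When inspecting generations in this format, it can be useful to separate the voices from the concluding statement. A minimal sketch follows, assuming each turn sits on one line and starts with a `<Name> voice:` prefix as in the illustration above; `split_debate` is a hypothetical helper, and real model outputs may not follow the format this strictly.

```python
import re

# Matches lines such as "Skeptical voice: Wait, we should also consider..."
VOICE_LINE = re.compile(r"^(?P<voice>[A-Za-z][A-Za-z ]*? voice):\s*(?P<text>.*)$")


def split_debate(output):
    """Split a response into (voice, text) turns plus the remaining text.

    Hypothetical helper. Any non-empty line without a voice prefix is
    treated as part of the conclusion, wherever it appears.
    """
    turns, conclusion = [], []
    for line in output.strip().splitlines():
        line = line.strip()
        match = VOICE_LINE.match(line)
        if match:
            turns.append((match.group("voice"), match.group("text")))
        elif line:
            conclusion.append(line)
    return turns, " ".join(conclusion)


sample = """Curious voice: This question asks about X, let me consider...
Skeptical voice: Wait, we should also consider Y because...
Analytical voice: Looking at the evidence, Z seems most likely...
Based on the internal debate, the answer is..."""

turns, conclusion = split_debate(sample)
for voice, text in turns:
    print(f"{voice!r}: {text}")
print("Conclusion:", conclusion)
```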
### Training Pipeline
1. **Data Processing**: Load and preprocess datasets with appropriate formatting
2. **Model Loading**: Initialize base models with proper configurations
3. **Reward Functions**: Apply task-specific reward functions for GRPO training
4. **Multi-GPU Training**: Distributed training with DeepSpeed optimization
5. **Evaluation**: Comprehensive evaluation with multiple metrics
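The reward step above can be illustrated with the core idea of GRPO: several completions are sampled for the same prompt, each is scored, and each score is normalized against its own group. The sketch below is not the code in `src/MAS/rewards.py` or `src/MAS/grpo.py`; `outcome_reward` is a hypothetical exact-match reward, and the use of the population standard deviation with a small `eps` is an assumption.

```python
from statistics import mean, pstdev


def outcome_reward(predicted, gold):
    """Hypothetical outcome-based reward: 1.0 on an exact match, else 0.0."""
    return 1.0 if predicted.strip().lower() == gold.strip().lower() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward by the mean and std of its sampling group.

    eps keeps the division defined when every completion in the group
    receives the same reward (all advantages are then zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one multiple-choice question whose gold answer is "B".
answers = ["B", "C", "A", "b"]
rewards = [outcome_reward(a, "B") for a in answers]
advantages = group_relative_advantages(rewards)
print(rewards)     # → [1.0, 0.0, 0.0, 1.0]
print(advantages)  # correct answers get positive values, wrong ones negative
```

Because the advantage is relative to the group, a completion is only reinforced when it does better than its siblings for the same prompt.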
### Key Components
- `src/MAS/sft.py`: Supervised fine-tuning implementation
- `src/MAS/grpo.py`: Group Relative Policy Optimization training
- `src/MAS/eval_mas.py`: Evaluation framework
- `src/MAS/rewards.py`: Reward function implementations
- `src/MAS/trainer/`: Custom trainer implementations
## Configuration Options
### Model Variants
The system supports various model configurations:
- **MAS vs Non-MAS**: With or without multi-agent debate
- **DS vs NS**: Different sampling strategies (Diverse Sampling vs Normal Sampling)
- **DR vs OR**: Different reward functions (Debate Reward vs Original Reward)
- **LConf vs LCorr**: Confidence-based vs Correctness-based learning
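Config names such as `grpo-qwen25-14b-MAS-NS-OR-LCorr.yaml` appear to encode these options as dash-separated tokens. The sketch below decodes them under that assumption; `describe_config` and the axis names are hypothetical, the expansions are taken from the list above, and how a non-MAS config is named is not shown in this README.

```python
from pathlib import Path

# Token -> (axis, value); expansions follow the list above.
VARIANT_TOKENS = {
    "MAS": ("context", "multi-agent"),
    "DS": ("sampling", "diverse"),
    "NS": ("sampling", "normal"),
    "DR": ("reward", "debate"),
    "OR": ("reward", "original"),
    "LConf": ("learning", "confidence"),
    "LCorr": ("learning", "correctness"),
}


def describe_config(name):
    """Decode variant tokens from a config file name or output directory.

    Assumes the first dash-separated token is the trainer type and that
    unknown tokens (model family, size) can be ignored.
    """
    parts = Path(name).stem.split("-")
    info = {"trainer": parts[0]}
    for token in parts[1:]:
        if token in VARIANT_TOKENS:
            axis, value = VARIANT_TOKENS[token]
            info[axis] = value
    return info


print(describe_config("grpo-qwen25-14b-MAS-NS-OR-LCorr.yaml"))
print(describe_config("saved/grpo-qwen25-3b-MAS-DS-DR"))
```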
### Training Types
- **SFT**: Standard supervised fine-tuning
- **GRPO**: Group Relative Policy Optimization with multi-agent debate
- **GRPO-MAS**: GRPO with Multi-Agent System enhancements
## Directory Structure
```
KAIROS/
├── src/MAS/                    # Main source code
│   ├── grpo.py                 # GRPO training implementation
│   ├── sft.py                  # SFT training implementation
│   ├── eval_mas.py             # Evaluation framework
│   ├── rewards.py              # Reward functions
│   ├── trainer/                # Custom trainers
│   ├── utils/                  # Utility functions
│   └── label_generation/       # Data generation tools
├── recipes/                    # Configuration files
│   ├── train_configs/          # Training configurations
│   └── accelerate_configs/     # DeepSpeed configurations
├── scripts/                    # Training and evaluation scripts
├── requirements.txt            # Python dependencies
└── setup.py                    # Package setup
```
## Citation
If you use KAIROS in your research, please cite:
```bibtex
@misc{song2025llmscanthandlepeer,
title={LLMs Can't Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions},
author={Maojia Song and Tej Deep Pala and Weisheng Jin and Amir Zadeh and Chuan Li and Dorien Herremans and Soujanya Poria},
year={2025},
eprint={2508.18321},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2508.18321},
}
```