https://github.com/duguce/guessarena-demo
A web-based interactive demo for the GuessArena evaluation framework
chatgpt deepseek demo eval-framework flask guessarena large-language-models
- Host: GitHub
- URL: https://github.com/duguce/guessarena-demo
- Owner: Duguce
- Created: 2025-03-09T13:43:13.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-11-15T09:13:33.000Z (7 months ago)
- Last Synced: 2025-11-15T11:13:34.160Z (7 months ago)
- Topics: chatgpt, deepseek, demo, eval-framework, flask, guessarena, large-language-models
- Language: HTML
- Homepage: https://aclanthology.org/2025.acl-long.534/
- Size: 37.1 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
GuessArena Demo
A web-based interactive demo for the GuessArena evaluation framework
> [!NOTE]
> **GuessArena Demo** is a lightweight web application that simulates a card-guessing game with both **player interaction** and **AI-versus-AI simulation**.
> It provides an intuitive, hands-on interface for exploring the evaluation methodology introduced in our paper:
>
> "GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning".
### Features
- **Player Mode**: Play the guessing game interactively by asking an AI judge yes/no questions
- **AI Simulation Mode**: Observe two LLMs engaging in a self-play guessing process
- **Leaderboard Tracking**: Compare model performance across different domains and settings
- **Customizable Decks**: Use built-in card sets or define your own domain-specific decks
- **Domain-Specific Scenarios**: Evaluate reasoning in different industries and knowledge areas
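The guessing loop behind both modes can be pictured with a toy sketch. Everything here is an assumption made for illustration: the deck, the attribute-style questions, and the `make_judge` and `play` helpers are hypothetical and are not taken from the repository, where an LLM plays the judge and the guesser.

```python
from __future__ import annotations

from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical deck: each card name maps to a set of attributes.
DECK: Dict[str, Set[str]] = {
    "stethoscope": {"medical", "tool"},
    "ledger": {"finance", "tool"},
    "surgeon": {"medical", "person"},
    "auditor": {"finance", "person"},
}


def make_judge(secret: str) -> Callable[[str], bool]:
    """Return a judge that answers yes/no to 'does the card have this attribute?'."""
    def judge(attribute: str) -> bool:
        return attribute in DECK[secret]
    return judge


def play(judge: Callable[[str], bool], questions: List[str]) -> Tuple[Optional[str], int]:
    """Ask questions in order, discarding cards that contradict each answer.

    Returns the final guess (None if several cards remain) and the number
    of questions actually asked.
    """
    candidates = set(DECK)
    asked = 0
    for attribute in questions:
        if len(candidates) == 1:
            break  # Only one card left, no need to keep asking.
        answer = judge(attribute)
        asked += 1
        candidates = {c for c in candidates if (attribute in DECK[c]) == answer}
    guess = next(iter(candidates)) if len(candidates) == 1 else None
    return guess, asked
```

In this sketch, a guesser that identifies the card with fewer questions would rank higher, which is one plausible way to read the leaderboard comparison; the actual scoring is defined in the paper.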
### Requirements
- **Python 3.8+**
- **Flask**
- **OpenAI API access**
### Installation
1. Clone the repository:
```bash
git clone git@github.com:Duguce/GuessArena-Demo.git
cd GuessArena-Demo
```
2. Create and activate a conda environment:
```bash
conda create -n guessarena python=3.10
conda activate guessarena
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
4. Configure your API settings in `config/settings.json`
### Usage
1. Set up your API keys in `config/models.ini` for the AI models you want to use.
2. Start the application:
```bash
python app.py
```
3. Open your browser and go to `http://localhost:8888`
4. Choose between Player Mode or AI Simulation
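As a rough illustration of the first step, the sketch below parses an INI file with Python's standard `configparser`. The section names, key names, and URLs are placeholders chosen for this example; check the `config/models.ini` shipped with the repository for the layout the application actually expects.

```python
import configparser

# Hypothetical contents for config/models.ini.
# The real section and key names may differ.
EXAMPLE_INI = """
[openai]
api_key = your-openai-key
base_url = https://example.com/openai/v1

[deepseek]
api_key = your-deepseek-key
base_url = https://example.com/deepseek/v1
"""


def load_models(text):
    """Parse INI text into {section_name: {key: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


if __name__ == "__main__":
    for name, settings in load_models(EXAMPLE_INI).items():
        # Never print real keys; show only which fields are present.
        print(name, sorted(settings))
```

Keeping keys in a local config file rather than in source code makes it easier to keep them out of version control.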
### Project Structure
- `/config` - Configuration files and model settings
- `/data` - Leaderboard data, logs, and card decks
- `/prompts` - Prompt templates for AI models
- `/static` - Static assets (CSS, JavaScript)
- `/templates` - HTML templates
### Security
The application includes several security features:
- Content Security Policy
- Rate limiting for API endpoints
- Path traversal prevention
- Secure file access
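Two of the items above, path traversal prevention and rate limiting, can be sketched with the standard library alone. This is a minimal illustration under assumptions, not the application's implementation: `resolve_inside` and `SlidingWindowLimiter` are hypothetical names, and the limits shown are arbitrary.

```python
from __future__ import annotations

import time
from collections import deque
from pathlib import Path
from typing import Callable, Deque, Dict


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir, rejecting anything that escapes it."""
    base = Path(base_dir).resolve()
    target = (base / user_path).resolve()
    if target != base and base not in target.parents:
        raise ValueError(f"path escapes {base}: {user_path!r}")
    return target


class SlidingWindowLimiter:
    """Allow at most max_calls per client within window_seconds."""

    def __init__(self, max_calls: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock  # Injectable so the limiter can be tested without sleeping.
        self.calls: Dict[str, Deque[float]] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        recent = self.calls.setdefault(client_id, deque())
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_calls:
            return False
        recent.append(now)
        return True
```

For example, `resolve_inside("data", "../config/models.ini")` raises `ValueError`, while a path such as `decks/medical.json` resolves normally. A production deployment would more likely rely on a maintained rate-limiting extension than on an in-memory limiter like this one, which forgets its state on restart.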
### Citation
```bibtex
@inproceedings{GuessArena,
    title = "{G}uess{A}rena: Guess Who {I} Am? A Self-Adaptive Framework for Evaluating {LLM}s in Domain-Specific Knowledge and Reasoning",
    author = "Yu, Qingchen and
      Zheng, Zifan and
      Chen, Ding and
      Niu, Simin and
      Tang, Bo and
      Xiong, Feiyu and
      Li, Zhiyu",
    editor = "Che, Wanxiang and
      Nabende, Joyce and
      Shutova, Ekaterina and
      Pilehvar, Mohammad Taher",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-long.534/",
    doi = "10.18653/v1/2025.acl-long.534",
    pages = "10897--10912",
    ISBN = "979-8-89176-251-0",
}
```