{"id":26281228,"url":"https://github.com/duguce/guessarena-demo","last_synced_at":"2026-03-01T04:34:01.990Z","repository":{"id":281511027,"uuid":"945464227","full_name":"Duguce/GuessArena-Demo","owner":"Duguce","description":"A web-based interactive demo for the GuessArena evaluation framework","archived":false,"fork":false,"pushed_at":"2025-11-15T09:13:33.000Z","size":38,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-11-15T11:13:34.160Z","etag":null,"topics":["chatgpt","deepseek","demo","eval-framework","flask","guessarena","large-language-models"],"latest_commit_sha":null,"homepage":"https://aclanthology.org/2025.acl-long.534/","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Duguce.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-03-09T13:43:13.000Z","updated_at":"2025-11-15T09:14:24.000Z","dependencies_parsed_at":"2025-03-09T16:25:18.066Z","dependency_job_id":"ca8db4b8-e204-412d-bb6b-2f6edc323486","html_url":"https://github.com/Duguce/GuessArena-Demo","commit_stats":null,"previous_names":["duguce/guessarena-demo"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Duguce/GuessArena-Demo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Duguce%2FGuessArena-Demo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Duguce%2FGuessArena-Demo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Duguce%2FGuessArena-Demo/release
s","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Duguce%2FGuessArena-Demo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Duguce","download_url":"https://codeload.github.com/Duguce/GuessArena-Demo/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Duguce%2FGuessArena-Demo/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29960253,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-01T01:47:18.291Z","status":"online","status_checked_at":"2026-03-01T02:00:07.437Z","response_time":124,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatgpt","deepseek","demo","eval-framework","flask","guessarena","large-language-models"],"created_at":"2025-03-14T15:20:00.041Z","updated_at":"2026-03-01T04:34:01.979Z","avatar_url":"https://github.com/Duguce.png","language":"HTML","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch2 align=\"center\"\u003eGuessArena Demo\u003c/h2\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cem\u003eA web-based interactive demo for the GuessArena evaluation framework\u003c/em\u003e\n\u003c/p\u003e\n\n\u003e \\[!NOTE\\]  \n\u003e **GuessArena Demo** is a lightweight web application that simulates a card-guessing game with both **player interaction** and **AI-versus-AI simulation**.  
\n\u003e It provides an intuitive, hands-on interface to explore the evaluation methodology introduced in our paper:  \n\u003e \u003ca href=\"https://aclanthology.org/2025.acl-long.534/\"\u003e\n\u003e \u003cem\u003e“GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning”\u003c/em\u003e\n\u003e \u003c/a\u003e.\n\n### 🚀 Features\n\n- **Player Mode** – Interactively play the guessing game by asking yes/no questions to an AI judge  \n- **AI Simulation Mode** – Observe two LLMs engaging in a self-play guessing process  \n- **Leaderboard Tracking** – Compare model performance across different domains and settings  \n- **Customizable Decks** – Use built-in card sets or define your own domain-specific decks  \n- **Domain-Specific Scenarios** – Evaluate reasoning in different industries and knowledge areas  \n\n### 📦 Requirements\n\n- **Python 3.8+**  \n- **Flask**  \n- **OpenAI API access**  \n\n### 🛠 Installation\n\n1. Clone the repository:\n   ```bash\n   git clone git@github.com:Duguce/GuessArena-Demo.git\n   cd GuessArena-Demo\n   ```\n\n2. Create and activate a conda environment:\n   ```\n   conda create -n guessarena python=3.10\n   conda activate guessarena\n   ```\n\n3. Install dependencies:\n   ```\n   pip install -r requirements.txt\n   ```\n\n4. Configure your API settings in `config/settings.json`\n\n### 🌅 Usage\n\n1. Set up your API keys in `config/models.ini` for the AI models you want to use.\n\n2. Start the application:\n   ```\n   python app.py\n   ```\n\n3. Open your browser and go to `http://localhost:8888`\n\n4. 
Choose between Player Mode and AI Simulation Mode\n\n### 📁 Project Structure\n\n- `/config` - Configuration files and model settings\n- `/data` - Leaderboard data, logs, and card decks\n- `/prompts` - Prompt templates for AI models\n- `/static` - Static assets (CSS, JavaScript)\n- `/templates` - HTML templates\n\n### 🔒 Security\n\nThe application includes several security features:\n- Content Security Policy\n- Rate limiting for API endpoints\n- Path traversal prevention\n- Secure file access\n\n### 📖 Citation\n\n```\n@inproceedings{\n    GuessArena,\n    title = \"{G}uess{A}rena: Guess Who {I} Am? A Self-Adaptive Framework for Evaluating {LLM}s in Domain-Specific Knowledge and Reasoning\",\n    author = \"Yu, Qingchen  and\n      Zheng, Zifan  and\n      Chen, Ding  and\n      Niu, Simin  and\n      Tang, Bo  and\n      Xiong, Feiyu  and\n      Li, Zhiyu\",\n    editor = \"Che, Wanxiang  and\n      Nabende, Joyce  and\n      Shutova, Ekaterina  and\n      Pilehvar, Mohammad Taher\",\n    booktitle = \"Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)\",\n    month = jul,\n    year = \"2025\",\n    address = \"Vienna, Austria\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2025.acl-long.534/\",\n    doi = \"10.18653/v1/2025.acl-long.534\",\n    pages = \"10897--10912\",\n    ISBN = \"979-8-89176-251-0\",\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduguce%2Fguessarena-demo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fduguce%2Fguessarena-demo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fduguce%2Fguessarena-demo/lists"}