https://github.com/evilfreelancer/rrr-benchmark
A benchmark for evaluating large language models (LLMs) on Russian dialogue routing tasks. Models are tested on their ability to select the correct route ID and explain their reasoning based on message history and available routes.
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/evilfreelancer/rrr-benchmark
- Owner: EvilFreelancer
- License: MIT
- Created: 2025-06-23T09:04:43.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-07-02T14:28:57.000Z (6 months ago)
- Last Synced: 2025-07-02T15:37:08.361Z (6 months ago)
- Language: Python
- Homepage: https://huggingface.co/spaces/evilfreelancer/rrr-leaderboard
- Size: 88.9 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE