https://github.com/evilfreelancer/rrr-benchmark

A benchmark for evaluating large language models (LLMs) on Russian dialogue routing tasks. Models are tested on their ability to select the correct route ID and explain their reasoning based on message history and available routes.
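
The task described above can be pictured with a minimal sketch of a route-selection accuracy score. Everything here is an assumption made for illustration: the names `RoutingCase`, `RoutingAnswer`, and `routing_accuracy`, the field layout, and the sample data are hypothetical and are not taken from the repository, which may structure its data and metrics differently.

```python
from dataclasses import dataclass


@dataclass
class RoutingCase:
    """One hypothetical benchmark item: a dialogue, candidate routes, and the gold route."""
    messages: list[str]          # message history, oldest first
    routes: dict[int, str]       # route ID -> route description
    expected_route_id: int       # gold label


@dataclass
class RoutingAnswer:
    """What a model is assumed to return: a route ID plus a free-text explanation."""
    route_id: int
    reason: str


def routing_accuracy(cases: list[RoutingCase], answers: list[RoutingAnswer]) -> float:
    """Fraction of cases where the model chose an available route that matches the gold route."""
    if len(cases) != len(answers):
        raise ValueError("cases and answers must have the same length")
    if not cases:
        return 0.0
    correct = sum(
        1
        for case, answer in zip(cases, answers)
        if answer.route_id in case.routes and answer.route_id == case.expected_route_id
    )
    return correct / len(cases)


# Invented sample data, for illustration only.
cases = [
    RoutingCase(
        messages=["Hello", "I want to check my order status"],
        routes={1: "Order status", 2: "Technical support"},
        expected_route_id=1,
    ),
    RoutingCase(
        messages=["The app crashes on startup"],
        routes={1: "Order status", 2: "Technical support"},
        expected_route_id=2,
    ),
]
answers = [
    RoutingAnswer(route_id=1, reason="The user asks about an order."),
    RoutingAnswer(route_id=1, reason="Guessing the first route."),
]
score = routing_accuracy(cases, answers)
```

This sketch scores only the chosen route ID; judging the quality of the `reason` text would need a separate, likely less mechanical, evaluation step.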

Host: GitHub
URL: https://github.com/evilfreelancer/rrr-benchmark
Owner: EvilFreelancer
License: MIT
Created: 2025-06-23T09:04:43.000Z
Default Branch: main
Last Pushed: 2025-07-02T14:28:57.000Z
Last Synced: 2025-07-02T15:37:08.361Z
Language: Python
Homepage: https://huggingface.co/spaces/evilfreelancer/rrr-leaderboard
Size: 88.9 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project