https://github.com/inc44/znob
ZNOB is a multimodal benchmark measuring frontier LLMs' capabilities in passing Ukrainian national exams.
https://github.com/inc44/znob
benchmark benchmarks llm llms test testing tests zno
Last synced: 10 months ago
JSON representation
ZNOB is a multimodal benchmark measuring frontier LLMs' capabilities in passing Ukrainian national exams.
- Host: GitHub
- URL: https://github.com/inc44/znob
- Owner: Inc44
- Created: 2025-08-23T19:30:56.000Z (10 months ago)
- Default Branch: master
- Last Pushed: 2025-08-27T20:20:21.000Z (10 months ago)
- Last Synced: 2025-08-27T22:25:40.239Z (10 months ago)
- Topics: benchmark, benchmarks, llm, llms, test, testing, tests, zno
- Language: Python
- Homepage:
- Size: 15.6 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# ZNOB
ZNOB is a multimodal benchmark measuring frontier LLMs' capabilities in passing Ukrainian national exams.
## ๐ Installation
```bash
conda create -n znob python=3.9 -y # up to 3.13
conda activate znob
git clone https://github.com/Inc44/ZNOB.git
cd ZNOB
pip install -r requirements.txt
```
## ๐งพ Configuration
Set environment variable:
```powershell
setx /M OPENROUTER_API_KEY your_api_key
```
For Linux/macOS:
```bash
echo 'export OPENROUTER_API_KEY="your_api_key"' >> ~/.bashrc # or ~/.zshrc
```
Or create a `.env` file or modify /etc/environment:
```
OPENROUTER_API_KEY=your_api_key
```
Check by restarting the terminal and using:
```cmd
echo %OPENROUTER_API_KEY%
```
For Linux/macOS:
```bash
echo $OPENROUTER_API_KEY
```
## ๐ Usage Examples
### Prepare Dataset
```bash
python -m znob.cli -d your_zno_dataset -u your_zno_source
```
### Test LLM
```bash
python -m znob.cli -d your_zno_dataset --model google/gemini-2.5-flash
```
### Reset Outputs
```bash
python -m znob.cli -d your_zno_dataset -r responses,combined_responses,summary # or questions or all
```
## ๐จ Command-Line Arguments
| Argument | Description |
|--------------------------|-------------------------------|
| `-u, --url ` | Dataset source. |
| `-d, --dataset ` | Dataset to test. |
| `-m, --model ` | AI model to test. |
| `-r, --reset ` | Reset outputs. |
| `--no-text` | Send only image, no text. |
| `--no-image` | Send only text, no image. |
| `--necessary-image-only` | Send image only if necessary. |
## ๐ฏ Motivation
LLMs have made significant progress since November 2024, so I decided to measure their progress and also verify the claims of the [Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains](https://arxiv.org/abs/2411.14647v1) research paper.
## ๐ Bugs
Not yet found.
## โ Known Limitations
Not yet known.
## ๐ง TODO
Not yet planned.