https://github.com/inc44/znob

ZNOB is a multimodal benchmark measuring frontier LLMs' capabilities in passing Ukrainian national exams.
https://github.com/inc44/znob

benchmark benchmarks llm llms test testing tests zno

Last synced: 10 months ago
JSON representation

ZNOB is a multimodal benchmark measuring frontier LLMs' capabilities in passing Ukrainian national exams.

Host: GitHub
URL: https://github.com/inc44/znob
Owner: Inc44
Created: 2025-08-23T19:30:56.000Z (11 months ago)
Default Branch: master
Last Pushed: 2025-08-27T20:20:21.000Z (10 months ago)
Last Synced: 2025-08-27T22:25:40.239Z (10 months ago)
Topics: benchmark, benchmarks, llm, llms, test, testing, tests, zno
Language: Python
Homepage:
Size: 15.6 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# ZNOB

ZNOB is a multimodal benchmark measuring frontier LLMs' capabilities in passing Ukrainian national exams.

## 🚀 Installation

```bash
conda create -n znob python=3.9 -y # up to 3.13
conda activate znob
git clone https://github.com/Inc44/ZNOB.git
cd ZNOB
pip install -r requirements.txt
```

## 🧾 Configuration

Set environment variable:

```powershell
setx /M OPENROUTER_API_KEY your_api_key
```

For Linux/macOS:

```bash
echo 'export OPENROUTER_API_KEY="your_api_key"' >> ~/.bashrc # or ~/.zshrc
```

Or create a `.env` file or modify /etc/environment:

```
OPENROUTER_API_KEY=your_api_key
```

Check by restarting the terminal and using:

```cmd
echo %OPENROUTER_API_KEY%
```

For Linux/macOS:

```bash
echo $OPENROUTER_API_KEY
```

## 📖 Usage Examples

### Prepare Dataset

```bash
python -m znob.cli -d your_zno_dataset -u your_zno_source
```

### Test LLM

```bash
python -m znob.cli -d your_zno_dataset --model google/gemini-2.5-flash
```

### Reset Outputs

```bash
python -m znob.cli -d your_zno_dataset -r responses,combined_responses,summary # or questions or all
```

## 🎨 Command-Line Arguments

| Argument | Description |
|--------------------------|-------------------------------|
| `-u, --url ` | Dataset source. |
| `-d, --dataset ` | Dataset to test. |
| `-m, --model ` | AI model to test. |
| `-r, --reset ` | Reset outputs. |
| `--no-text` | Send only image, no text. |
| `--no-image` | Send only text, no image. |
| `--necessary-image-only` | Send image only if necessary. |

## 🎯 Motivation

LLMs have made significant progress since November 2024, so I decided to measure their progress and also verify the claims of the [Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains](https://arxiv.org/abs/2411.14647v1) research paper.

## 🐛 Bugs

Not yet found.

## ⛔ Known Limitations

Not yet known.

## 🚧 TODO

Not yet planned.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/inc44/znob

Awesome Lists containing this project

README