# ZNOB

ZNOB is a multimodal benchmark measuring frontier LLMs' capabilities in passing Ukrainian national exams.

## 🚀 Installation

```bash
conda create -n znob python=3.9 -y # up to 3.13
conda activate znob
git clone https://github.com/Inc44/ZNOB.git
cd ZNOB
pip install -r requirements.txt
```

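## 🧾 Configuration

Set the `OPENROUTER_API_KEY` environment variable. On Windows, run this from an elevated prompt (`/M` sets the variable system-wide):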

```powershell
setx /M OPENROUTER_API_KEY your_api_key
```

For Linux/macOS:

```bash
echo 'export OPENROUTER_API_KEY="your_api_key"' >> ~/.bashrc # or ~/.zshrc
```

Alternatively, create a `.env` file or add the following line to `/etc/environment` (Linux):

```
OPENROUTER_API_KEY=your_api_key
```

To verify, restart the terminal and run the following. On Windows (Command Prompt):

```cmd
echo %OPENROUTER_API_KEY%
```

For Linux/macOS:

```bash
echo $OPENROUTER_API_KEY
```
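
The same check can be done from Python before starting a run. The following is a minimal sketch, not part of ZNOB: the helper names `read_env_file` and `find_api_key` are hypothetical, and the `.env` parsing assumes plain `KEY=VALUE` lines as shown above. How ZNOB itself loads the key is not covered here.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_api_key(env_file=".env"):
    """Return the key from the environment, else from the .env file, else None."""
    from_environment = os.environ.get("OPENROUTER_API_KEY")
    if from_environment:
        return from_environment
    return read_env_file(env_file).get("OPENROUTER_API_KEY")


if __name__ == "__main__":
    key = find_api_key()
    print("OPENROUTER_API_KEY is set" if key else "OPENROUTER_API_KEY is missing")
```

The script only reports whether a key was found; it never prints the key itself.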

## 📖 Usage Examples

### Prepare Dataset

```bash
python -m znob.cli -d your_zno_dataset -u your_zno_source
```

### Test LLM

```bash
python -m znob.cli -d your_zno_dataset --model google/gemini-2.5-flash
```
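
To test several models against the same dataset, the command above can be wrapped in a small script. This is a sketch under assumptions: `build_command` and `run_models` are hypothetical helpers, not part of ZNOB, and the script only uses the `-d` and `--model` flags shown above. It defaults to a dry run that prints the commands without calling the API.

```python
import subprocess
import sys


def build_command(dataset, model, extra_flags=()):
    """Build the argument list for one `python -m znob.cli` invocation."""
    return [sys.executable, "-m", "znob.cli", "-d", dataset, "--model", model, *extra_flags]


def run_models(dataset, models, extra_flags=(), dry_run=True):
    """Run (or, in a dry run, only print) one benchmark command per model."""
    commands = [build_command(dataset, model, extra_flags) for model in models]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the loop at the first failing run.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Add further OpenRouter model IDs to this list as needed.
    run_models("your_zno_dataset", ["google/gemini-2.5-flash"], dry_run=True)
```

Pass `dry_run=False` once the printed commands look right.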

### Reset Outputs

```bash
python -m znob.cli -d your_zno_dataset -r responses,combined_responses,summary # or questions or all
```

## 🎨 Command-Line Arguments

| Argument | Description |
|--------------------------|-------------------------------|
| `-u, --url` | Dataset source. |
| `-d, --dataset` | Dataset to test. |
| `-m, --model` | AI model to test. |
| `-r, --reset` | Reset outputs. |
| `--no-text` | Send only image, no text. |
| `--no-image` | Send only text, no image. |
| `--necessary-image-only` | Send image only if necessary. |
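
The three input flags lend themselves to a modality comparison on one model. The sketch below builds one command per setting. It is illustrative only: `MODALITY_FLAGS` and `modality_commands` are hypothetical names, the labels are my own, and it assumes that omitting all three flags sends both text and image and that the flags are used one at a time.

```python
import sys

# Label -> extra flags. The labels are illustrative; the flags come from the table above.
MODALITY_FLAGS = {
    "text+image": [],  # assumed default when no flag is given
    "image-only": ["--no-text"],
    "text-only": ["--no-image"],
    "image-if-necessary": ["--necessary-image-only"],
}


def modality_commands(dataset, model):
    """Return one `python -m znob.cli` argument list per modality setting."""
    base = [sys.executable, "-m", "znob.cli", "-d", dataset, "--model", model]
    return {label: base + flags for label, flags in MODALITY_FLAGS.items()}


if __name__ == "__main__":
    commands = modality_commands("your_zno_dataset", "google/gemini-2.5-flash")
    for label, command in commands.items():
        print(f"{label}: {' '.join(command)}")
```

Each printed line can be run as is, or passed to `subprocess.run` as a list.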

## 🎯 Motivation

LLMs have advanced significantly since November 2024, so I decided to measure that progress and to verify the claims of the research paper [Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains](https://arxiv.org/abs/2411.14647v1).

## 🐛 Bugs

Not yet found.

## ⛔ Known Limitations

Not yet known.

## 🚧 TODO

Not yet planned.