https://github.com/mideind/icelandiceval

Utilities to generate Icelandic evaluation data sets for LLMs
https://github.com/mideind/icelandiceval

evaluation grammar icelandic inflection llm python

Last synced: about 1 year ago
JSON representation

Utilities to generate Icelandic evaluation data sets for LLMs

Host: GitHub
URL: https://github.com/mideind/icelandiceval
Owner: mideind
License: mit
Created: 2023-10-21T13:22:22.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2023-10-23T11:34:19.000Z (over 2 years ago)
Last Synced: 2025-01-26T03:08:13.323Z (over 1 year ago)
Topics: evaluation, grammar, icelandic, inflection, llm, python
Language: Python
Homepage:
Size: 1.85 MB
Stars: 1
Watchers: 7
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# IcelandicEval
A repository of utilities to generate Icelandic
evaluation data sets for LLMs. The data sets are mostly about
word inflection and grammatical correctness.

## calc-freq.py

This utility program generates evaluation data
for LLMs, typically OpenAI's GPT-4, to test proficiency
in Icelandic. The data consists of lists of noun phrases,
where each phrase contains an adjective and a noun,
and the task is to inflect the adjective and noun together
in all four cases (nominative, accusative, dative, genitive),
in singular as well as plural.

The final output of the program is a set of three JSONL
files, each containing a number of samples. The samples are
bucketed into three categories, easy, medium and hard,
depending on the frequency of the adjectives and nouns used
in each sample. Each sample is an LLM prompt and an ideal
completion.

### Usage

Clone this repo into a directory, create a virtualenv and
install the requirements:

```bash
git clone https://github.com/mideind/IcelandicEval.git
cd IcelandicEval
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

The `nouns.csv` and `adjectives.csv` files need to be present
in the `data` directory. They were originally created
by querying the BÍN database
([bin.arnastofnun.is](https://bin.arnastofnun.is)), for example
(in the psql command line):

```bash
psql> \copy (select ord, ofl from bin2023
where ofl in ('kk', 'kvk', 'hk')) to 'data/nouns.csv' with csv;

psql> \copy (select ord from bin2023 where ofl = 'lo')
to 'data/adjectives.csv' with csv;
```

Then, given those files, this program is run to generate
randomly sampled, bucketed lists of nouns and adjectives
respectively. The buckets are created by frequency of
occurrence of the word forms in the `icegrams` database,
with bucket 0 containing the least frequent words and bucket
2 the most frequent. The bucket files are created in the `data`
directory, under the names `nouns-{0,1,2}.txt` and `adj-{0,1,2}.txt`.

```bash
python calc-freq.py --nouns
python calc-freq.py --adjectives
```

Finally, after the buckets 0-2 have been created, the
final evaluation samples can be generated. The number
of samples desired from each bucket can be passed in as
a command line parameter, defaulting to 20.

```bash
python calc-freq.py --generate [N, default 20]
```

The results are found in three JSONL files, named
`data/icelandic-inflection-{easy,medium,hard}/samples.jsonl`.
They are in a format that is suitable for use with OpenAI's
evals suite (see [github.com/openai/evals](https://github.com/openai/evals)).

# License

This software is under the MIT License. Consult the LICENSE.md file
for details.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/mideind/icelandiceval

Awesome Lists containing this project

README