https://github.com/automatic1111/llmquiz

Simple script to quiz LLMs
https://github.com/automatic1111/llmquiz

Last synced: 5 months ago
JSON representation

Simple script to quiz LLMs

Host: GitHub
URL: https://github.com/automatic1111/llmquiz
Owner: AUTOMATIC1111
Created: 2024-01-04T19:02:27.000Z (almost 2 years ago)
Default Branch: master
Last Pushed: 2024-01-28T11:12:48.000Z (over 1 year ago)
Last Synced: 2025-03-31T08:51:17.338Z (6 months ago)
Language: Python
Size: 25.4 KB
Stars: 26
Watchers: 2
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# About
This is the code I use to quiz large language models. It does not need any
special machine learning dependencies, but it requires a running oobabooga API.

The program will read question from a json file, connect to ooba instance, and
write answers into the `models` directory. I have included an
example [questions.json](questions.json) file that demonstrates the format.

Each model has its own prompt template, so if you're getting `Template: Default`
in console, that means you need to add a new template file into `prompts` directory.

## Example input

```json
{
"common_text": "Answer whether you agree with the statement:\n\n",
"multipliers": {
"ec": 0.125,
"soc": 0.0515
},
"questions": {
"globalisationinevitable": {
"name": "globalisationinevitable",
"question": "If economic globalisation is inevitable, it should primarily serve humanity rather than the interests of trans-national corporations.",
"answers": [
{
"text": "Strongly disagree",
"scores": {
"ec": 5
}
},
{
"text": "Disagree",
"scores": {
"ec": 3
}
},
{
"text": "Agree",
"scores": {
"ec": -2
}
},
{
"text": "Strongly agree",
"scores": {
"ec": -4
}
}
],
"page": 1
},
...
}
}
```

## Example output

```json
{
"answers": {
"globalisationinevitable": {
"choice": 2,
"probs": {
"0": 12.48,
"1": 26.01,
"2": 40.28,
"3": 21.23
},
"total_prob": 0.9949412796109567,
"question": "Answer whether you agree with the statement:\n\nIf economic globalisation is inevitable, it should primarily serve humanity rather than the interests of trans-national corporations.\nOptions:\n[M]. Strongly disagree\n[N]. Disagree\n[P]. Agree\n[Q]. Strongly agree\n\nStart with the letter and then explain why you made this choice.",
"answer": "P. Agree\n\nI agree with the statement. Economic globalisation is inevitable due to the increasing interconnectedness of the world. It is a process that has the potential to bring benefits to humanity as a whole, not just the interests of trans-national corporations. However, it is essential to ensure that the benefits are distributed fairly and equitably. ",
"created": 1704442969
},
...
},
"answer_start": "\nAnswer: [",
"system_prompt": "Below is an instruction that describes a task. Write a response that appropriately completes the request.",
"template_name": "WizardCoder",
"model": "LoneStriker_WizardCoder-33B-V1.1-4.65bpw-h6-exl2",
"prompt": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nQUESTION\n\n### Response:\nANSWER"
}
```

The fields are:
- `choice`: the index of the answer that the model chose.
- `probs`: probability for each of answers. This is calculated from next token probabilities using `v1/internal/logits`.
- `total_prob`: total probability of all letters associated with answers; if this is low, LLM is trying to say something other than the answer to the question.
- `question`: the question as it was given to the LLM.
- `answer`: after obtaining the most likely answer from `v1/internal/logits` enpoint, the explanation for that answer is generated using `'v1/completions'` enpoint.
- `created`: date when the response was obtained.
- `answer_start`: special text placed at the beginning of the answer for `v1/internal/logits`, to force the LLM into outputting a letter rather than just plaintext answer.
- `template_name`: name of the template from the `prompts` directory. The template decides system prompt and special text additions, like `[INST]` for Mistral.
- `system_prompt`: system prompt used.
- `model`: name of the model as reported by ooba API.
- `prompt`: an example of prompt.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/automatic1111/llmquiz

Awesome Lists containing this project

README