https://github.com/evilfreelancer/dqa-quorum
https://github.com/evilfreelancer/dqa-quorum
Last synced: 7 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/evilfreelancer/dqa-quorum
- Owner: EvilFreelancer
- License: mit
- Created: 2024-08-15T06:34:28.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-16T11:40:08.000Z (over 1 year ago)
- Last Synced: 2025-04-04T03:41:19.823Z (9 months ago)
- Language: Python
- Size: 192 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Quorum of LLMs for Dataset Quality Assessment
This project is designed to assess the quality of a dataset by evaluating each sample and determining if it should be
included in the cleaned version of the dataset. This project uses multiple Large Language Models (LLMs) as experts to
rate samples based on their content and provides a summary of the average rating along with a classification as `bad`
or `good`.
The evaluation process uses a quorum-based approach, where each sample is rated by a pool of experts and a majority vote
determines its inclusion in the cleaned dataset. If a majority vote has not been achieved, the sample will be excluded
from the cleaned version.
## Features
* Uses multiple LLMs to rate samples based on their content
* Calculates the average rating for each sample
* Determines if the majority vote of quorum has been achieved for each sample
* Classifies the average rating as `bad` or `good` based on a threshold of 3.5
## Architecture
Each sample of dataset will be processed separately.

### Quorum of experts
You may use any LLM as an expert for your quorum; the only limitation is that the remote API should be compatible with
the OpenAI API client.
Example of `experts.yml` configuration:
```yaml
experts:
- model: gpt-3.5-turbo
- model: anthropic/claude-3-haiku
- model: perplexity/llama-3-sonar-small-32k-online
- model: google/palm-2-chat-bison-32k
- model: google/gemma-2-9b-it
```
Here you may set multiple models; they will work as experts in the quorum.
### Advanced settings of experts
You may use different API keys, base URLs, and prompt templates:
```yaml
experts:
- model: gpt-3.5-turbo
api_key: sk-XXXX
base_url: https://api.openai.com/v1
prompt_template: Evaluate how well this example conveys its meaning?\nPlease rate text below from 1 (poor) to 5 (excellent), RESPONSE ONLY ONE NUMBER:\n\n{{ context }}\n
- model: gpt-3.5-turbo
api_key: sk-YYYY
base_url: https://api.vsegpt.ru/v1
```
### Prompt template
The template should at least include the `{{ context }}` field.
```text
Can you evaluate how well this example conveys its meaning, how well it is organized and structured, whether it fits the theme of the conversation, and whether its responses are accurate?
Please rate text below from 1 (bad) to 5 (good), RESPONSE ONLY ONE NUMBER:
{{ context }}
```
See [prompt_template.txt](./prompt_template.txt) for details.
### Example
* [dqa.ipynb](./examples/dqa.ipynb) - standalone example with all classes and function used under the hood.
* [dqa-simplified.ipynb](./examples/dqa-simplified.ipynb) - simplified example, it works in the same way
## License
This project is licensed under the MIT License. See the [LICENSE](./LICENSE) file for details.
## Citation
If you use this project in your research or work, please cite it as follows:
```text
[Pavel Rykov]. (2024). Quorum of LLMs for Dataset Quality Assessment. GitHub. https://github.com/EvilFreelancer/dqa-quorum
```
Alternatively, in BibTeX format:
```bibtex
@misc{pavelrykov2024dqaquorum,
author = {Pavel Rykov},
title = {Quorum of LLMs for Dataset Quality Assessment},
year = {2024},
url = {https://github.com/EvilFreelancer/dqa-quorum}
}
```