{"id":17989479,"url":"https://github.com/evilfreelancer/dqa-quorum","last_synced_at":"2025-06-30T07:33:54.408Z","repository":{"id":253237828,"uuid":"842816152","full_name":"EvilFreelancer/dqa-quorum","owner":"EvilFreelancer","description":null,"archived":false,"fork":false,"pushed_at":"2024-08-16T11:40:08.000Z","size":197,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-04T03:41:19.823Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EvilFreelancer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-15T06:34:28.000Z","updated_at":"2024-08-16T11:40:12.000Z","dependencies_parsed_at":"2024-08-15T12:28:05.788Z","dependency_job_id":"00906363-7f68-48b7-bf6f-25c3d959bbdc","html_url":"https://github.com/EvilFreelancer/dqa-quorum","commit_stats":null,"previous_names":["evilfreelancer/dqa-quorum"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/EvilFreelancer/dqa-quorum","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fdqa-quorum","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fdqa-quorum/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fdqa-quorum/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fdqa-quorum/manifests","owner_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/owners/EvilFreelancer","download_url":"https://codeload.github.com/EvilFreelancer/dqa-quorum/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fdqa-quorum/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262731774,"owners_count":23355423,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-29T19:14:47.281Z","updated_at":"2025-06-30T07:33:54.383Z","avatar_url":"https://github.com/EvilFreelancer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Quorum of LLMs for Dataset Quality Assessment\n\nThis project is designed to assess the quality of a dataset by evaluating each sample and determining if it should be\nincluded in the cleaned version of the dataset. This project uses multiple Large Language Models (LLMs) as experts to\nrate samples based on their content and provides a summary of the average rating along with a classification as `bad`\nor `good`.\n\nThe evaluation process uses a quorum-based approach, where each sample is rated by a pool of experts and a majority vote\ndetermines its inclusion in the cleaned dataset. 
If no majority is reached, the sample is excluded\nfrom the cleaned version.\n\n## Features\n\n* Uses multiple LLMs to rate samples based on their content\n* Calculates the average rating for each sample\n* Determines whether a majority of the quorum has been reached for each sample\n* Classifies the average rating as `bad` or `good` based on a threshold of 3.5\n\n## Architecture\n\nEach sample of the dataset is processed separately.\n\n![arch](./assets/arch.png)\n\n### Quorum of experts\n\nYou may use any LLM as an expert for your quorum; the only requirement is that the remote API must be compatible with\nthe OpenAI API client.\n\nExample of `experts.yml` configuration:\n\n```yaml\nexperts:\n  - model: gpt-3.5-turbo\n  - model: anthropic/claude-3-haiku\n  - model: perplexity/llama-3-sonar-small-32k-online\n  - model: google/palm-2-chat-bison-32k\n  - model: google/gemma-2-9b-it\n```\n\nYou may list multiple models here; each one acts as an expert in the quorum.\n\n### Advanced settings of experts\n\nEach expert may use its own API key, base URL, and prompt template:\n\n```yaml\nexperts:\n  - model: gpt-3.5-turbo\n    api_key: sk-XXXX\n    base_url: https://api.openai.com/v1\n    prompt_template: Evaluate how well this example conveys its meaning?\\nPlease rate text below from 1 (poor) to 5 (excellent), RESPONSE ONLY ONE NUMBER:\\n\\n{{ context }}\\n\n  - model: gpt-3.5-turbo\n    api_key: sk-YYYY\n    base_url: https://api.vsegpt.ru/v1\n```\n\n### Prompt template\n\nThe template must include at least the `{{ context }}` placeholder.\n\n```text\nCan you evaluate how well this example conveys its meaning, how well it is organized and structured, whether it fits the theme of the conversation, and whether its responses are accurate?\nPlease rate text below from 1 (bad) to 5 (good), RESPONSE ONLY ONE NUMBER:\n\n{{ context }}\n```\n\nSee [prompt_template.txt](./prompt_template.txt) for details.\n\n### Example\n\n* [dqa.ipynb](./examples/dqa.ipynb) - 
standalone example with all classes and functions used under the hood.\n* [dqa-simplified.ipynb](./examples/dqa-simplified.ipynb) - simplified example that works the same way.\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](./LICENSE) file for details.\n\n## Citation\n\nIf you use this project in your research or work, please cite it as follows:\n\n```text\n[Pavel Rykov]. (2024). Quorum of LLMs for Dataset Quality Assessment. GitHub. https://github.com/EvilFreelancer/dqa-quorum\n```\n\nAlternatively, in BibTeX format:\n\n```bibtex\n@misc{pavelrykov2024dqaquorum,\n  author = {Pavel Rykov},\n  title  = {Quorum of LLMs for Dataset Quality Assessment},\n  year   = {2024},\n  url    = {https://github.com/EvilFreelancer/dqa-quorum}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevilfreelancer%2Fdqa-quorum","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fevilfreelancer%2Fdqa-quorum","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevilfreelancer%2Fdqa-quorum/lists"}