{"id":15175559,"url":"https://github.com/spenceryonce/llmeval","last_synced_at":"2025-10-06T10:32:09.680Z","repository":{"id":156563121,"uuid":"626942480","full_name":"spenceryonce/LLMeval","owner":"spenceryonce","description":"Evaluate and compare large language models (LLMs) for chatbot applications, using various LLMs as evaluators, and manage prompt templates and binary preferences.","archived":false,"fork":false,"pushed_at":"2023-07-31T17:49:06.000Z","size":12,"stargazers_count":8,"open_issues_count":2,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-24T23:57:00.518Z","etag":null,"topics":["anthropic","chatgpt","claude","cohere","evaluation","evaluator","llm","openai"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/spenceryonce.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-04-12T13:19:45.000Z","updated_at":"2024-03-18T15:48:58.000Z","dependencies_parsed_at":null,"dependency_job_id":"7a272ff1-f0c5-480d-87a4-4884a5dd4ad7","html_url":"https://github.com/spenceryonce/LLMeval","commit_stats":{"total_commits":4,"total_committers":2,"mean_commits":2.0,"dds":0.25,"last_synced_commit":"7584257ad1b222dfd686d20cd6b4a79dbf69a9c7"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spenceryonce%2FLLMeval","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spenceryonce%2FLLMeval/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spenceryonce%2FLLMeval/releases","
manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/spenceryonce%2FLLMeval/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/spenceryonce","download_url":"https://codeload.github.com/spenceryonce/LLMeval/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":235519884,"owners_count":19003201,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anthropic","chatgpt","claude","cohere","evaluation","evaluator","llm","openai"],"created_at":"2024-09-27T12:39:28.485Z","updated_at":"2025-10-06T10:32:09.674Z","avatar_url":"https://github.com/spenceryonce.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/spenceryonce/LLMeval)\n# LLM Evaluation\nThis library provides a collection of classes and functions to evaluate and compare different large language models (LLMs). The main purpose of the library is to build chatbots and evaluate their responses based on given objectives.\n\n## Modules and Classes\n1. **LanguageModelWrapper**\nA base class for wrapping different language models.\n\n2. **Prompt**\nA class for managing prompt templates.\n\n3. **BinaryPreference**\nA class for managing binary preferences between two different responses.\n\n4. **BinaryEvaluator**\nA base class for evaluating binary preferences between two different responses.\n\n5. **GPT35Evaluator**\nA class for evaluating binary preferences using the GPT-3.5 LLM.\n\n6. 
**OpenAIModel**\nAn enumeration of the available OpenAI models.\n\n7. **OpenAIGPTWrapper**\nA class for wrapping OpenAI's GPT models.\n\n8. **ClaudeWrapper**\nA class for wrapping Anthropic's Claude LLM.\n\n9. **CohereWrapper**\nA class for wrapping Cohere's LLM.\n\n10. **GrokWrapper**\nA class for wrapping xAI's Grok models.\n\n11. **MistralWrapper**\nA class for wrapping Mistral's models.\n\n12. **DeepSeekWrapper**\nA class for wrapping DeepSeek's models.\n\n13. **Llama3Wrapper**\nA class for wrapping Llama 3 models via DeepInfra.\n\n14. **ChatBot**\nA class for creating chatbot instances based on provided LLMs.\n\n## Required Setup\n1. Install the dependencies from `requirements.txt`:\n```cmd\npip install -r requirements.txt\n```\n2. Create a `.env` file in the root of the project and add the following API keys:\n```\nOPENAI_API_KEY=your_openai_api_key\nCOHERE_API_KEY=your_cohere_api_key\nGROK_API_KEY=your_grok_api_key\nMISTRAL_API_KEY=your_mistral_api_key\nDEEPSEEK_API_KEY=your_deepseek_api_key\nDEEPINFRA_API_KEY=your_deepinfra_api_key\n```\n\n## Example Usage\nThe `main.py` script provides an example of how to use the library. It initializes all the supported models, defines an objective, and then runs a series of evaluations comparing each model to GPT-3.5.\n\nTo run the example:\n```bash\npython main.py\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspenceryonce%2Fllmeval","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fspenceryonce%2Fllmeval","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fspenceryonce%2Fllmeval/lists"}