Projects in Awesome Lists tagged with llm-as-a-judge
A curated list of projects in awesome lists tagged with llm-as-a-judge.
https://github.com/agenta-ai/agenta
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
llm-as-a-judge llm-evaluation llm-framework llm-monitoring llm-observability llm-platform llm-playground llm-tools llmops-platform prompt-engineering prompt-management rag-evaluation
Last synced: 12 May 2025
https://github.com/prometheus-eval/prometheus-eval
Evaluate your LLM's response with Prometheus and GPT4 💯
evaluation gpt4 litellm llm llm-as-a-judge llm-as-evaluator llmops python vllm
Last synced: 05 Apr 2025
https://github.com/iaar-shanghai/xfinder
[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
benchmark cc-by-nc-nd-4 chatglm dataset evaluation gpt judge-model key-answer-extraction large-language-models llm llm-as-a-judge llm-as-evaluator lm-evaluation open-compass phi qwen regex reliability reliable-evaluation xfinder
Last synced: 06 Apr 2025
https://github.com/iaar-shanghai/xverify
xVerify: Efficient Answer Verifier for Large Language Model Evaluations
benchmark cc-by-nc-nd-4 chatgpt deepseek-math evaluation judge-model llm llm-as-a-judge math-verify open-compass open-r1 reasoning-models regex reliability reliability-tools xverify
Last synced: 14 Apr 2025
https://github.com/root-signals/root-signals-mcp
MCP for Root Signals Evaluation Platform
agentic-ai evals llm-as-a-judge mcp model-context-protocol pydantic-ai
Last synced: 03 May 2025