"llm-evaluation" Awesome Lists
awesome-llm-eval
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for evaluation of LLMs, aiming to explore the technical boundaries of generative AI.
awsome-list awsome-lists benchmark bert chatglm chatgpt dataset evaluation gpt3 large-language-model
608 stars
51 forks
479 projects
Last updated: 08 Feb 2026
awesome-ai-agent-testing
🤖 A curated list of resources for testing AI agents: frameworks, methodologies, benchmarks, tools, and best practices for ensuring reliable, safe, and effective autonomous AI systems
agent-evaluation agentic-ai ai-agents ai-benchmark ai-safety artificial-intelligence awesome-list benchmark chaos chaos-engineering
27 stars
5 forks
168 projects
Last updated: 15 Feb 2026
ai-llmops-index
Comprehensive LLMOps reference index: observability platforms, inference cost intelligence, failure mode taxonomy, stack compatibility matrices, and regulatory compliance mapping for LLMs in production.
ai-compliance ai-governance ai-infrastructure ai-observability ai-safety awesome-list llm-benchmarks llm-cost-comparison llm-evaluation llm-failure-modes
1 star
0 forks
95 projects
Last updated: 12 Mar 2026