"evaluation" Awesome Lists
awesome-semantic-segmentation
🤘 awesome-semantic-segmentation
benchmark deeplearning evaluation semantic-segmentation
10,832 stars
2,476 forks
53 projects
Last updated: 11 May 2026
Awesome-LLM-Long-Context-Modeling
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
agent awsome-list benchmark blogs compress evaluation large-language-models length-extrapolation llm long-context-modeling
2,103 stars
95 forks
1,616 projects
Last updated: 28 May 2026
awesome-llm-eval
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs and models, mainly for evaluating foundation LLMs, aiming to explore the technical boundaries of generative AI.
awsome-list awsome-lists benchmark bert chatglm chatgpt dataset evaluation gpt3 large-language-model
637 stars
67 forks
479 projects
Last updated: 19 May 2026
awesome-llm-unlearning
A resource repository for machine unlearning in large language models
ai-safety alignment awesome awesome-list evaluation knowledge-erasure large-language-model llm llm-safety llm-unlearning
587 stars
31 forks
668 projects
Last updated: 15 May 2026
Awesome-Evaluation-of-Visual-Generation
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
awesome benchmark evaluation evaluation-metrics evaluation-system generative-models image-generation video-generation
435 stars
23 forks
837 projects
Last updated: 17 May 2026
awesome-foundation-model-leaderboards
A curated list of awesome leaderboard-oriented resources for AI domain
ai-agent artificial-intelligence awesome-list benchmark deep-learning evaluation foundation-model large-ai-model leaderboard machine-learning
361 stars
50 forks
546 projects
Last updated: 06 Jun 2026
awesome-data-contamination
The Paper List on Data Contamination for Large Language Models Evaluation.
awesome-list data-contamination evaluation foundation-models large-language-models llm paper-list pre-trained-language-models pre-trained-model
114 stars
6 forks
268 projects
Last updated: 02 Jun 2026
awesome-ai-eval
☑️ A curated list of tools, methods & platforms for evaluating AI reliability in real applications
ai-evaluation ai-evaluation-framework ai-evaluation-metrics ai-evaluation-tools awesome awesome-list awesome-lists chatgpt claude evaluation
77 stars
13 forks
186 projects
Last updated: 13 May 2026
awesome-datacentric-llm
Trending projects & awesome papers about data-centric LLM studies.
data-centric-ai evaluation llm pre-training
40 stars
2 forks
45 projects
Last updated: 07 Feb 2026
awesome-ai-agent-testing
🤖 A curated list of resources for testing AI agents - frameworks, methodologies, benchmarks, tools, and best practices for ensuring reliable, safe, and effective autonomous AI systems
agent-evaluation agentic-ai ai-agents ai-benchmark ai-safety artificial-intelligence awesome-list benchmark chaos chaos-engineering
31 stars
9 forks
168 projects
Last updated: 05 May 2026
awesome-agent-rl-environments
A curated list of training & evaluation environments for LLM/VLM agents (SWE-Gym, GEM, RAGEN, AgentGym, WebArena, OSWorld, ToolBench…). Updated weekly.
agent agent-rl agentic-ai awesome awesome-list benchmark browser-agent computer-use-agent evaluation grpo
1 star
0 forks
69 projects
Last updated: 21 May 2026