"evaluation" Awesome Lists
awesome-semantic-segmentation
🤘 awesome-semantic-segmentation
benchmark deeplearning evaluation semantic-segmentation
          
10,745 stars · 2,486 forks · 53 projects
Last updated: 02 Oct 2025
Awesome-LLM-Long-Context-Modeling
📰 Must-read papers and blogs on LLM-based Long Context Modeling 🔥
agent awsome-list benchmark blogs compress evaluation large-language-models length-extrapolation llm long-context-modeling
          
1,727 stars · 72 forks · 1,402 projects
Last updated: 22 Sep 2025
awesome-llm-eval
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs, and models, mainly for the evaluation of foundation LLMs, aiming to explore the technical boundaries of generative AI.
awsome-list awsome-lists benchmark bert chatglm chatgpt dataset evaluation gpt3 large-language-model
          
571 stars · 45 forks · 479 projects
Last updated: 20 Sep 2025
awesome-foundation-model-leaderboards
A curated list of awesome leaderboard-oriented resources for foundation models
artificial-intelligence awesome-list deep-learning evaluation foundation-model large-language-model leaderboard machine-learning
          
282 stars · 35 forks · 488 projects
Last updated: 16 Sep 2025
awesome-data-contamination
A paper list on data contamination in the evaluation of large language models.
awesome-list data-contamination evaluation foundation-models large-language-models llm paper-list pre-trained-language-models pre-trained-model
          
100 stars · 4 forks · 200 projects
Last updated: 30 Sep 2025
awesome-datacentric-llm
Trending projects and awesome papers on data-centric LLM studies.
data-centric-ai evaluation llm pre-training
          
38 stars · 2 forks · 45 projects
Last updated: 04 Aug 2025
awesome-ai-agent-testing
🤖 A curated list of resources for testing AI agents: frameworks, methodologies, benchmarks, tools, and best practices for ensuring reliable, safe, and effective autonomous AI systems.
agent-evaluation agentic-ai ai-agents ai-benchmark ai-safety artificial-intelligence awesome-list benchmark chaos chaos-engineering
          
8 stars · 2 forks · 168 projects
Last updated: 04 Sep 2025