Projects in Awesome Lists tagged with evaluations

https://github.com/scale3-labs/langtrace

Langtrace 🔍 is an open-source, OpenTelemetry-based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations, and metrics for popular LLMs, LLM frameworks, vector databases, and more. Integrate using TypeScript or Python. 🚀💻📊

ai datasets evaluations gpt langchain llm llm-framework llmops observability open-source open-telemetry openai prompt-engineering tracing

Last synced: 15 May 2025

https://github.com/Scale3-Labs/langtrace

Langtrace 🔍 is an open-source, OpenTelemetry-based end-to-end observability tool for LLM applications, providing real-time tracing, evaluations, and metrics for popular LLMs, LLM frameworks, vector databases, and more. Integrate using TypeScript or Python. 🚀💻📊

ai datasets evaluations gpt langchain llm llm-framework llmops observability open-source open-telemetry openai prompt-engineering tracing

Last synced: 30 Oct 2025

https://github.com/log10-io/log10

Python client library for improving your LLM app accuracy

agents ai anthropic artificial-intelligence autonomous-agents debugging evaluations feedback fine-tuning llmops llms logging monitoring openai python rlhf

Last synced: 11 Apr 2025

https://github.com/dreadnode/airtbench-code

Code Repository for: AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models

agents ai ai-agents artificial-intelligence benchmark benchmark-datasets benchmarking ctf cyber-evals cybersecurity evaluations hacking llm offensive-security research security

Last synced: 30 Apr 2026

https://github.com/boxbeam/crunch

The fastest Java expression compiler/evaluator

evaluating-mathematical-expressions evaluations

Last synced: 06 Apr 2025

https://github.com/llm-evaluation-s-always-fatiguing/leaf-playground

A framework for building scenario simulation projects in which human and LLM-based agents can participate, with a user-friendly web UI for visualizing simulations and support for automatic evaluation at the agent action level.

agent agent-based-simulation agents automation chatgpt evaluations llm-evaluation

Last synced: 02 Mar 2025

https://github.com/greynewell/mcpbr

Evaluate MCP servers with Model Context Protocol Benchmark Runner

ai-tools anthropic benchmarking benchmarks claude-code claude-code-plugin claude-code-skills cli cybergym evaluations llm-agents mcp-server swe-bench

Last synced: 12 Feb 2026

https://github.com/fwdai/reticle

Postman for AI - design, evaluate, and debug LLM interactions with full transparency.

agentic-ai ai ai-agents ai-testing ai-tool ai-tools desktop desktop-app developer-tools evaluations llm llm-tools prompt-engineering tauri

Last synced: 04 Apr 2026

https://github.com/yisaienkov/evaluations

This library implements various metrics (including Kaggle Competition, Medicine) for evaluating ML, DL, AI models, and algorithms. 📐📊📈📉📏

evaluations kaggle kaggle-competition metrics metrics-library pypi python python-library python3

Last synced: 13 Apr 2025

https://github.com/evaluation-context-protocol/ecp

ECP is a standardized interface for orchestrating, auditing, and enforcing authority limits in AI agent evaluations. It moves evaluation from "brittle Python scripts" to a deterministic infrastructure protocol.

evaluation-metrics evaluations llm-evaluation model-evaluation

Last synced: 25 Apr 2026

https://github.com/dynatrace-oss/dt-evals

AI evaluators CLI for your AI apps and Agents - Dynatrace AI Observability

agents ai evals evaluations llm-as-judge observability

Last synced: 14 May 2026

https://github.com/fkapsahili/entrag

EntRAG - Enterprise RAG Benchmark

benchmark dataset evaluation evaluations generative-ai knowledge-graph llm llm-evaluation rag rag-evaluation retrieval retrieval-augmented-generation

Last synced: 09 Mar 2026

https://github.com/bhadresh-laiya/program-evaluation.com

Do a program evaluation that really counts! It will help other students and make universities and colleges really take students' experiences to heart!

blade-template built colleges counts evaluation evaluation-data evaluations laravel-framework laravel6 program students students-experiences universities using