Projects in Awesome Lists tagged with llm-testing

https://github.com/Pacific-AI-Corp/langtest

Deliver safe & effective language models

ai-safety ai-testing artificial-intelligence benchmark-framework benchmarks ethics-in-ai large-language-models llm llm-as-evaluator llm-evaluation-toolkit llm-test llm-testing ml-safety ml-testing mlops model-assessment nlp responsible-ai trustworthy-ai

Last synced: 16 Oct 2025

https://github.com/LLAMATOR-Core/llamator

Framework for testing vulnerabilities of large language models (LLMs).

agent ai ai-security attack hallucinations jailbreak llm llm-read-team llm-security llm-testing misinformation nlp owasp python rag rag-evaluation red-team red-team-tools security-tools vulnerability

Last synced: 10 May 2025

https://github.com/llamator-core/llamator

Framework for testing vulnerabilities of large language models (LLMs).

ai ai-security attack hallucinations jailbreak llm llm-read-team llm-security llm-testing misinformation nlp owasp python rag rag-evaluation red-team red-team-tools red-teaming security-tools vulnerability-assessment

Last synced: 17 Jan 2026

https://github.com/romiconez/llamator

Framework for testing vulnerabilities of large language models (LLMs).

ai ai-security attack hallucinations jailbreak llm llm-read-team llm-security llm-testing misinformation nlp owasp python rag rag-evaluation red-team red-team-tools red-teaming security-tools vulnerability-assessment

Last synced: 22 Mar 2025

https://github.com/vincentkoc/tiny_qa_benchmark_pp

Tiny QA Benchmark++: a micro-benchmark suite (52-item gold + on-demand multilingual synthetic packs), generator CLI, and CI-ready eval harness for ultra-fast LLM smoke-testing & regression-catching.

benchmark dataset evaluation huggingface-datasets litellm llm llm-testing llmops qa-dataset smoke-test synthetic-data tinybenchmarks

Last synced: 12 Jun 2025

https://github.com/rhesis-ai/rhesis-sdk

Open-source test generation SDK for LLM applications. Access curated test sets. Build context-specific test sets and collaborate with subject matter experts.

application-insights compliance llm-evaluation llm-testing open-source quality-assessment reliability responsible-ai robustness trustworthiness validation

Last synced: 29 Jan 2026

https://github.com/evaliphy/evaliphy

The E2E AI testing tool | No ML Overhead

ai ai-test-automation ai-testing ai-testing-tool end-to-end-testing llm-evaluation llm-evaluation-framework llm-evaluation-toolkit llm-testing rag rag-evaluation rag-pipeline test-automation test-automation-framework testing-tools

Last synced: 09 Jun 2026

https://github.com/quarkiverse/quarkus-rage4j

Rage4j is a Java library that helps evaluate LLMs based on scientifically grounded metrics.

ai continuous-integration langchain4j large-language-models llm-testing llms openai quarkus-extension semantic-evaluation testing-library

Last synced: 04 Jun 2026

https://github.com/pyladiesams/eval-llm-based-apps-jan2025

Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.

llm llm-eval llm-evals llm-evaluation-framework llm-evaluation-metrics llm-monitoring llm-test llm-testing llmops llms workshop

Last synced: 12 May 2025

https://github.com/yiouli/pixie-qa

Automated quality assurance for AI applications.

agent agent-skills ai ai-evals dev eval llm llm-testing qa skill testing

Last synced: 21 Apr 2026

https://github.com/lukecarr/litmus

Specification testing for structured LLM responses.

llm-comparison llm-testing openrouter specification-test

Last synced: 13 Jan 2026

https://github.com/prompt-foundry/go-sdk

The prompt engineering, prompt management, and prompt evaluation tool for Go.

go golang gpt gpt-4 llm-eval llm-evaluation llm-test llm-testing open-api prompt-engineering prompt-eva prompt-management prompt-manager prompt-test

Last synced: 14 Feb 2026

https://github.com/tugkanboz/awesome-ai-testing

A curated list of AI-powered testing tools, frameworks, and resources for QA engineers. From test generation to self-healing automation, MCP-based testing, LLM evaluation, and more.

ai-test-automation ai-testing awesome awesome-list llm-testing mcp-testing qa software-testing test-automation test-generation

Last synced: 03 May 2026

https://github.com/transcentlin/api-probe-platform

An advanced, multi-provider LLM API performance and compatibility benchmark and evaluation platform.

api-benchmark api-evaluation deepseek fastapi llm-benchmark llm-evaluation llm-testing model-evaluation ollama openai-compatibility python react-dashboard tool-calling

Last synced: 17 Jun 2026

https://github.com/ihatenodejs/llm-tests

My personal, web-dev focused LLM tests

llm llm-testing

Last synced: 14 Feb 2026

https://github.com/taimoorkhan10/replayd

Turn failed AI agent runs into replayable regression tests. Catch regressions before you ship.

agent-ops agent-testing ai-agents ai-infrastructure ai-reliability llm-ops llm-testing open-source prompt-testing python regression-testing release-control replay-testing sdk

Last synced: 14 Jun 2026

https://github.com/sandy-sp/ai-reply-index

A community-driven archive of AI prompts and responses. Log, compare, and contribute structured examples to build a searchable public prompt-response database.

ai-prompts ai-responses community-project llm-testing open-data prompt-database

Last synced: 04 May 2025
