Projects in Awesome Lists tagged with llm-testing
A curated list of projects in awesome lists tagged with llm-testing.
https://github.com/Pacific-AI-Corp/langtest
Deliver safe & effective language models
ai-safety ai-testing artificial-intelligence benchmark-framework benchmarks ethics-in-ai large-language-models llm llm-as-evaluator llm-evaluation-toolkit llm-test llm-testing ml-safety ml-testing mlops model-assessment nlp responsible-ai trustworthy-ai
Last synced: 16 Oct 2025
https://github.com/LLAMATOR-Core/llamator
Framework for testing vulnerabilities of large language models (LLMs).
agent ai ai-security attack hallucinations jailbreak llm llm-read-team llm-security llm-testing misinformation nlp owasp python rag rag-evaluation red-team red-team-tools security-tools vulnerability
Last synced: 10 May 2025
https://github.com/llamator-core/llamator
Framework for testing vulnerabilities of large language models (LLMs).
ai ai-security attack hallucinations jailbreak llm llm-read-team llm-security llm-testing misinformation nlp owasp python rag rag-evaluation red-team red-team-tools red-teaming security-tools vulnerability-assessment
Last synced: 17 Jan 2026
https://github.com/romiconez/llamator
Framework for testing vulnerabilities of large language models (LLMs).
ai ai-security attack hallucinations jailbreak llm llm-read-team llm-security llm-testing misinformation nlp owasp python rag rag-evaluation red-team red-team-tools red-teaming security-tools vulnerability-assessment
Last synced: 22 Mar 2025
https://github.com/vincentkoc/tiny_qa_benchmark_pp
Tiny QA Benchmark++: a micro-benchmark suite (52-item gold set plus on-demand multilingual synthetic packs), generator CLI, and CI-ready eval harness for ultra-fast LLM smoke-testing and regression-catching.
benchmark dataset evaluation huggingface-datasets litellm llm llm-testing llmops qa-dataset smoke-test synthetic-data tinybenchmarks
Last synced: 12 Jun 2025
https://github.com/rhesis-ai/rhesis-sdk
Open-source test generation SDK for LLM applications. Access curated test sets. Build context-specific test sets and collaborate with subject matter experts.
application-insights compliance llm-evaluation llm-testing open-source quality-assessment reliability responsible-ai robustness trustworthiness validation
Last synced: 29 Jan 2026
https://github.com/evaliphy/evaliphy
The E2E AI testing tool | No ML Overhead
ai ai-test-automation ai-testing ai-testing-tool end-to-end-testing llm-evaluation llm-evaluation-framework llm-evaluation-toolkit llm-testing rag rag-evaluation rag-pipeline test-automation test-automation-framework testing-tools
Last synced: 09 Jun 2026
https://github.com/quarkiverse/quarkus-rage4j
Rage4j is a Java library that helps evaluate LLMs based on scientifically grounded metrics.
ai continuous-integration langchain4j large-language-models llm-testing llms openai quarkus-extension semantic-evaluation testing-library
Last synced: 04 Jun 2026
https://github.com/pyladiesams/eval-llm-based-apps-jan2025
Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.
llm llm-eval llm-evals llm-evaluation-framework llm-evaluation-metrics llm-monitoring llm-test llm-testing llmops llms workshop
Last synced: 12 May 2025
https://github.com/yiouli/pixie-qa
Automated quality assurance for AI applications.
agent agent-skills ai ai-evals dev eval llm llm-testing qa skill testing
Last synced: 21 Apr 2026
https://github.com/lukecarr/litmus
Specification testing for structured LLM responses.
llm-comparison llm-testing openrouter specification-test
Last synced: 13 Jan 2026
https://github.com/prompt-foundry/go-sdk
The prompt engineering, prompt management, and prompt evaluation tool for Go.
go golang gpt gpt-4 llm-eval llm-evaluation llm-test llm-testing open-api prompt-engineering prompt-eva prompt-management prompt-manager prompt-test
Last synced: 14 Feb 2026
https://github.com/tugkanboz/awesome-ai-testing
A curated list of AI-powered testing tools, frameworks, and resources for QA engineers. From test generation to self-healing automation, MCP-based testing, LLM evaluation, and more.
ai-test-automation ai-testing awesome awesome-list llm-testing mcp-testing qa software-testing test-automation test-generation
Last synced: 03 May 2026
https://github.com/transcentlin/api-probe-platform
An advanced, multi-provider LLM API performance and compatibility benchmark and evaluation platform.
api-benchmark api-evaluation deepseek fastapi llm-benchmark llm-evaluation llm-testing model-evaluation ollama openai-compatibility python react-dashboard tool-calling
Last synced: 17 Jun 2026
https://github.com/ihatenodejs/llm-tests
My personal, web-dev-focused LLM tests
Last synced: 14 Feb 2026
https://github.com/taimoorkhan10/replayd
Turn failed AI agent runs into replayable regression tests. Catch regressions before you ship.
agent-ops agent-testing ai-agents ai-infrastructure ai-reliability llm-ops llm-testing open-source prompt-testing python regression-testing release-control replay-testing sdk
Last synced: 14 Jun 2026
https://github.com/sandy-sp/ai-reply-index
A community-driven archive of AI prompts and responses. Log, compare, and contribute structured examples to build a searchable public prompt-response database.
ai-prompts ai-responses community-project llm-testing open-data prompt-database
Last synced: 04 May 2025