An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with llm-testing

A curated list of projects in awesome lists tagged with llm-testing.

https://github.com/vincentkoc/tiny_qa_benchmark_pp

Tiny QA Benchmark++: a micro-benchmark suite (52-item gold + on-demand multilingual synthetic packs), generator CLI, and CI-ready eval harness for ultra-fast LLM smoke-testing and regression-catching.

benchmark dataset evaluation huggingface-datasets litellm llm llm-testing llmops qa-dataset smoke-test synthetic-data tinybenchmarks

Last synced: 12 Jun 2025

https://github.com/rhesis-ai/rhesis-sdk

Open-source test generation SDK for LLM applications. Access curated test sets. Build context-specific test sets and collaborate with subject matter experts.

application-insights compliance llm-evaluation llm-testing open-source quality-assessment reliability responsible-ai robustness trustworthiness validation

Last synced: 29 Jan 2026

https://github.com/quarkiverse/quarkus-rage4j

Rage4j is a Java library that helps evaluate LLMs based on scientifically grounded metrics.

ai continuous-integration langchain4j large-language-models llm-testing llms openai quarkus-extension semantic-evaluation testing-library

Last synced: 04 Jun 2026

https://github.com/pyladiesams/eval-llm-based-apps-jan2025

Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.

llm llm-eval llm-evals llm-evaluation-framework llm-evaluation-metrics llm-monitoring llm-test llm-testing llmops llms workshop

Last synced: 12 May 2025

https://github.com/yiouli/pixie-qa

Automated quality assurance for AI applications.

agent agent-skills ai ai-evals dev eval llm llm-testing qa skill testing

Last synced: 21 Apr 2026

https://github.com/lukecarr/litmus

Specification testing for structured LLM responses.

llm-comparison llm-testing openrouter specification-test

Last synced: 13 Jan 2026

https://github.com/prompt-foundry/go-sdk

The prompt engineering, prompt management, and prompt evaluation tool for Go.

go golang gpt gpt-4 llm-eval llm-evaluation llm-test llm-testing open-api prompt-engineering prompt-eva prompt-management prompt-manager prompt-test

Last synced: 14 Feb 2026

https://github.com/tugkanboz/awesome-ai-testing

A curated list of AI-powered testing tools, frameworks, and resources for QA engineers. From test generation to self-healing automation, MCP-based testing, LLM evaluation, and more.

ai-test-automation ai-testing awesome awesome-list llm-testing mcp-testing qa software-testing test-automation test-generation

Last synced: 03 May 2026

https://github.com/transcentlin/api-probe-platform

An advanced, multi-provider LLM API performance and compatibility benchmark and evaluation platform.

api-benchmark api-evaluation deepseek fastapi llm-benchmark llm-evaluation llm-testing model-evaluation ollama openai-compatibility python react-dashboard tool-calling

Last synced: 17 Jun 2026

https://github.com/ihatenodejs/llm-tests

My personal, web-dev-focused LLM tests.

llm llm-testing

Last synced: 14 Feb 2026

https://github.com/taimoorkhan10/replayd

Turn failed AI agent runs into replayable regression tests. Catch regressions before you ship.

agent-ops agent-testing ai-agents ai-infrastructure ai-reliability llm-ops llm-testing open-source prompt-testing python regression-testing release-control replay-testing sdk

Last synced: 14 Jun 2026

https://github.com/sandy-sp/ai-reply-index

A community-driven archive of AI prompts and responses. Log, compare, and contribute structured examples to build a searchable public prompt-response database.

ai-prompts ai-responses community-project llm-testing open-data prompt-database

Last synced: 04 May 2025