Projects in Awesome Lists tagged with ai-testing
A curated list of projects from Awesome Lists that are tagged with ai-testing.
https://github.com/giskard-ai/giskard
🐢 Open-Source Evaluation & Testing for AI & LLM systems
agent-evaluation ai-red-team ai-security ai-testing fairness-ai llm llm-eval llm-evaluation llm-security llmops ml-testing ml-validation mlops rag-evaluation red-team-tools responsible-ai trustworthy-ai
Last synced: 14 May 2025
https://github.com/bug0inc/passmark
The open-source Playwright library for AI browser regression testing with intelligent caching, auto-healing, and multi-model verification.
ai ai-agents ai-testing aigateway aisdk browser-testing e2e-testing playwright qa qa-automation qaautomation regression-testing testing typescript vercel
Last synced: 20 May 2026
https://github.com/langwatch/scenario
Agentic testing for agentic codebases
agent-simulations agent-testing ai-testing javascript-library python-library
Last synced: 08 Mar 2026
https://github.com/PacificAI/langtest
Deliver safe & effective language models
ai-safety ai-testing artificial-intelligence benchmark-framework benchmarks ethics-in-ai large-language-models llm llm-as-evaluator llm-evaluation-toolkit llm-test llm-testing ml-safety ml-testing mlops model-assessment nlp responsible-ai trustworthy-ai
Last synced: 26 Jun 2026
https://github.com/Pacific-AI-Corp/langtest
Deliver safe & effective language models
ai-safety ai-testing artificial-intelligence benchmark-framework benchmarks ethics-in-ai large-language-models llm llm-as-evaluator llm-evaluation-toolkit llm-test llm-testing ml-safety ml-testing mlops model-assessment nlp responsible-ai trustworthy-ai
Last synced: 16 Oct 2025
https://github.com/faiscadev/fakecloud
Free, open-source AWS emulator. LocalStack alternative: 41 services, 3,704 operations, true 100% Smithy conformance (124,255/124,255 variants pass). No account, no auth token, no paid tier.
ai-testing aws aws-bedrock aws-emulator aws-sdk aws-testing bedrock bedrock-emulator cognito dynamodb emulator integration-testing lambda llm-testing localstack-alternative moto-alternative s3 sns sqs terraform
Last synced: 27 Jun 2026
https://github.com/copilotkit/aimock
Mock everything your AI app talks to — LLM APIs, MCP, A2A, vector DBs, search. One package, one port, zero dependencies.
ai-testing aimock llm mcp mock-server openai
Last synced: 27 May 2026
https://github.com/ai-dashboad/flutter-skill
AI-powered E2E testing for 10 platforms. 253 MCP tools. Zero config. Works with Claude, Cursor, Windsurf, Copilot. Test Flutter, React Native, iOS, Android, Web, Electron, Tauri, KMP, .NET MAUI — all from natural language.
ai ai-testing android automation claude cross-platform cursor e2e-testing electron flutter ios mcp mcp-server model-context-protocol playwright-alternative react-native tauri test-automation testing-tools web-testing
Last synced: 01 Apr 2026
https://github.com/vostride/agent-qa
The self-improving Agentic QA harness with Memory. Write tests in natural language. Catch regressions before releases ship.
agents ai ai-agents ai-testing appium chatgpt claude-code clawdbot codex developer-tools end-to-end-testing fair-source llm mcp memory moltbot openclaw playwright testing webdriverio
Last synced: 09 Jun 2026
https://github.com/kdunee/intentguard
A Python library for verifying code properties using natural language assertions.
ai-testing code-quality code-verification language-models llm natural-language pytest test-automation testing unittest
Last synced: 13 Dec 2025
https://github.com/tommylemon/cvauto
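👁 Zero-code, zero-annotation automated testing tool for CV AI 🚀 Skip the heavy manual work of drawing bounding boxes and labeling. Run fast, zero-code automated tests on computer vision (CV) AI image-recognition algorithms: pedestrian detection, animal and plant classification, face recognition, OCR license-plate recognition, rotation correction, dance pose estimation, matting and segmentation, and more. You can also download test reports and export training and test datasets with one click.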
ai ai-testing apijson classification computer-vision cv cv2 detection face-recognition inference-api inference-server ocr pose-estimation rotation segmentation test-automation ultralytics ultralytics-yolo yolo yolo11
Last synced: 15 Sep 2025
https://github.com/fwdai/reticle
Postman for AI - design, evaluate, and debug LLM interactions with full transparency.
agentic-ai ai ai-agents ai-testing ai-tool ai-tools desktop desktop-app developer-tools evaluations llm llm-tools prompt-engineering tauri
Last synced: 04 Apr 2026
https://github.com/evaliphy/evaliphy
The E2E AI testing tool | No ML Overhead
ai ai-test-automation ai-testing ai-testing-tool end-to-end-testing llm-evaluation llm-evaluation-framework llm-evaluation-toolkit llm-testing rag rag-evaluation rag-pipeline test-automation test-automation-framework testing-tools
Last synced: 09 Jun 2026
https://github.com/greynewell/evaldriven.org
Ship evals before you ship features.
ai-engineering ai-evaluation ai-quality ai-safety ai-testing automation benchmarking best-practices ci-cd continuous-evaluation devops eval-driven-development evaluation llm-evaluation machine-learning manifesto methodology quality-assurance software-engineering testing
Last synced: 22 Feb 2026
https://github.com/jhd3197/prompture
Prompture is an API-first library for requesting structured JSON output from LLMs (or any structure), validating it against a schema, and running comparative tests between models.
ai-testing json-validation llm openai prompt-engineering prompt-testing prompture pydantic structured-output toon
Last synced: 24 May 2026
https://github.com/alepot55/agentrial
Statistical evaluation framework for AI agents
agent-evaluation ai-agents ai-testing ci-cd confidence-intervals llm llm-evaluation mlops non-deterministic pytest python quality-assurance statistical-testing testing
Last synced: 11 Feb 2026
https://github.com/srbarrios/agentic-test-explorer
An agnostic AI-driven exploratory test framework that intelligently explores, tests, and validates any application
agentic ai-agents ai-testing autonomous-testing browser-automation exploratory-testing langchain langgraph mcp playwright python qa-automation test-automation
Last synced: 25 Jun 2026
https://github.com/alexandriashai/cbrowser
Cognitive Browser: The browser automation that thinks. Constitutional safety • Persona UX testing • Natural language interface • Self-healing selectors • Built for AI agents
accessibility ai ai-testing browser-automation claude e2e-testing mcp playwright testing typescript ux-testing visual-testing web-scraping
Last synced: 23 Apr 2026
https://github.com/langwatch/scenario-go
Agent testing library that uses an agent to test your agent, in Go.
agents ai ai-qa ai-testing qa-automation testing
Last synced: 23 Aug 2025
https://github.com/lee-to/ai-tester
End-to-end behavioral testing for Claude Code skills, bare system prompts, and any agent runtime — run real scenarios in an isolated git sandbox, capture the full tool-call trace, and assert it against declarative YAML.
ai ai-agents ai-agents-framework ai-testing
Last synced: 23 May 2026
https://github.com/axeforging/playwright-smart-locators
AI-powered self-healing smart web locators for Playwright tests.
ai-testing auto-healing dom e2e-testing flaky-tests local ollama playwright playwright-plugin qa-automation
Last synced: 22 Jun 2026
https://github.com/vitron-ai/themis
Intent-first unit testing framework for AI agents in Node.js and TypeScript.
agent-testing ai ai-agents ai-testing developer-tools llm nodejs test-framework testing typescript unit-testing
Last synced: 18 Apr 2026
https://github.com/globulus/checkirai
Checkir AI is a spec-driven verification runtime using local LLMs
ai-testing cli llm local-ai mcp ollama testing web-testing
Last synced: 30 May 2026
https://github.com/naodeng/awesome-qa-prompt
A professional collection of AI prompts for QA (Quality Assurance) professionals, designed to help test engineers and QA teams work more efficiently throughout the software testing lifecycle. https://qaprompt.inaodeng.com/
ai-testing prompt-engineering prompts qa
Last synced: 27 Jan 2026
https://github.com/multiplex-ai/muggle-ai-works
Your AI coding agent writes code fast — we make sure the web product actually works. Paste a URL, the agent clicks through signup/checkout/dashboards like a real user, reports failures with screenshots. Currently web only. Works with Claude Code, Cursor, Codex, Windsurf.
ai-agents ai-powered-testing ai-qa ai-testing browser-automation claude-code cursor developer-tools devtools e2e-testing electron mcp mcp-server mcp-tools model-context-protocol no-code-testing qa-automation qa-testing software-testing test-automation
Last synced: 09 Jun 2026
https://github.com/thisguymartin/burro
Burro is a command-line interface (CLI) tool built with Deno for evaluating Large Language Model (LLM) outputs. It provides a straightforward way to run different types of evaluations with secure API key management.
ai-testing deno evaluation llm quality-assurance
Last synced: 17 Jan 2026
https://github.com/glubean/vscode
VS Code extension — API explorer that replaces Postman + test runner with structured traces. Same file for exploration and CI.
ai-testing api-explorer api-testing postman-alternative rest-client test-runner typescript vscode-extension
Last synced: 27 Apr 2026
https://github.com/sahajamoth/apex
APEX — Autonomous Path EXploration. Drives any repository toward 100% branch coverage.
agentic-coding ai-agent ai-coding ai-developer-tools ai-testing autonomous-testing code-coverage concolic-execution fuzzing generative-ai llm python rust sast security static-analysis symbolic-execution test-automation testing vibe-coding
Last synced: 27 Mar 2026
https://github.com/glubean/skill
Agent skill for Glubean — teaches AI agents to write, run, and fix API verification in TypeScript
agent-skill ai-testing api-testing claude-code claude-code-skill claude-skill codex cursor mcp test-automation typescript
Last synced: 01 Jun 2026
https://github.com/monkscode/natural-language-to-robot-framework
Turn plain English into Robot Framework files with AI. No dependencies, no hassle — just validated, ready-to-run tests
agentic-framework ai-testing automation-framework docker fastapi generative-ai large-language-models llm-applications natural-language-processing nlp-to-code open-source python quality-assurance robotframework selenium software-testing test-automation
Last synced: 05 Apr 2026
https://github.com/chigwell/llmtestr
A new package that helps developers integration-test AI and LLM applications by validating structured outputs. It takes a user's test scenario or prompt as input, sends it to an LLM, and uses pattern
ai-powered-system-testing ai-testing automated-test-execution code-snippet-validation developer-tooling formatting-error-detection integration-testing json-schema-validation llm-validation output-consistency-enforcement output-content-verification pattern-matching prompt-driven-testing regression-detection response-format-checking schema-enforcement structured-output-verification tagged-response-validation test-automation test-scenario-input
Last synced: 14 Jan 2026
https://github.com/radoslaw-sz/maia
A pytest-based framework for testing multi-AI-agent systems. It provides a flexible and extensible platform for complex multi-agent simulations and supports many integrations, such as LiteLLM, CrewAI, and LangChain.
agentic agents ai ai-testing ai-testing-tool framework llm maia prompt-engineering prompt-testing python test
Last synced: 25 Sep 2025
https://github.com/tugkanboz/awesome-ai-testing
A curated list of AI-powered testing tools, frameworks, and resources for QA engineers. From test generation to self-healing automation, MCP-based testing, LLM evaluation, and more.
ai-test-automation ai-testing awesome awesome-list llm-testing mcp-testing qa software-testing test-automation test-generation
Last synced: 03 May 2026
https://github.com/vishalquantana/klavity
Open-core AI bug reporting & testing — right-click to file grounded bugs to Jira/Linear/GitHub/Plane, AI personas (Sims) that review your product, and self-healing end-to-end tests.
ai-agents ai-testing bug-reporting bug-tracker bun chrome-extension developer-tools end-to-end-testing feedback-widget github-issues jira linear open-core plane playwright qa screenshot self-healing-tests typescript user-personas
Last synced: 05 Jul 2026
https://github.com/vishalquantana/klav-snap
Open-core AI bug reporting & testing — right-click to file grounded bugs to Jira/Linear/GitHub/Plane, AI personas (Sims) that review your product, and self-healing end-to-end tests.
ai-agents ai-testing bug-reporting bug-tracker bun chrome-extension developer-tools end-to-end-testing feedback-widget github-issues jira linear open-core plane playwright qa screenshot self-healing-tests typescript user-personas
Last synced: 27 Jun 2026
https://github.com/shyinlim/test_result_dashboard_streamlit_gemini
A lightweight dashboard to view and analyze test automation results. Built with Streamlit + PostgreSQL, and powered by AI (Gemini) to help debug test failures faster.
ai-testing artificial-intelligence automation-testing docker docker-compose gemini gemini-ai llm llm-agent postgresql python quality quality-assurance software-testing streamlit test-automation testing-automation testing-tool testing-tools
Last synced: 09 Apr 2026
https://github.com/mohammadshamchi/ai-react-playground
🤖 The perfect playground for testing AI-generated React components. Built for ChatGPT/Claude users to instantly test and iterate on AI-created components.
ai-testing ai-tools chatgpt claude-ai-generated-code react tailwindcss typescript vite
Last synced: 20 Apr 2026
https://github.com/madgicaltechdom/playwrightmcpdemo
The purpose of this repository is to demonstrate how to generate Playwright test cases for a website using Playwright's Modern Component Patt
ai-testing ai-testing-best-practices playwright prompt-engineering test-automation
Last synced: 18 May 2026
https://github.com/javierdejesusda/checkllm
The pytest of LLM testing. Test LLM-powered applications with the same rigor as traditional software.
ai-compliance ai-safety ai-testing anthropic hallucination llm llm-evaluation openai prompt-engineering pytest rag red-teaming
Last synced: 03 May 2026
https://github.com/libraz/claude-coverwise
Make Claude write tests that actually cover every parameter interaction. Pairwise / t-wise coverage checking and generation as a Claude Code plugin, backed by the coverwise WASM engine.
ai-testing claude-code claude-code-plugin combinatorial-testing covering-array coverwise mcp mcp-server pairwise t-wise test-generation testing
Last synced: 07 Apr 2026
https://github.com/sbittla/ecommerceapp
AI-generated BDD tests for Java and JUnit, using code produced by ChatGPT-4o
ai-assitant-generated-automation ai-driven-automation ai-driven-java-junit-testing ai-driven-qa ai-driven-testing ai-generated ai-test-generator ai-testing ai-testing-best-practices ai-testing-tool chatgpt-api chatgpt-bot chatgpt-generated-api-testing cucumber-java ecommerce-application
Last synced: 02 Jul 2026
https://github.com/alisher-sdet/alisher-sdet
SDET / Automation Engineer — Web, Mobile & AI-driven testing frameworks.
ai-testing appium automation-testing playwright sdet webdriverio
Last synced: 29 May 2026
https://github.com/mucahitgurbuz/smart-test-generator
🧠🧪 AI-powered test generation tool that automatically creates comprehensive test suites for JavaScript/TypeScript codebases.
ai-testing automated-testing cli-tool dashboard test-generation typescript
Last synced: 26 Jun 2025
https://github.com/jprando/testa-habilidade-ai-typescript
The acid test for code-generating AIs: evaluating models (LLMs) against concurrency and Event Loop pitfalls.
ai-testing async concurrency javascript llm-benchmark promise qwen typescript v8-engine
Last synced: 19 Jun 2026
https://github.com/sijadev/knowledge-base
Course data & more
ai-testing data-visualization database julia python3 tensorflow
Last synced: 16 Apr 2026
https://github.com/isatyamks/multimodal-rag
Multimodal RAG system for generating test cases and use cases from documents using hybrid retrieval, safety guards, and LLMs.
ai-testing chromadb hallucination-mitigation hybrid-search hybrid-search-technique llm ml multimodal multimodal-rag nlp prompt-safety python rag rags retrival-augmented-generation test-automation testing
Last synced: 20 Feb 2026
https://github.com/mustafaautomation/llm-testing-toolkit
Provider-agnostic LLM testing framework — regression, hallucination, quality, and toxicity evaluators for OpenAI, Anthropic, and custom APIs
ai ai-testing anthropic ci-cd evaluation hallucination-detection llm nlp openai qa-automation regression-testing test-automation testing toxicity-detection typescript
Last synced: 04 Apr 2026
https://github.com/ajaytester007/ai-agent-evaluation-portfolio
Includes AI agent architecture, CrewAI workflows, evaluation frameworks, and governance models.
agentic-ai ai ai-evaluation ai-governance ai-testing api automation crewai hallucination-detection human-in-the-loop json llm multi-agent-systems python qa reasoning-ai scenario-evaluation
Last synced: 18 Apr 2026