Projects in Awesome Lists tagged with ai-testing

https://github.com/giskard-ai/giskard

🐢 Open-Source Evaluation & Testing for AI & LLM systems

agent-evaluation ai-red-team ai-security ai-testing fairness-ai llm llm-eval llm-evaluation llm-security llmops ml-testing ml-validation mlops rag-evaluation red-team-tools responsible-ai trustworthy-ai

Last synced: 14 May 2025

https://github.com/bug0inc/passmark

The open-source Playwright library for AI browser regression testing with intelligent caching, auto-healing, and multi-model verification.

ai ai-agents ai-testing aigateway aisdk browser-testing e2e-testing playwright qa qa-automation qaautomation regression-testing testing typescript vercel

Last synced: 20 May 2026

https://github.com/langwatch/scenario

Agentic testing for agentic codebases

agent-simulations agent-testing ai-testing javascript-library python-library

Last synced: 08 Mar 2026

https://github.com/PacificAI/langtest

Deliver safe & effective language models

ai-safety ai-testing artificial-intelligence benchmark-framework benchmarks ethics-in-ai large-language-models llm llm-as-evaluator llm-evaluation-toolkit llm-test llm-testing ml-safety ml-testing mlops model-assessment nlp responsible-ai trustworthy-ai

Last synced: 26 Jun 2026

https://github.com/Pacific-AI-Corp/langtest

Deliver safe & effective language models

ai-safety ai-testing artificial-intelligence benchmark-framework benchmarks ethics-in-ai large-language-models llm llm-as-evaluator llm-evaluation-toolkit llm-test llm-testing ml-safety ml-testing mlops model-assessment nlp responsible-ai trustworthy-ai

Last synced: 16 Oct 2025

https://github.com/faiscadev/fakecloud

Free, open-source AWS emulator. LocalStack alternative: 41 services, 3,704 operations, true 100% Smithy conformance (124,255/124,255 variants pass). No account, no auth token, no paid tier.

ai-testing aws aws-bedrock aws-emulator aws-sdk aws-testing bedrock bedrock-emulator cognito dynamodb emulator integration-testing lambda llm-testing localstack-alternative moto-alternative s3 sns sqs terraform

Last synced: 27 Jun 2026

https://github.com/copilotkit/aimock

Mock everything your AI app talks to — LLM APIs, MCP, A2A, vector DBs, search. One package, one port, zero dependencies.

ai-testing aimock llm mcp mock-server openai

Last synced: 27 May 2026

https://github.com/ai-dashboad/flutter-skill

AI-powered E2E testing for 10 platforms. 253 MCP tools. Zero config. Works with Claude, Cursor, Windsurf, Copilot. Test Flutter, React Native, iOS, Android, Web, Electron, Tauri, KMP, .NET MAUI — all from natural language.

ai ai-testing android automation claude cross-platform cursor e2e-testing electron flutter ios mcp mcp-server model-context-protocol playwright-alternative react-native tauri test-automation testing-tools web-testing

Last synced: 01 Apr 2026

https://github.com/vostride/agent-qa

The self-improving Agentic QA harness with Memory. Write tests in natural language. Catch regressions before releases ship.

agents ai ai-agents ai-testing appium chatgpt claude-code clawdbot codex developer-tools end-to-end-testing fair-source llm mcp memory moltbot openclaw playwright testing webdriverio

Last synced: 09 Jun 2026

https://github.com/kdunee/intentguard

A Python library for verifying code properties using natural language assertions.

ai-testing code-quality code-verification language-models llm natural-language pytest test-automation testing unittest

Last synced: 13 Dec 2025

https://github.com/tommylemon/cvauto

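👁 Zero-code, zero-annotation automated testing tool for CV AI 🚀 Removes the need for large amounts of manual box drawing and labeling, and quickly automates testing, with no code, of CV (computer vision) AI image recognition algorithms: pedestrian detection, animal and plant classification, face recognition, OCR license plate recognition, rotation correction, dance pose estimation, matting and segmentation, and more. Test reports can be downloaded and training and test datasets exported with one click.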

ai ai-testing apijson classification computer-vision cv cv2 detection face-recognition inference-api inference-server ocr pose-estimation rotation segmentation test-automation ultralytics ultralytics-yolo yolo yolo11

Last synced: 15 Sep 2025

https://github.com/fwdai/reticle

Postman for AI - design, evaluate, and debug LLM interactions with full transparency.

agentic-ai ai ai-agents ai-testing ai-tool ai-tools desktop desktop-app developer-tools evaluations llm llm-tools prompt-engineering tauri

Last synced: 04 Apr 2026

https://github.com/evaliphy/evaliphy

The E2E AI testing tool | No ML Overhead

ai ai-test-automation ai-testing ai-testing-tool end-to-end-testing llm-evaluation llm-evaluation-framework llm-evaluation-toolkit llm-testing rag rag-evaluation rag-pipeline test-automation test-automation-framework testing-tools

Last synced: 09 Jun 2026

https://github.com/greynewell/evaldriven.org

Ship evals before you ship features.

ai-engineering ai-evaluation ai-quality ai-safety ai-testing automation benchmarking best-practices ci-cd continuous-evaluation devops eval-driven-development evaluation llm-evaluation machine-learning manifesto methodology quality-assurance software-engineering testing

Last synced: 22 Feb 2026

https://github.com/jhd3197/prompture

Prompture is an API-first library for requesting structured JSON output from LLMs (or any structure), validating it against a schema, and running comparative tests between models.

ai-testing json-validation llm openai prompt-engineering prompt-testing prompture pydantic structured-output toon

Last synced: 24 May 2026

https://github.com/alepot55/agentrial

Statistical evaluation framework for AI agents

agent-evaluation ai-agents ai-testing ci-cd confidence-intervals llm llm-evaluation mlops non-deterministic pytest python quality-assurance statistical-testing testing

Last synced: 11 Feb 2026

https://github.com/srbarrios/agentic-test-explorer

An agnostic AI-driven exploratory test framework that intelligently explores, tests, and validates any application

agentic ai-agents ai-testing autonomous-testing browser-automation exploratory-testing langchain langgraph mcp playwright python qa-automation test-automation

Last synced: 25 Jun 2026

https://github.com/alexandriashai/cbrowser

Cognitive Browser: The browser automation that thinks. Constitutional safety • Persona UX testing • Natural language interface • Self-healing selectors • Built for AI agents

accessibility ai ai-testing browser-automation claude e2e-testing mcp playwright testing typescript ux-testing visual-testing web-scraping

Last synced: 23 Apr 2026

https://github.com/langwatch/scenario-go

Agent testing library that uses an agent to test your agent, in Go.

agents ai ai-qa ai-testing qa-automation testing

Last synced: 23 Aug 2025

https://github.com/axeforging/playwright-smart-locators

AI-powered self-healing smart web locators for Playwright tests.

ai-testing auto-healing dom e2e-testing flaky-tests local ollama playwright playwright-plugin qa-automation

Last synced: 22 Jun 2026

https://github.com/lee-to/ai-tester

End-to-end behavioral testing for Claude Code skills, bare system prompts, and any agent runtime — run real scenarios in an isolated git sandbox, capture the full tool-call trace, and assert it against declarative YAML.

ai ai-agents ai-agents-framework ai-testing

Last synced: 23 May 2026

https://github.com/vitron-ai/themis

Intent-first unit testing framework for AI agents in Node.js and TypeScript.

agent-testing ai ai-agents ai-testing developer-tools llm nodejs test-framework testing typescript unit-testing

Last synced: 18 Apr 2026

https://github.com/globulus/checkirai

Checkir AI is a spec-driven verification runtime using local LLMs

ai-testing cli llm local-ai mcp ollama testing web-testing

Last synced: 30 May 2026

https://github.com/naodeng/awesome-qa-prompt

A professional collection of AI prompts for QA (Quality Assurance) professionals, designed to help test engineers and QA teams work more efficiently throughout the software testing lifecycle. https://qaprompt.inaodeng.com/

ai-testing prompt-engineering prompts qa

Last synced: 27 Jan 2026

https://github.com/glubean/vscode

VS Code extension — API explorer that replaces Postman + test runner with structured traces. Same file for exploration and CI.

ai-testing api-explorer api-testing postman-alternative rest-client test-runner typescript vscode-extension

Last synced: 27 Apr 2026

https://github.com/thisguymartin/burro

Burro is a command-line interface (CLI) tool built with Deno for evaluating Large Language Model (LLM) outputs. It provides a straightforward way to run different types of evaluations with secure API key management.

ai-testing deno evaluation llm quality-assurance

Last synced: 17 Jan 2026

https://github.com/sahajamoth/apex

APEX — Autonomous Path EXploration. Drives any repository toward 100% branch coverage.

agentic-coding ai-agent ai-coding ai-developer-tools ai-testing autonomous-testing code-coverage concolic-execution fuzzing generative-ai llm python rust sast security static-analysis symbolic-execution test-automation testing vibe-coding

Last synced: 27 Mar 2026

https://github.com/multiplex-ai/muggle-ai-works

Your AI coding agent writes code fast — we make sure the web product actually works. Paste a URL, the agent clicks through signup/checkout/dashboards like a real user, reports failures with screenshots. Currently web only. Works with Claude Code, Cursor, Codex, Windsurf.

ai-agents ai-powered-testing ai-qa ai-testing browser-automation claude-code cursor developer-tools devtools e2e-testing electron mcp mcp-server mcp-tools model-context-protocol no-code-testing qa-automation qa-testing software-testing test-automation

Last synced: 09 Jun 2026

https://github.com/monkscode/natural-language-to-robot-framework

Turn plain English into Robot Framework files with AI. No dependencies, no hassle — just validated, ready-to-run tests

agentic-framework ai-testing automation-framework docker fastapi generative-ai large-language-models llm-applications natural-language-processing nlp-to-code open-source python quality-assurance robotframework selenium software-testing test-automation

Last synced: 05 Apr 2026

https://github.com/vishalquantana/klav-snap

Open-core AI bug reporting & testing — right-click to file grounded bugs to Jira/Linear/GitHub/Plane, AI personas (Sims) that review your product, and self-healing end-to-end tests.

ai-agents ai-testing bug-reporting bug-tracker bun chrome-extension developer-tools end-to-end-testing feedback-widget github-issues jira linear open-core plane playwright qa screenshot self-healing-tests typescript user-personas

Last synced: 27 Jun 2026

https://github.com/glubean/skill

Agent skill for Glubean — teaches AI agents to write, run, and fix API verification in TypeScript

agent-skill ai-testing api-testing claude-code claude-code-skill claude-skill codex cursor mcp test-automation typescript

Last synced: 01 Jun 2026

https://github.com/tornikegomareli/simtouch

Send taps, swipes, and gestures to iOS Simulator from the command line - built for LLM-driven testing

ai-testing send-actions send-taps simulator

Last synced: 11 Jul 2026

https://github.com/radoslaw-sz/maia

A pytest-based framework for testing multi-agent AI systems. It provides a flexible and extensible platform for complex multi-agent simulations. Supports many integrations, such as LiteLLM, CrewAI, and LangChain.

agentic agents ai ai-testing ai-testing-tool framework llm maia prompt-engineering prompt-testing python test

Last synced: 25 Sep 2025

https://github.com/tugkanboz/awesome-ai-testing

A curated list of AI-powered testing tools, frameworks, and resources for QA engineers. From test generation to self-healing automation, MCP-based testing, LLM evaluation, and more.

ai-test-automation ai-testing awesome awesome-list llm-testing mcp-testing qa software-testing test-automation test-generation

Last synced: 03 May 2026

https://github.com/vishalquantana/klavity

Open-core AI bug reporting & testing — right-click to file grounded bugs to Jira/Linear/GitHub/Plane, AI personas (Sims) that review your product, and self-healing end-to-end tests.

ai-agents ai-testing bug-reporting bug-tracker bun chrome-extension developer-tools end-to-end-testing feedback-widget github-issues jira linear open-core plane playwright qa screenshot self-healing-tests typescript user-personas

Last synced: 05 Jul 2026

https://github.com/chigwell/llmtestr

A new package that helps developers integration-test AI and LLM applications by validating structured outputs. It takes a user's test scenario or prompt as input, sends it to an LLM, and uses pattern

ai-powered-system-testing ai-testing automated-test-execution code-snippet-validation developer-tooling formatting-error-detection integration-testing json-schema-validation llm-validation output-consistency-enforcement output-content-verification pattern-matching prompt-driven-testing regression-detection response-format-checking schema-enforcement structured-output-verification tagged-response-validation test-automation test-scenario-input

Last synced: 14 Jan 2026

https://github.com/mustafaautomation/llm-testing-toolkit

Provider-agnostic LLM testing framework — regression, hallucination, quality, and toxicity evaluators for OpenAI, Anthropic, and custom APIs

ai ai-testing anthropic ci-cd evaluation hallucination-detection llm nlp openai qa-automation regression-testing test-automation testing toxicity-detection typescript

Last synced: 04 Apr 2026

https://github.com/ajaytester007/ai-agent-evaluation-portfolio

Includes AI agent architecture, CrewAI workflows, evaluation frameworks, and governance models.

agentic-ai ai ai-evaluation ai-governance ai-testing api automation crewai hallucination-detection human-in-the-loop json llm multi-agent-systems python qa reasoning-ai scenario-evaluation

Last synced: 18 Apr 2026

https://github.com/madgicaltechdom/playwrightmcpdemo

The purpose of this repository is to demonstrate how to generate Playwright test cases for a website using Playwright's Modern Component Patt

ai-testing ai-testing-best-practices playwright prompt-engineering test-automation

Last synced: 18 May 2026

https://github.com/mohammadshamchi/ai-react-playground

🤖 The perfect playground for testing AI-generated React components. Built for ChatGPT/Claude users to instantly test and iterate on AI-created components.

ai-testing ai-tools chatgpt claude-ai-generated-code react tailwindcss typescript vite

Last synced: 20 Apr 2026

https://github.com/dundas/thinkrun

ThinkRun — screen recorder for AI coding agents. Record a bug or session once and hand Claude Code or Cursor structured context it can act on. Real Chrome, your sessions. MCP + CLI.

ai-agent ai-coding-agent ai-screen-recorder ai-testing browser-automation bug-reporting chrome-extension claude-code cursor mcp mcp-server real-browser-mcp screen-recorder screen-recording

Last synced: 06 Jul 2026

https://github.com/javierdejesusda/checkllm

The pytest of LLM testing. Test LLM-powered applications with the same rigor as traditional software.

ai-compliance ai-safety ai-testing anthropic hallucination llm llm-evaluation openai prompt-engineering pytest rag red-teaming

Last synced: 03 May 2026

https://github.com/libraz/claude-coverwise

Make Claude write tests that actually cover every parameter interaction. Pairwise / t-wise coverage checking and generation as a Claude Code plugin, backed by the coverwise WASM engine.

ai-testing claude-code claude-code-plugin combinatorial-testing covering-array coverwise mcp mcp-server pairwise t-wise test-generation testing

Last synced: 07 Apr 2026

https://github.com/sbittla/ecommerceapp

AI-generated BDD for Java and JUnit using ChatGPT 4o code

ai-assitant-generated-automation ai-driven-automation ai-driven-java-junit-testing ai-driven-qa ai-driven-testing ai-generated ai-test-generator ai-testing ai-testing-best-practices ai-testing-tool chatgpt-api chatgpt-bot chatgpt-generated-api-testing cucumber-java ecommerce-application

Last synced: 02 Jul 2026

https://github.com/sritajkumarpatel/testforge

AI-powered test case generation for Azure DevOps. Feed it requirements documents, ADO work items, or plain text — three autonomous AI agents analyze, design, and produce ready-to-create test cases.

agile-testing ai-testing azure-devops bdd claude gemini gherkin llm ollama openai playwright qa-automation quality-engineering software-testing test-automation test-case-generator test-management

Last synced: 19 Jul 2026

https://github.com/mucahitgurbuz/smart-test-generator

🧠🧪 AI-powered test generation tool that automatically creates comprehensive test suites for JavaScript/TypeScript codebases.

ai-testing automated-testing cli-tool dashboard test-generation typescript

Last synced: 26 Jun 2025

https://github.com/jprando/testa-habilidade-ai-typescript

The trial by fire for code-generating AIs: evaluating models (LLMs) against concurrency and Event Loop pitfalls.

ai-testing async concurrency javascript llm-benchmark promise qwen typescript v8-engine

Last synced: 19 Jun 2026

https://github.com/isatyamks/multimodal-rag

Multimodal RAG system for generating test cases and use cases from documents using hybrid retrieval, safety guards, and LLMs.

ai-testing chromadb hallucination-mitigation hybrid-search hybrid-search-technique llm ml multimodal multimodal-rag nlp prompt-safety python rag rags retrival-augmented-generation test-automation testing

Last synced: 20 Feb 2026

https://github.com/sijadev/knowledge-base

Course data & more

ai-testing data-visualization database julia python3 tensorflow

Last synced: 16 Apr 2026

https://github.com/alisher-sdet/alisher-sdet

SDET / Automation Engineer — Web, Mobile & AI-driven testing frameworks.

ai-testing appium automation-testing playwright sdet webdriverio

Last synced: 29 May 2026

https://github.com/shyinlim/test_result_dashboard_streamlit_gemini

A lightweight dashboard to view and analyze test automation results. Built with Streamlit + PostgreSQL, and powered by AI (Gemini) to help debug test failures faster.

ai-testing artificial-intelligence automation-testing docker docker-compose gemini gemini-ai llm llm-agent postgresql python quality quality-assurance software-testing streamlit test-automation testing-automation testing-tool testing-tools

Last synced: 09 Apr 2026