An open API service indexing awesome lists of open source software.

https://github.com/frankbria/iris

Leveraging AI vision to enable UI testing and development
https://github.com/frankbria/iris

ai ai-assistant ai-tools claude-code cli cli-app cli-application development interface interface-builder interface-design openai-codex openai-codex-cli vision

Last synced: 16 days ago
JSON representation

Leveraging AI vision to enable UI testing and development

Awesome Lists containing this project

README

          

# IRIS - Interface Recognition & Interaction Suite

[![Follow on X](https://img.shields.io/twitter/follow/FrankBria18044?style=social)](https://x.com/FrankBria18044)

> 👁️ AI-powered UI understanding and testing toolkit

**Phase 1: ✅ Complete** | **Phase 2: 🟡 75% Complete (CLI, Accessibility, AI Vision Foundation)**

IRIS gives AI coding assistants "eyes and hands" to see and interact with user interfaces through natural language commands, visual regression testing, and accessibility validation.

---

## Current Status

### ✅ Phase 1 - Complete (Production-Ready)

**Core Features Available:**
- ✅ Natural language UI commands with AI translation
- ✅ Browser automation via Playwright
- ✅ File watching with automatic re-execution
- ✅ JSON-RPC protocol for AI coding assistant integration
- ✅ SQLite persistence for test runs and results
- ✅ Multi-provider AI support (OpenAI/Anthropic/Ollama)

### ✅ Phase 2 - Visual Regression & Accessibility (COMPLETE)

**Status:** Production-ready with 95.9% test pass rate and comprehensive feature coverage

**Visual Testing Core:**
- ✅ Visual capture engine with page stabilization and masking
- ✅ SSIM and pixel-based diff engine with region analysis
- ✅ Git-integrated baseline management (branch/commit/timestamp strategies)
- ✅ Multi-device testing (desktop, tablet, mobile)
- ✅ Complete TypeScript/Zod type system

**AI Vision Integration:**
- ✅ AI-powered semantic analysis (OpenAI GPT-4o, Claude 3.5 Sonnet, Ollama)
- ✅ Multimodal AI client architecture (src/ai-client/ - reusable for future AI vision tasks)
- ✅ Image preprocessing pipeline (resize, optimize, base64 encoding)
- ✅ AI vision result caching (LRU memory + SQLite persistence)
- ✅ Cost tracking with budget management and circuit breaker
- ✅ Smart client with automatic fallback and cost optimization

**CLI & Reporting:**
- ✅ CLI commands: `iris visual-diff` and `iris a11y`
- ✅ Multi-format reporting (HTML, JSON, JUnit, Markdown)
- ✅ Visual reporter with diff viewer and interactive HTML reports

**Accessibility Testing:**
- ✅ WCAG 2.1 Level AA/AAA compliance validation with axe-core
- ✅ Keyboard navigation testing (Tab order, focus traps, arrow keys)
- ✅ Screen reader simulation (ARIA labels, landmarks, headings)

**Examples & Documentation:**
- ✅ 4 example projects (basic visual, multi-device, accessibility, CI/CD)
- ✅ Comprehensive API documentation and user guides
- ✅ CI/CD integration examples

**Test Results:** 541/564 tests passing (95.9% pass rate)
- 1 non-critical performance test timing failure (easily fixable)
- 22 accessibility E2E tests skipped due to infrastructure mismatch

**Coverage:** 75.49% overall (below 85% target)
- Visual module: 88.3%
- Accessibility module: 76.6%
- Database: 95.74%
- Branch coverage: 58.28% (primary improvement area)

**Production Readiness:** ✅ Ready for use with noted optimization opportunities

---

## Quick Start

### Installation

```bash
git clone https://github.com/frankbria/iris.git
cd iris
npm install
npm run build
npm link
```

### Verify Installation

```bash
iris --version
```

### Try the Demo (Fastest Way)

```bash
bash <(curl -s https://raw.githubusercontent.com/frankbria/iris/main/scripts/demo-setup.sh)
```

This creates a sample project, runs visual and accessibility tests, and generates reports automatically.

### Basic Usage

**Natural Language Commands:**
```bash
# Execute browser actions with natural language
iris run "click #submit-button"
iris run "fill #email with user@example.com"
iris run "navigate to https://example.com"

# AI-powered complex commands (requires API key)
export OPENAI_API_KEY=sk-your-key
iris run "find the blue button next to the search box and click it"
```

**Visual Regression Testing:**
```bash
# Compare current page against baseline
iris visual-diff \
--pages "http://localhost:8080/**/*.html" \
--baseline main \
--devices desktop,tablet,mobile \
--threshold 0.1 \
--format html

# Enable AI semantic analysis
iris visual-diff \
--pages "http://localhost:8080/" \
--semantic \
--threshold 0.1
```

**Accessibility Testing:**
```bash
# Run WCAG 2.1 AA compliance tests
iris a11y \
--pages "http://localhost:8080/**/*.html" \
--tags wcag2a,wcag2aa \
--include-keyboard \
--format html

# Test with screen reader simulation
iris a11y \
--pages "http://localhost:8080/" \
--include-screenreader \
--fail-on critical,serious
```

**File Watching:**
```bash
# Watch files and auto-execute on changes
iris watch src/ --instruction "reload page"
iris watch "**/*.ts" --execute
```

**JSON-RPC Server:**
```bash
# Start WebSocket server for AI coding assistant integration
iris connect
iris connect 8080 # Custom port
```

---

## Configuration

### AI Provider Setup

**OpenAI (Recommended for Visual Analysis):**
```bash
export OPENAI_API_KEY=sk-your-key
```

**Anthropic Claude (Recommended for Semantic Analysis):**
```bash
export ANTHROPIC_API_KEY=sk-ant-your-key
```

**Local Ollama (Privacy-Focused):**
```bash
export OLLAMA_ENDPOINT=http://localhost:11434
export OLLAMA_MODEL=llava:latest
```

### Config File

Create `~/.iris/config.json`:
```json
{
"ai": {
"provider": "openai",
"model": "gpt-4o-mini"
},
"visual": {
"threshold": 0.1,
"devices": ["desktop"],
"aiProvider": "openai"
},
"accessibility": {
"wcagLevel": "AA",
"includeKeyboard": true
},
"watch": {
"patterns": ["**/*.{ts,tsx,js,jsx}"],
"debounceMs": 1000
}
}
```

### Project-Level Config

Create `.irisrc` in your project root:
```json
{
"visual": {
"threshold": 0.1,
"devices": ["desktop", "tablet", "mobile"],
"capture": {
"waitForFonts": true,
"disableAnimations": true,
"stabilizationDelay": 500
}
},
"accessibility": {
"wcagLevel": "AA",
"includeKeyboard": true
}
}
```

---

## Visual Regression Testing

### Features

**Capture Engine:**
- Screenshot capture with viewport/fullPage modes
- Multi-device support (desktop 1920x1080, tablet 768x1024, mobile 375x667)
- Page stabilization (fonts, animations, network idle)
- Dynamic content masking
- Element-specific capture

**Diff Engine:**
- Pixel-level comparison with pixelmatch
- SSIM (Structural Similarity Index) analysis
- Region-based difference detection
- Change classification (layout/content/styling/animation)

**AI Semantic Analysis:**
- OpenAI GPT-4 Vision integration
- Anthropic Claude 3.5 Sonnet support
- Ollama local model support
- Semantic change understanding (intentional vs regression)
- Severity classification (breaking, moderate, minor)
- Confidence scoring and explanations

**Baseline Management:**
- Git-integrated baseline storage
- Branch-based baseline strategies
- Commit-based snapshots
- Timestamp-based baselines
- Automatic cleanup of old baselines

**Reporting:**
- Interactive HTML reports with diff viewer
- JSON structured data export
- JUnit XML for CI/CD integration
- Markdown summary reports

### CLI Options

```bash
iris visual-diff [options]

Options:
--pages Page patterns (comma-separated, default: /)
--baseline Baseline branch/commit (default: main)
--semantic Enable AI semantic analysis
--threshold Pixel threshold 0-1 (default: 0.1)
--devices Devices: desktop,tablet,mobile (default: desktop)
--format Output: html|json|junit|markdown (default: html)
--output Output file path
--fail-on Fail on: minor|moderate|breaking (default: breaking)
--update-baseline Update baseline with current screenshots
--mask CSS selectors to mask (comma-separated)
--concurrency Max concurrent comparisons (default: 3)
```

---

## Accessibility Testing

### Features

**WCAG Compliance:**
- WCAG 2.0/2.1 Level A, AA, AAA validation
- axe-core integration with 90+ rules
- Configurable rule sets and tags
- Impact-based severity classification

**Keyboard Navigation:**
- Tab order validation
- Focus trap detection
- Arrow key navigation testing
- Escape key handling verification
- Custom keyboard sequence testing

**Screen Reader Support:**
- ARIA label validation
- Landmark navigation testing
- Heading structure verification
- Image alt text validation
- Screen reader simulation

**Reporting:**
- Accessibility score (0-100 scale)
- Violation breakdown by severity
- Element-level issue reporting
- Remediation suggestions

### CLI Options

```bash
iris a11y [options]

Options:
--pages Page patterns (comma-separated, default: /)
--rules Specific axe rules (comma-separated)
--tags Rule tags: wcag2a,wcag2aa,wcag21aa (default: wcag2a,wcag2aa)
--fail-on Impact levels: critical,serious,moderate,minor (default: critical,serious)
--format Output: html|json|junit (default: html)
--output Output file path
--include-keyboard Include keyboard navigation tests (default: true)
--include-screenreader Include screen reader simulation
```

---

## Examples

Pre-built examples are available in the `examples/` directory:

### 1. Basic Visual Testing
```bash
cd examples/basic-visual-test
./test-visual.sh
```

Demonstrates:
- Simple page comparison
- Baseline creation and updating
- Threshold configuration
- HTML report generation

### 2. Multi-Device Testing
```bash
cd examples/multi-device-visual
./test-responsive.sh
```

Demonstrates:
- Desktop, tablet, mobile testing
- Responsive design validation
- Device-specific baselines
- Parallel test execution

### 3. Accessibility Audit
```bash
cd examples/accessibility-audit
./test-a11y.sh
```

Demonstrates:
- WCAG 2.1 AA compliance testing
- Keyboard navigation validation
- Screen reader simulation
- Accessibility score reporting

### 4. CI/CD Integration
```bash
cd examples/ci-cd-integration
```

Includes configurations for:
- GitHub Actions
- GitLab CI
- Jenkins
- CircleCI

---

## Development

### Run Tests

```bash
npm test
# Result: 541/564 passing (95.9% pass rate)
# 1 failing (performance timing - non-critical)
# 22 skipped (accessibility E2E - infrastructure mismatch)
```

### Build

```bash
npm run build
```

### Coverage

```bash
npm test -- --coverage
# Overall: 75.49% (below 85% target)
# Visual: 88.3% | A11y: 76.6% | Database: 95.74%
# Branch coverage: 58.28% (primary improvement area)
```

### Run Benchmarks

```bash
npm run bench
```

Performance baselines:
- Single page visual diff: 42.61ms (target <100ms) ✅
- 4K image processing: 205.30ms (target <300ms) ✅
- Memory delta: 1.57MB ✅

---

## Architecture

### Phase 1 Core (9 modules, 25,667+ lines)

**CLI Framework** (`src/cli.ts`)
- Commander.js-based CLI with 5 commands
- Browser execution integration
- Configuration management

**Browser Automation** (`src/browser.ts`, `src/executor.ts`)
- Playwright wrapper with retry logic
- Action execution with error handling
- Session management

**AI Translation** (`src/translator.ts`, `src/ai-client.ts`)
- Pattern matching + AI fallback
- Multi-provider support (OpenAI/Anthropic/Ollama)
- Confidence scoring

**Protocol & Storage** (`src/protocol.ts`, `src/db.ts`)
- JSON-RPC 2.0 over WebSocket
- SQLite persistence with migration system
- Test result tracking with visual and a11y results

### Phase 2 Visual & Accessibility (100% Complete)

**Visual Module** (`src/visual/`)
- `visual-runner.ts` - Test orchestration (15,365 bytes)
- `capture.ts` - Screenshot capture with stabilization
- `diff.ts` - Pixel and SSIM comparison
- `baseline.ts` - Git-integrated baseline management
- `ai-classifier.ts` - AI semantic analysis (6,843 bytes)
- `reporter.ts` - Multi-format reporting (979 lines)
- `storage.ts` - Artifact storage

**Accessibility Module** (`src/a11y/`)
- `a11y-runner.ts` - Test orchestration (12,799 bytes)
- `axe-integration.ts` - WCAG compliance (6,279 bytes)
- `keyboard-tester.ts` - Keyboard navigation (12,271 bytes)

**Database** (`src/db.ts`)
- Extended schema with visual_test_results and a11y_test_results tables
- Migration system for schema versioning
- Aggregate statistics and query functions

---

## Documentation

### Getting Started
- **[docs/GETTING_STARTED_GUIDE.md](docs/GETTING_STARTED_GUIDE.md)** - Complete setup guide (5-minute quick start, 20-minute full setup)
- **[docs/QUICKSTART.md](docs/QUICKSTART.md)** - 5-minute introduction

### API Reference
- **[docs/api/visual-testing.md](docs/api/visual-testing.md)** - Visual regression API (1,116 lines)
- **[docs/api/accessibility-testing.md](docs/api/accessibility-testing.md)** - Accessibility API (1,050 lines)

### Guides
- **[docs/guides/ci-cd-integration.md](docs/guides/ci-cd-integration.md)** - CI/CD integration (645 lines)
- **[docs/PERFORMANCE.md](docs/PERFORMANCE.md)** - Performance benchmarks and optimization
- **[docs/OPTIMIZATION_RECOMMENDATIONS.md](docs/OPTIMIZATION_RECOMMENDATIONS.md)** - Optimization strategies

### Development
- **[docs/DEVELOPMENT_INSTRUCTIONS.md](docs/DEVELOPMENT_INSTRUCTIONS.md)** - Development guide
- **[docs/phase2_technical_architecture.md](docs/phase2_technical_architecture.md)** - Phase 2 architecture (2,556 lines)
- **[docs/PROJECT_INDEX.md](docs/PROJECT_INDEX.md)** - Project navigation

### Contributing
- **[plan/READY_FOR_COMMIT.md](plan/READY_FOR_COMMIT.md)** - Git workflow guide
- **[docs/GIT_COMMIT_GUIDE.md](docs/GIT_COMMIT_GUIDE.md)** - Commit instructions
- **[plan/phase2_completion_report.md](plan/phase2_completion_report.md)** - Phase 2 completion report

### AI Agents
- **[AGENT_INSTRUCTIONS.md](AGENT_INSTRUCTIONS.md)** - Development guidance
- **[CLAUDE.md](CLAUDE.md)** - Claude Code instructions
- **[docs/beads-migration-guide.md](docs/beads-migration-guide.md)** - Beads issue tracker guide

### Issue Tracking with Beads

IRIS uses **Beads** (`bd`) - a dependency-aware issue tracker designed for AI-supervised workflows. Issues are tracked with explicit dependency chains, making it easy for AI agents to find ready work and avoid duplicating effort.

**Quick Start:**
```bash
# Show unblocked issues ready to work on
bd ready

# View issue details
bd show iris-7

# Claim work
bd update iris-7 --status in_progress --assignee your-name

# Close when complete
bd close iris-7 --reason "commit abc123"
```

**Current Status:**
- **19 issues** tracking Phase 2 Sub-Phases B-E (weeks 5-18)
- **10 issues** ready with no blockers
- **Critical path**: iris-6 → iris-7 (P0 validation) → iris-8 → ... → iris-16

**Key Features:**
- Dependency tracking (`blocks`, `parent-child`, `discovered-from`)
- Auto-sync with git (JSONL export/import)
- Priority-based work queues (P0-P3)
- JSON output for programmatic access

See **[docs/beads-migration-guide.md](docs/beads-migration-guide.md)** for complete workflow documentation.

---

## Roadmap

### Phase 1 ✅ (Complete - September 2024)
- CLI framework with natural language commands
- Browser automation with Playwright
- File watching and auto-execution
- AI translation with multi-provider support
- JSON-RPC protocol server
- SQLite persistence

### Phase 2 ✅ (COMPLETE - October 2025)
- ✅ Visual regression testing with pixel and SSIM comparison
- ✅ AI semantic analysis (OpenAI, Claude, Ollama)
- ✅ AI vision foundation with cost control and caching
- ✅ Multi-device testing (desktop, tablet, mobile)
- ✅ Accessibility validation (WCAG 2.1 AA/AAA)
- ✅ Keyboard navigation and screen reader testing
- ✅ Git-integrated baseline management
- ✅ Multi-format reporting (HTML, JSON, JUnit, Markdown)
- ✅ CLI integration (`iris visual-diff`, `iris a11y`)
- ✅ E2E integration tests
- ✅ Performance benchmarks
- ✅ Comprehensive documentation and examples
- ✅ CI/CD ready
- ✅ Test suite stabilized (95.9% pass rate)
- ⚠️ Coverage at 75.49% (below 85% target - branch coverage improvement needed)

### Phase 3 📋 (Planned - Q1 2026)
- Performance monitoring and Core Web Vitals
- Advanced AI-powered visual analysis
- Autonomous UI exploration
- Design system compliance checking
- Visual regression history and trends
- Team collaboration features

---

## Testing

**Test Coverage:**
- Total: 564 tests (541 passing, 95.9% pass rate)
- Failing: 1 (non-critical performance timing test)
- Skipped: 22 (accessibility E2E infrastructure mismatch)
- Overall coverage: 75.49% (target: 85%)
- Visual module: 88.3%
- Accessibility module: 76.6%
- Database: 95.74%
- Branch coverage: 58.28% (primary improvement opportunity)

**Test Suites:**
- Unit tests for all core modules (541 passing)
- Integration tests for CLI commands
- E2E tests: Visual (93.3% passing), Accessibility (0% - skipped)
- Browser automation tests with real Playwright
- Performance benchmarks

---

## Dependencies

**Core:**
- Node.js >=18.0.0
- TypeScript 5.1.6
- Playwright 1.35.0
- Commander 11.0.0

**Visual Testing:**
- sharp (image processing)
- pixelmatch (pixel diff)
- image-ssim (structural similarity)
- simple-git (baseline management)
- openai (GPT-4 Vision)
- @anthropic-ai/sdk (Claude)

**Accessibility:**
- @axe-core/playwright
- pa11y

**Utilities:**
- zod (runtime validation)
- better-sqlite3 (database)
- ws (WebSocket)

---

## Performance

**Benchmarks (October 2025):**
- Single page visual diff: **42.61ms** (target <100ms) ✅ 57% better
- 4K image processing: **205.30ms** (target <300ms) ✅ 32% better
- Memory usage: **1.57MB delta** ✅ Excellent
- Parallel efficiency: 1.6x (roadmap for 3-5x improvement)

See [docs/PERFORMANCE.md](docs/PERFORMANCE.md) for detailed benchmarks.

---

## CI/CD Integration

IRIS is CI/CD ready with:
- Exit code propagation for pass/fail
- JUnit XML report generation
- JSON structured output
- Parallel test execution
- Configurable failure thresholds

**Example GitHub Actions:**
```yaml
- name: Visual Regression Testing
run: |
iris visual-diff \
--pages "http://localhost:8080/**/*.html" \
--baseline main \
--format junit \
--output test-results/visual.xml

- name: Accessibility Testing
run: |
iris a11y \
--pages "http://localhost:8080/**/*.html" \
--format junit \
--output test-results/a11y.xml
```

See [docs/guides/ci-cd-integration.md](docs/guides/ci-cd-integration.md) for complete examples.

---

## Contributing

Phase 2 is complete. The project is ready for Phase 3 development or community contributions.

**Areas for Contribution:**
- Additional AI provider integrations
- Enhanced report visualizations
- Performance optimizations
- Additional accessibility rules
- Documentation improvements
- Example projects

See [DEVELOPMENT_INSTRUCTIONS.md](docs/DEVELOPMENT_INSTRUCTIONS.md) for contribution guidelines.

---

## License

MIT

---

## Links

- **GitHub:** [github.com/frankbria/iris](https://github.com/frankbria/iris)
- **Issues:** [github.com/frankbria/iris/issues](https://github.com/frankbria/iris/issues)
- **Twitter:** [@FrankBria18044](https://twitter.com/FrankBria18044)

Building in public. Star the repo to follow along! ⭐

---

## Quick Reference

**Installation:**
```bash
npm install -g @frankbria/iris # Coming soon to npm
# Or install from source:
git clone https://github.com/frankbria/iris.git && cd iris && npm install && npm run build && npm link
```

**Visual Testing:**
```bash
iris visual-diff --pages "http://localhost:8080/" --semantic
```

**Accessibility Testing:**
```bash
iris a11y --pages "http://localhost:8080/" --include-keyboard
```

**Get Help:**
```bash
iris --help
iris visual-diff --help
iris a11y --help
```

**Documentation:**
- Quick Start: [docs/GETTING_STARTED_GUIDE.md](docs/GETTING_STARTED_GUIDE.md)
- API Reference: [docs/api/](docs/api/)
- Examples: [examples/](examples/)

**Status:**
- Phase 1: ✅ Complete
- Phase 2: ✅ Complete (production-ready)
- Tests: 541/564 passing (95.9%)
- Coverage: 75.49% (below 85% target)
- Production Ready: ✅ Yes (with noted optimization opportunities)