https://github.com/vezlo/ai-validator

AI Response Validator - Automated accuracy checking, hallucination prevention, and confidence scoring for AI responses
accuracy ai ai-response claude confidence hallucination knowlege-base llm openai rag response-validation validation validator

Host: GitHub
URL: https://github.com/vezlo/ai-validator
Owner: vezlo
License: agpl-3.0
Created: 2025-10-02T13:37:36.000Z (3 months ago)
Default Branch: main
Last Pushed: 2025-10-02T14:13:43.000Z (3 months ago)
Last Synced: 2025-10-02T14:37:06.134Z (3 months ago)
Topics: accuracy, ai, ai-response, claude, confidence, hallucination, knowlege-base, llm, openai, rag, response-validation, validation, validator
Language: TypeScript
Homepage:
Size: 0 Bytes
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

# AI Validator

[![npm version](https://img.shields.io/npm/v/@vezlo/ai-validator.svg)](https://www.npmjs.com/package/@vezlo/ai-validator)

[![License: AGPL-3.0](https://img.shields.io/badge/License-AGPL%203.0-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)

**AI Response Validator** - Automated accuracy checking, hallucination prevention, and confidence scoring for AI responses.

## 🎯 Purpose

AI Validator helps you ensure the quality and reliability of AI-generated responses by:

- ✅ **Automated Accuracy Checking** - Verify AI responses against source documents

- ✅ **Hallucination Prevention** - Detect when AI invents information not in sources

- ✅ **Confidence Scoring** - Get reliability scores for every response

- ✅ **Query Classification** - Skip validation for greetings, typos, and small talk

- ✅ **Multi-LLM Support** - Works with OpenAI and Claude

Perfect for RAG systems, knowledge bases, and any application where AI response quality matters.

## 🚀 Quick Start

### Installation

```bash
npm install @vezlo/ai-validator
```

Or install globally for CLI access:

```bash
npm install -g @vezlo/ai-validator
```

### For Local Development/Testing

```bash
# Clone the repository
git clone https://github.com/vezlo/ai-validator.git
cd ai-validator

# Install dependencies
npm install

# Build the project
npm run build

# Run the test CLI
npm test
```

## 💻 Usage

### 1. CLI Testing (Interactive)

Test the validator interactively without writing code:

```bash
# Using npx (no installation required)
npx vezlo-validator-test

# Or if installed globally
vezlo-validator-test
```

The CLI will guide you through:

- Selecting LLM provider (OpenAI or Claude)

- Entering API keys

- Choosing models (any OpenAI or Claude model)

- Configuring validation settings

- Testing with your own queries and responses

- Easy text input for sources (no JSON required)

### 2. Code Usage (Programmatic)

#### Basic Example

```typescript
import { AIValidator } from '@vezlo/ai-validator';

// Initialize with your API key and provider
const validator = new AIValidator({
  openaiApiKey: 'sk-your-openai-key',  // Your OpenAI API key
  llmProvider: 'openai'                 // 'openai' or 'claude'
});

// Validate a response
const validation = await validator.validate({
  query: "What is machine learning?",
  response: "Machine learning is a subset of AI that focuses on algorithms.",
  sources: [
    {
      content: "Machine learning is a subset of artificial intelligence that focuses on algorithms and statistical models.",
      title: "ML Guide",
      url: "https://example.com/ml-guide"
    }
  ]
});

// Check results
console.log(`Confidence: ${(validation.confidence * 100).toFixed(1)}%`);
console.log(`Valid: ${validation.valid}`);
console.log(`Accuracy: ${validation.accuracy.verified ? 'Verified' : 'Not verified'}`);
console.log(`Hallucination Risk: ${(validation.hallucination.risk * 100).toFixed(1)}%`);
console.log(`Warnings: ${validation.warnings.join(', ')}`);
```

#### Advanced Configuration

```typescript
import { AIValidator } from '@vezlo/ai-validator';

const validator = new AIValidator({
  // API Keys (at least one required)
  openaiApiKey: 'sk-your-openai-key',
  claudeApiKey: 'sk-ant-your-claude-key',

  // LLM Provider (required)
  llmProvider: 'openai', // 'openai' or 'claude'

  // Model Selection (optional - you can specify any model from the provider)
  openaiModel: 'gpt-4o',  // Any OpenAI model: gpt-4o, gpt-4o-mini, gpt-4, etc.
  claudeModel: 'claude-sonnet-4-5-20250929',  // Any Claude model

  // Validation Settings (optional)
  confidenceThreshold: 0.7,           // 0.0 - 1.0 (default: 0.7)
  enableQueryClassification: true,     // Skip validation for greetings/typos
  enableAccuracyCheck: true,          // LLM-based accuracy checking
  enableHallucinationDetection: true  // LLM-based hallucination detection
});
```

### Integration with RAG Systems

```typescript
// Example with a RAG system
const ragResponse = await yourRAGSystem.query(userQuestion);
const sources = await yourRAGSystem.getSources(userQuestion);

const validation = await validator.validate({
  query: userQuestion,
  response: ragResponse.content,
  sources: sources.map(s => ({
    content: s.text,
    title: s.title,
    url: s.url
  }))
});

if (validation.valid) {
  // Show response to user
  return ragResponse.content;
} else {
  // Handle low confidence response
  console.warn('Low confidence response:', validation.warnings);
  return "I'm not confident about this answer. Please consult additional sources.";
}
```
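Retrieved chunks often include empty or repeated passages, and every source you pass adds to the validation prompt. The following is a minimal sketch of a pre-processing step, assuming chunks shaped like the `s.text` / `s.title` / `s.url` fields used in the example above. `RetrievedChunk`, `toValidatorSources`, and the `maxSources` cap are hypothetical names for illustration and are not part of `@vezlo/ai-validator`.

```typescript
// Shape of a retrieved chunk, mirroring the fields used in the RAG example.
// This is an assumption about your RAG system, not a library type.
interface RetrievedChunk {
  text: string;
  title?: string;
  url?: string;
}

// Source shape passed to validator.validate() in the examples.
interface ValidatorSource {
  content: string;
  title?: string;
  url?: string;
}

// Hypothetical helper: trims content, drops empty chunks,
// removes exact duplicates, and caps the number of sources.
function toValidatorSources(
  chunks: RetrievedChunk[],
  maxSources = 5
): ValidatorSource[] {
  const seen = new Set<string>();
  const cleaned: ValidatorSource[] = [];
  for (const chunk of chunks) {
    const content = chunk.text.trim();
    if (content === '' || seen.has(content)) continue;
    seen.add(content);
    cleaned.push({ content, title: chunk.title, url: chunk.url });
    if (cleaned.length >= maxSources) break;
  }
  return cleaned;
}
```

The result can be passed directly as the `sources` field of `validator.validate()`. Capping the count is a cost and latency trade-off, so treat the default of 5 as a placeholder to tune.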

## 📊 Validation Results

```typescript
interface ValidationResult {
  confidence: number;        // 0.0 - 1.0
  valid: boolean;            // true if confidence >= threshold
  accuracy: {
    verified: boolean;
    verification_rate: number;
    reason?: string;
  };
  context: {
    source_relevance: number;
    source_usage_rate: number;
    valid: boolean;
  };
  hallucination: {
    detected: boolean;
    risk: number;
    hallucinated_parts?: string[];
  };
  warnings: string[];
  query_type?: string;       // 'greeting', 'question', etc.
  skip_validation?: boolean; // true for greetings/typos
}
```
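For logging or review queues, it can help to condense a result into a short report. The sketch below uses only fields declared in `ValidationResult` above. The `ResultSummaryInput` type and the `formatValidationSummary` function are hypothetical helpers written for this example, not exports of the library.

```typescript
// Local subset of ValidationResult so the sketch is self-contained.
interface ResultSummaryInput {
  confidence: number;
  valid: boolean;
  hallucination: {
    detected: boolean;
    risk: number;
    hallucinated_parts?: string[];
  };
  warnings: string[];
}

// Hypothetical helper: builds a multi-line, human-readable summary.
function formatValidationSummary(r: ResultSummaryInput): string {
  const lines = [
    `Confidence: ${(r.confidence * 100).toFixed(1)}% (${r.valid ? 'valid' : 'below threshold'})`,
    `Hallucination risk: ${(r.hallucination.risk * 100).toFixed(1)}%`,
  ];
  // hallucinated_parts is optional, so fall back to an empty list.
  const parts = r.hallucination.hallucinated_parts ?? [];
  if (r.hallucination.detected && parts.length > 0) {
    lines.push(`Unsupported statements: ${parts.join('; ')}`);
  }
  lines.push(
    r.warnings.length > 0 ? `Warnings: ${r.warnings.join(', ')}` : 'Warnings: none'
  );
  return lines.join('\n');
}
```

Because `ResultSummaryInput` is a structural subset, a full `ValidationResult` can be passed to it without conversion.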

## 🔧 Configuration

### Configuration Options

All configuration is done in code when initializing the validator:

```typescript
interface AIValidatorConfig {
  // API Keys (at least one required)
  openaiApiKey?: string;      // Your OpenAI API key
  claudeApiKey?: string;       // Your Claude API key

  // Provider (required)
  llmProvider: 'openai' | 'claude';

  // Models (optional - specify any valid model from the chosen provider)
  openaiModel?: string;        // Default: 'gpt-4o'
  claudeModel?: string;        // Default: 'claude-sonnet-4-5-20250929'

  // Validation Settings (optional)
  confidenceThreshold?: number;           // Default: 0.7
  enableQueryClassification?: boolean;    // Default: true
  enableAccuracyCheck?: boolean;         // Default: true
  enableHallucinationDetection?: boolean; // Default: true
}
```
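Since configuration lives in code, you may prefer to read keys from the environment rather than hard-coding them. Here is a minimal sketch, assuming environment variable names `LLM_PROVIDER`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, and `CONFIDENCE_THRESHOLD`. These names, the `MinimalValidatorConfig` type, and `configFromEnv` are all assumptions for illustration; the library itself does not read them.

```typescript
type Provider = 'openai' | 'claude';

// Subset of AIValidatorConfig, redeclared so the sketch is self-contained.
interface MinimalValidatorConfig {
  llmProvider: Provider;
  openaiApiKey?: string;
  claudeApiKey?: string;
  confidenceThreshold?: number;
}

// Hypothetical helper: builds a config and fails fast on bad input.
function configFromEnv(
  env: Record<string, string | undefined>
): MinimalValidatorConfig {
  // Default to 'openai' unless 'claude' is requested explicitly.
  const llmProvider: Provider = env['LLM_PROVIDER'] === 'claude' ? 'claude' : 'openai';
  const openaiApiKey = env['OPENAI_API_KEY'];
  const claudeApiKey = env['ANTHROPIC_API_KEY'];

  // The selected provider must have a key.
  const activeKey = llmProvider === 'openai' ? openaiApiKey : claudeApiKey;
  if (!activeKey) {
    throw new Error(`Missing API key for provider "${llmProvider}"`);
  }

  const config: MinimalValidatorConfig = { llmProvider, openaiApiKey, claudeApiKey };

  // Optional threshold override, restricted to the documented 0.0 - 1.0 range.
  const rawThreshold = env['CONFIDENCE_THRESHOLD'];
  if (rawThreshold !== undefined && rawThreshold.trim() !== '') {
    const threshold = Number(rawThreshold);
    if (!Number.isFinite(threshold) || threshold < 0 || threshold > 1) {
      throw new Error('CONFIDENCE_THRESHOLD must be between 0.0 and 1.0');
    }
    config.confidenceThreshold = threshold;
  }
  return config;
}

// Usage (in a Node.js app): new AIValidator(configFromEnv(process.env))
```

Failing at startup when the selected provider has no key gives a clearer error than a rejected API call during the first validation.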

### Model Support

**OpenAI Models:**

You can use any OpenAI chat model by specifying it in `openaiModel`. Common choices include:

- `gpt-4o` (default, recommended)

- `gpt-4o-mini` (faster, cheaper)

- `gpt-4` (previous flagship)

- `gpt-4-turbo`

- Or any other OpenAI chat completion model

**Claude Models:**

You can use any Claude model by specifying it in `claudeModel`. Common choices include:

- `claude-sonnet-4-5-20250929` (default, Claude Sonnet 4.5)

- `claude-opus-4-1-20250805` (Claude Opus 4.1)

- `claude-3-7-sonnet-20250219` (Claude 3.7 Sonnet)

- Or any other Claude model identifier

The validator will work with any model supported by the respective provider's API.

### CLI Commands

```bash
# Interactive testing CLI
npx vezlo-validator-test

# Development commands
npm run build   # Build the project
npm run clean   # Clean build files
npm test        # Run the test CLI
```

## 🎯 Use Cases

### 1. RAG Systems

Validate responses against retrieved documents to ensure accuracy.

### 2. Customer Support Bots

Prevent incorrect information from reaching customers.

### 3. Knowledge Base Applications

Ensure AI answers are grounded in your documentation.

### 4. Content Generation

Validate AI-generated content against source materials.

### 5. Educational Applications

Ensure AI tutoring responses are accurate and helpful.

## ⚡ Performance

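- **Validation Time**: 2-5 seconds per response (depending on LLM provider)

- **Cost**: Each validation makes additional LLM API calls to the configured provider

- **Accuracy**: Depends on source quality; results are most reliable when the supplied sources are relevant to the query

- **Reliability**: Edge cases are handled explicitly, for example greetings skip validation and missing sources produce a warning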

## 🔍 How It Works

1. **Query Classification** - Identifies greetings, typos, and small talk (skips validation)

2. **Accuracy Checking** - Uses LLM to verify facts against source documents

3. **Hallucination Detection** - Identifies information not present in sources

4. **Context Validation** - Ensures response relevance to the query

5. **Confidence Scoring** - Combines all metrics into a single score
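To make step 5 concrete, here is one way several metrics could be folded into a single score. This is a sketch under stated assumptions and not the library's actual formula. The weights, the `Metrics` type, and the `combineConfidence` and `isValid` functions are hypothetical. Only the 0.0 - 1.0 ranges and the default threshold of 0.7 come from this README.

```typescript
interface Metrics {
  verificationRate: number;  // 0.0 - 1.0, share of claims backed by sources
  sourceRelevance: number;   // 0.0 - 1.0, how relevant the sources are
  hallucinationRisk: number; // 0.0 - 1.0, higher is worse
}

// Hypothetical weights that sum to 1; the real weighting may differ.
const WEIGHTS = { accuracy: 0.4, context: 0.2, hallucination: 0.4 };

// Keep every input and the final score inside [0, 1].
const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Weighted average, with hallucination risk inverted so that
// a lower risk contributes a higher score.
function combineConfidence(m: Metrics): number {
  const score =
    WEIGHTS.accuracy * clamp01(m.verificationRate) +
    WEIGHTS.context * clamp01(m.sourceRelevance) +
    WEIGHTS.hallucination * (1 - clamp01(m.hallucinationRisk));
  return clamp01(score);
}

// Mirrors the documented rule: valid when confidence >= threshold.
function isValid(confidence: number, threshold = 0.7): boolean {
  return confidence >= threshold;
}
```

With these illustrative weights, a response with a verification rate of 0.95, source relevance of 0.9, and hallucination risk of 0.05 scores about 0.94 and passes the default threshold.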

## 📝 Examples

### High Confidence Response

```typescript
{
  confidence: 0.92,
  valid: true,
  accuracy: { verified: true, verification_rate: 0.95 },
  hallucination: { detected: false, risk: 0.05 },
  warnings: []
}
```

### Low Confidence Response

```typescript
{
  confidence: 0.35,
  valid: false,
  accuracy: { verified: false, verification_rate: 0.2 },
  hallucination: { detected: true, risk: 0.8 },
  warnings: ["No sources provided - high hallucination risk"]
}
```

### Skipped Validation (Greeting)

```typescript
{
  confidence: 1.0,
  valid: true,
  query_type: "greeting",
  skip_validation: true,
  warnings: []
}
```
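The three result shapes above suggest three different application paths. The sketch below shows one possible routing policy. The `ExampleResult` type, the `Action` labels, and `chooseAction` are hypothetical. The rule that a detected hallucination forces a fallback even when `valid` is true is a deliberately stricter choice made for this example, not documented library behavior.

```typescript
// Only the fields present in all three examples are required;
// the skipped-validation example carries no hallucination data.
interface ExampleResult {
  confidence: number;
  valid: boolean;
  warnings: string[];
  skip_validation?: boolean;
  hallucination?: { detected: boolean; risk: number };
}

type Action = 'deliver' | 'deliver_unchecked' | 'fallback';

// Hypothetical policy: decide what to do with a response.
function chooseAction(r: ExampleResult): Action {
  // Greetings, typos, and small talk were never validated.
  if (r.skip_validation) return 'deliver_unchecked';
  // Deliver only if the score passed and nothing was flagged.
  if (r.valid && !r.hallucination?.detected) return 'deliver';
  // Otherwise show a fallback message or escalate.
  return 'fallback';
}
```

Separating `deliver_unchecked` from `deliver` keeps skipped validations visible in logs, so a confidence of 1.0 on a greeting is not mistaken for a verified answer.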

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📄 License

This project is dual-licensed:

- **Non-Commercial Use**: Free under AGPL-3.0 license

- **Commercial Use**: Requires a commercial license - contact us for details

See the [LICENSE](LICENSE) file for complete AGPL-3.0 license terms.

## 🆘 Support

- **Issues**: [GitHub Issues](https://github.com/vezlo/ai-validator/issues)

- **Documentation**: [GitHub Wiki](https://github.com/vezlo/ai-validator/wiki)

- **Discussions**: [GitHub Discussions](https://github.com/vezlo/ai-validator/discussions)

## 🔗 Related Projects

- [@vezlo/assistant-server](https://www.npmjs.com/package/@vezlo/assistant-server) - AI Assistant Server with RAG capabilities

- [@vezlo/src-to-kb](https://www.npmjs.com/package/@vezlo/src-to-kb) - Convert source code to knowledge base

---

**Status**: ✅ Production Ready | **Version**: 1.0.2 | **License**: AGPL-3.0 | **Node.js**: 20+

**Made with ❤️ by [Vezlo](https://vezlo.org)**
