https://github.com/raphaelpor/katt

Katt is a lightweight testing framework for running AI Evals.
https://github.com/raphaelpor/katt

agent ai cli copilot evals

Last synced: 4 months ago
JSON representation

Katt is a lightweight testing framework for running AI Evals.

Host: GitHub
URL: https://github.com/raphaelpor/katt
Owner: raphaelpor
License: mit
Created: 2026-02-12T19:06:42.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-02-18T00:58:49.000Z (4 months ago)
Last Synced: 2026-02-18T15:43:35.477Z (4 months ago)
Topics: agent, ai, cli, copilot, evals
Language: TypeScript
Homepage:
Size: 436 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Agents: AGENTS.md

Awesome Lists containing this project

README

          # Katt

[![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/raphaelpor/katt/blob/main/LICENSE) [![npm version](https://img.shields.io/npm/v/katt.svg?style=flat)](https://www.npmjs.com/package/katt)

Katt is a lightweight testing framework for running AI Evals, inspired by [Jest](https://github.com/jestjs/jest).



## Table of Contents

- [Overview](#overview)

- [API Documentation](#api-documentation)

- [Hello World - Example](#hello-world---example)

- [Main Features](#main-features)

- [Usage](#usage)

- [Installation](#installation)

- [Basic Usage](#basic-usage)

- [Using promptFile](#using-promptfile)

- [Specifying AI Models](#specifying-ai-models)

- [Development](#development)

- [Setup](#setup)

- [Available Scripts](#available-scripts)

- [Verification Process](#verification-process)

- [Project Structure](#project-structure)

- [How It Works](#how-it-works)

- [Requirements](#requirements)

- [License](#license)

- [Contributing](#contributing)

## Overview

Katt is designed to evaluate and validate the behavior of AI agents like **Claude Code**, **GitHub Copilot**, **OpenAI Codex** and more. It provides a simple, intuitive API for writing tests that interact with AI models and assert their responses.

## API Documentation

For a complete list of features and usage examples, see [docs/api-documentation.md](https://github.com/raphaelpor/katt/blob/main/docs/api-documentation.md).

## Hello World - Example

```typescript

import { expect, prompt } from "katt";

const result = await prompt("If you read this just say 'hello world'");

expect(result).toContain("hello world");

```

It also supports the familiar `describe` and `it` syntax for organizing tests:

```typescript

import { describe, expect, it, prompt } from "katt";

describe("Greeting agent", () => {

  it("should say hello world", async () => {

    const result = await prompt("If you read this just say 'hello world'");

    expect(result).toContain("hello world");

  });

});

```

## Main Features

- **Simple Testing API**: Familiar `describe` and `it` syntax for organizing tests

- **AI Interaction and Verification**: Built-in `prompt()`, `promptFile()` and `promptCheck()` functions for running and analyzing prompts to AI agents

- **Classification Matcher**: Built-in `toBeClassifiedAs()` matcher to grade a response against a target label on a 1-5 scale

- **Concurrent Execution**: Runs eval files concurrently for faster test execution

- **Model Selection**: Support for specifying custom AI models

- **Configurable Timeouts**: Override prompt wait time per test or via `katt.json`

## Usage

### Installation

```bash

npm install -g katt

```

### Basic Usage

1. Create a file with the `.eval.ts` or `.eval.js` extension and write your tests.

```typescript

import { expect, prompt } from "katt";

const result = await prompt("If you read this just say 'hello world'");

expect(result).toContain("hello world");

```

2. Run Katt from your project directory:

```bash

katt

```

### Using promptFile

Load prompts from external files:

```javascript

// test.eval.js

import { describe, expect, it, promptFile } from "katt";

describe("Working with files", () => {

  it("should load the file and respond", async () => {

    const result = await promptFile("./myPrompt.md");

    expect(result).toContain("expected response");

  });

});

```

### Specifying AI Models

You can specify a custom model for your prompts:

```javascript

import { describe, expect, it, prompt } from "katt";

describe("Model selection", () => {

  it("should use a specific model", async () => {

    const promptString = "You are a helpful agent. Say hi and ask what you could help the user with.";

    const result = await prompt(promptString, { model: "gpt-5.2" });

    expect(result).promptCheck("It should be friendly and helpful");

  });

});

```

You can also set a default model for the project by adding a `katt.json` file in the project root:

```json

{

  "copilot": {

    "model": "gpt-5-mini"

  },

  "prompt": {

    "timeoutMs": 240000

  }

}

```

When this file exists:

- `prompt("...")` and `promptFile("...")` use `copilot.model` by default

- `prompt("...", { model: "..." })` still overrides the config value

- `prompt.timeoutMs` sets the default wait timeout for long-running prompts

## Development

### Setup

```bash

npm install

```

### Available Scripts

- `npm run dev` - Run the CLI in development mode

- `npm run build` - Build the project

- `npm run test` - Run tests

- `npm run typecheck` - Run TypeScript type checking

- `npm run format` - Format code using Biome

- `npm run lint` - Lint code using Biome

- `npm run test:build` - Test the built CLI

### Verification Process

After making changes, run the following sequence:

1. `npm run format`

2. `npm run typecheck`

3. `npm run test`

4. `npm run build`

5. `npm run test:build`

## Project Structure

```

katt/

├── src/              # Source code

│   ├── cli/          # CLI implementation

│   ├── lib/          # Core libraries (describe, it, expect, prompt)

│   └── types/        # TypeScript type definitions

├── examples/         # Example eval files

├── specs/            # Markdown specifications

├── package.json      # Package configuration

└── tsconfig.json     # TypeScript configuration

```

## How It Works

1. Katt searches the current directory recursively for `*.eval.js` and `*.eval.ts` files

2. It skips `.git` and `node_modules` directories

3. Found eval files are imported and executed concurrently

4. Tests registered with `describe()` and `it()` are collected and run

5. Each test duration is printed after execution

6. A summary is displayed showing passed/failed tests and total duration

7. Katt exits with code `0` on success or `1` on failure

## Requirements

- Node.js

- GitHub Copilot CLI installed (see [GitHub Copilot CLI installation docs](https://docs.github.com/en/copilot/how-tos/copilot-cli/install-copilot-cli))

- Access to AI models (e.g., OpenAI API key for Codex)

## License

MIT

## Contributing

We welcome contributions from the community! Please see our [CONTRIBUTING.md](https://github.com/raphaelpor/katt/blob/main/CONTRIBUTING.md) guide for detailed information on how to contribute to Katt.

Quick start:

1. Fork the repository

2. Create a feature branch

3. Make your changes

4. Run the verification process

5. Submit a pull request

For detailed guidelines, development setup, coding standards, and more, check out our [contribution guide](https://github.com/raphaelpor/katt/blob/main/CONTRIBUTING.md).

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/raphaelpor/katt

Awesome Lists containing this project

README