https://github.com/pppp606/ai-comprehension-test
- Host: GitHub
- URL: https://github.com/pppp606/ai-comprehension-test
- Owner: pppp606
- License: mit
- Created: 2025-10-23T05:55:50.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-10-28T04:04:26.000Z (8 months ago)
- Last Synced: 2025-10-28T05:24:40.518Z (8 months ago)
- Language: TypeScript
- Size: 79.1 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# AI Comprehension Test
`ai-comprehension-test` is a CLI tool that evaluates how easily an AI coding agent (such as Claude Code or Codex) can understand a TypeScript codebase. The tool scans selected project files, generates AI-driven comprehension tests, executes them through a local AI agent, and summarizes the findings in either a console or JSON report.
Important: the goal is not to “make tests pass”, but to detect whether the code is easy for an AI to read accurately. Reports focus on grounded, code-backed signals rather than pure prompt compliance.
## Key Features
- **Automated project scanning** – analyzes classes, methods, and functions in your TypeScript project.
- **Multiple comprehension test types** – static analysis, stability checks (local, deterministic scoring), and AI-generated Jest tests.
- **Groundedness (AST-based)** – extracts reference facts from code (defaults/limits/normalization/constants) and compares them to the LLM’s MR to assess factual alignment.
- **AI agent orchestration** – prompts a local Claude Code or Codex instance and parses structured responses.
- **Detailed reporting** – view friendly console summaries or machine-readable JSON output with actionable insights.
## Prerequisites
- Node.js v18 or later
- npm v9 or later
- A local AI agent binary available on your `PATH` (Claude Code or Codex)
- Optional: `claude` CLI with the `-p` flag or `codex exec`
If you plan to run test-generation suites, ensure `jest` and `ts-jest` dependencies are installed (they are listed in `package.json`).
## Installation
Clone the repository and install dependencies:
```bash
git clone https://github.com/pppp606/ai-comprehension-test.git
cd ai-comprehension-test
npm install
```
Build the CLI:
```bash
npm run build
```
## Environment Configuration
You can configure how the CLI invokes the AI agent and the evaluation behavior via environment variables:
| Variable | Description | Default |
|----------|-------------|---------|
| `AI_COMP_TEST_COMMAND` | Command used to call the AI agent. | `claude` |
| `AI_COMP_TEST_ARGS` | Arguments passed to the AI agent command. | `-p` |
| `AI_COMP_TEST_TIMEOUT` | Timeout in seconds for AI calls. | `180` |
| `AI_COMP_TEST_DEBUG` | Enable verbose debugging output (`true` / `false`). | `false` |
| `AI_COMP_TEST_TYPES` | Comma-separated test types to run: `static-analysis,stability,test-generation`. | (all) |
| `AI_COMP_TEST_STABILITY_ITER` | Number of responses collected per stability test (use a lower value for faster runs). | `5` |
| `AI_COMP_TEST_STABILITY_PROMPT_MODE` | `gamified` to enable +1/0/−1 scoring rule in prompts. | (off) |
| `AI_COMP_TEST_INCLUDE_SCHEMA` | `1` to include JSON Schema and example in prompts. | `1` |
| `AI_COMP_TEST_RECOMPUTE_COVERAGE` | `1` to recompute coverage metrics from MR in the runner. | (off) |
| `AI_COMP_TEST_AMBIGUITY` | `model` or `hybrid` to enable the Transformers.js ambiguity detector. | (off) |
| `AI_COMP_TEST_AMBIGUITY_MODEL` | Zero-shot classification (ZSC) model ID (e.g., `Xenova/distilbert-base-uncased-mnli`). | (default) |
| `AI_COMP_TEST_INCLUDE_AMBIGUITY_IN_SCORE` | `1` to blend model specificity into stability score. | (off) |
| `AI_COMP_TEST_AMBIGUITY_WEIGHT` | Blend weight (0–0.5). | `0.15` |
For example, to run against Codex:
```bash
export AI_COMP_TEST_COMMAND=codex
export AI_COMP_TEST_ARGS="exec"
```
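As a rough illustration of how the defaults in the table could be applied, the sketch below reads a few of the variables from a plain object. The `AgentConfig` shape, the helper names, and the clamping of the blend weight to its documented range are assumptions made for this example; the tool's actual loader is not shown in this README.

```typescript
// Hypothetical config shape; not taken from the tool's source.
interface AgentConfig {
  command: string;
  rawArgs: string;
  timeoutSeconds: number;
  testTypes: string[];
  stabilityIterations: number;
  ambiguityWeight: number;
}

type Env = Record<string, string | undefined>;

const ALL_TEST_TYPES = ["static-analysis", "stability", "test-generation"];

// Parse a numeric variable, falling back when it is unset or not a finite number.
function numberOr(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const parsed = Number(value);
  return Number.isFinite(parsed) ? parsed : fallback;
}

// Apply the documented defaults to whatever the environment provides.
function readAgentConfig(env: Env): AgentConfig {
  const types = (env.AI_COMP_TEST_TYPES ?? "")
    .split(",")
    .map((t) => t.trim())
    .filter((t) => t.length > 0);

  const weight = numberOr(env.AI_COMP_TEST_AMBIGUITY_WEIGHT, 0.15);

  return {
    command: env.AI_COMP_TEST_COMMAND ?? "claude",
    rawArgs: env.AI_COMP_TEST_ARGS ?? "-p",
    timeoutSeconds: numberOr(env.AI_COMP_TEST_TIMEOUT, 180),
    // An empty or missing list means "run every test type".
    testTypes: types.length > 0 ? types : ALL_TEST_TYPES,
    stabilityIterations: numberOr(env.AI_COMP_TEST_STABILITY_ITER, 5),
    // Assumption: out-of-range weights are clamped to the documented 0–0.5 range.
    ambiguityWeight: Math.min(0.5, Math.max(0, weight)),
  };
}
```

In a Node.js script this would typically be called as `readAgentConfig(process.env)`.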
### Using .env
This CLI loads environment variables from a `.env` file in the current working directory. An example template is provided as `.env.example`.
Example `.env`:
```
AI_COMP_TEST_COMMAND=claude
AI_COMP_TEST_ARGS=-p
AI_COMP_TEST_TIMEOUT=180
# AI_COMP_TEST_WORKDIR=/absolute/path/to/workdir
```
Notes:
- `AI_COMP_TEST_ARGS` is split by spaces. Prefer `--flag=value` for values containing spaces.
- Shell environment variables override values from `.env`.
- The `.env` file should be placed in the directory where you run the CLI.
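The notes above can be sketched as follows. This is a minimal illustration, not the tool's implementation: `parseDotEnv`, `mergeEnv`, and `splitAgentArgs` are hypothetical helpers, and the parser only handles simple `KEY=VALUE` lines and `#` comments.

```typescript
type EnvMap = Record<string, string | undefined>;

// Parse simple KEY=VALUE lines, skipping blank lines and `#` comments.
function parseDotEnv(text: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=", or no "=" at all
    result[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return result;
}

// Shell values win: start from the .env values, then overlay the shell environment.
function mergeEnv(shellEnv: EnvMap, dotEnv: Record<string, string>): Record<string, string> {
  const merged: Record<string, string> = { ...dotEnv };
  for (const key of Object.keys(shellEnv)) {
    const value = shellEnv[key];
    if (value !== undefined) merged[key] = value;
  }
  return merged;
}

// Split an argument string on spaces; quotes are NOT interpreted in this sketch.
function splitAgentArgs(raw: string): string[] {
  return raw.split(" ").filter((part) => part.length > 0);
}
```

With a plain split like this, quoting does not keep a value together: `splitAgentArgs('--model "my model"')` yields three parts (`--model`, `"my`, `model"`), which is why the argument format matters.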
## Usage
The CLI exposes a single `run` command. Invoke it from the root of the project you want to analyze:
```bash
npx ai-comprehension-test run [project-path]
```
### Options
- `--files` – Comma-separated list of files to limit the scan, e.g. `--files "src/UserService.ts,src/PaymentService.ts"`.
- `--output` – Directory for generated artifacts (defaults to `.ai-comp-test`).
- `--format` – Output format: `console` (default) or `json`.
- `--json-file` – When `--format json`, the output filename (default: `results.json`).
- `--verbose` – Print detailed logs during scanning, AI calls, and reporting.
- `--help` – Display help information.
### Examples
Run the full suite against the current project:
```bash
npx ai-comprehension-test run
```
Analyze a subset of files and produce a JSON report:
```bash
npx ai-comprehension-test run \
--files "src/services/UserService.ts" \
--format json \
--output ./results \
--verbose
```
Execute against another project directory:
```bash
npx ai-comprehension-test run ../another-project
```
### Output
- **Console report** – Summaries, tables of test outcomes, and highlighted critical issues.
- **JSON report** – Structured machine-readable output saved as `results.json` in the output directory (or under the name given by `--json-file`) when `--format json` is specified.
## Stability and Groundedness
### Stability (consistency)
Stability judgment is local and deterministic:
- Parses each MR response (with code‑fence and loose-object recovery).
- Vectorizes per-field text using TF‑IDF; computes average pairwise cosine similarity per field; aggregates to a 0–100 `consistencyScore` and `consistencyLevel`.
- Derives `mainIdea`, `variations`, `reasoning`, and `codeClarity`.
- Optional: embedding/ZSC ambiguity integration via `@xenova/transformers`.
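The TF‑IDF step can be sketched roughly as below. The tokenizer, the smoothed IDF formula, the unweighted mean across fields, and all function names are assumptions made for this example; the tool's actual weighting and aggregation may differ.

```typescript
// Lowercase and split on non-alphanumeric characters (a deliberately simple tokenizer).
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Build one TF-IDF vector per response over a shared vocabulary.
function tfidfVectors(docs: string[]): number[][] {
  const tokenized = docs.map(tokenize);
  const index = new Map<string, number>();
  const docFreq: number[] = [];
  for (const tokens of tokenized) {
    const seen = new Set<string>();
    for (const tok of tokens) {
      if (!index.has(tok)) {
        index.set(tok, docFreq.length);
        docFreq.push(0);
      }
      if (!seen.has(tok)) {
        seen.add(tok);
        docFreq[index.get(tok)!] += 1;
      }
    }
  }
  return tokenized.map((tokens) => {
    const counts = new Array<number>(docFreq.length).fill(0);
    for (const tok of tokens) counts[index.get(tok)!] += 1;
    // Smoothed IDF so terms shared by every response keep a non-zero weight.
    return counts.map((tf, i) => tf * (Math.log((1 + docs.length) / (1 + docFreq[i])) + 1));
  });
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Average cosine similarity over all unordered pairs of responses for one field.
function averagePairwiseSimilarity(docs: string[]): number {
  if (docs.length < 2) return 1;
  const vectors = tfidfVectors(docs);
  let total = 0;
  let pairs = 0;
  for (let i = 0; i < vectors.length; i++) {
    for (let j = i + 1; j < vectors.length; j++) {
      total += cosine(vectors[i], vectors[j]);
      pairs += 1;
    }
  }
  return total / pairs;
}

// Aggregate per-field similarity into a 0–100 score (unweighted mean; an assumption).
function consistencyScore(fieldResponses: Record<string, string[]>): number {
  const fields = Object.keys(fieldResponses);
  if (fields.length === 0) return 0;
  const sum = fields.reduce((acc, f) => acc + averagePairwiseSimilarity(fieldResponses[f]), 0);
  return Math.round((sum / fields.length) * 100);
}
```

Because the computation uses only the response texts, running it twice on the same responses gives the same score, which is what makes this kind of judgment deterministic.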
### Groundedness (fact alignment)
The runner extracts a Reference MR from code via AST and compares it to the LLM MR:
- Extracts defaults (`this.x=…`, `??`), limits (nested `Math.min/max`), normalization (s→ms), and constants.
- Computes `groundedness.score = factCoverage` (0–100) and lists `mismatches`.
- Recomputes `coverage.schemaCoverage/specificity` from the parsed MR to avoid under-reporting.
This makes “AI comprehension” observable as factual alignment to code, not only prompt adherence.
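To make the comparison step concrete, here is a minimal sketch that assumes both the Reference MR and the LLM MR have already been flattened into key/value facts. The `FactMap` shape, the key names, and the exact-match rule are hypothetical; the AST extraction itself is omitted.

```typescript
// Hypothetical flattened facts, e.g. { "defaults.timeoutMs": 30000 }.
type FactMap = Record<string, string | number | boolean>;

interface Mismatch {
  key: string;
  expected: string;
  actual: string | null; // null when the LLM MR did not mention the fact
}

interface Groundedness {
  score: number; // factCoverage, 0–100
  mismatches: Mismatch[];
}

// Count how many reference facts the LLM MR reproduces exactly.
function compareFacts(reference: FactMap, reported: FactMap): Groundedness {
  const keys = Object.keys(reference);
  const mismatches: Mismatch[] = [];
  let matched = 0;
  for (const key of keys) {
    const expected = String(reference[key]);
    const hasKey = Object.prototype.hasOwnProperty.call(reported, key);
    const actual = hasKey ? String(reported[key]) : null;
    if (actual === expected) {
      matched += 1;
    } else {
      mismatches.push({ key, expected, actual });
    }
  }
  // Assumption: with no reference facts there is nothing to contradict, so score 100.
  const score = keys.length === 0 ? 100 : Math.round((matched / keys.length) * 100);
  return { score, mismatches };
}

const example = compareFacts(
  { "defaults.timeoutMs": 30000, "limits.maxRetries": 5 },
  { "defaults.timeoutMs": 30000, "limits.maxRetries": 3 },
);
// example.score is 50, with one mismatch recorded for "limits.maxRetries".
```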
### Prompt Schema and Example
Prompts include the full MR schema (see `schemas/mr.schema.json`) and a concrete example. Keys must be present even when unknown; do not guess—use `unknowns` with reasons. Ambiguous language is disallowed.
Because stability is scored locally and deterministically, the Stability flow needs one fewer LLM call and is more reproducible.
### Embedding Mode (Optional)
You can switch to an embedding-based scorer for better synonym/phrasing robustness:
- Set `AI_COMP_TEST_STABILITY_MODE=embedding`
- Optional: `AI_COMP_TEST_EMBED_MODEL` to override the default `Xenova/all-mpnet-base-v2`.
- On first run, the model is downloaded (network required). Subsequent runs are cached.
The embedding scorer creates sentence embeddings per field using `@xenova/transformers` (mean pooled) and uses pairwise cosine similarity, then aggregates like the default mode.
Notes:
- Default embedding model: `Xenova/all-mpnet-base-v2` (higher accuracy; heavier).
- Lighter alternative: `Xenova/all-MiniLM-L6-v2` (faster; slightly lower accuracy).
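For readers unfamiliar with mean pooling, the sketch below shows the idea on plain number arrays: token vectors are averaged into one sentence vector, and two sentence vectors are then compared by cosine similarity. It does not call `@xenova/transformers`; the function names and the tiny two-dimensional vectors are illustrative assumptions only.

```typescript
// Average token-level vectors into a single sentence vector (mean pooling).
function meanPool(tokenVectors: number[][]): number[] {
  if (tokenVectors.length === 0) return [];
  const dim = tokenVectors[0].length;
  const pooled = new Array<number>(dim).fill(0);
  for (const vec of tokenVectors) {
    for (let i = 0; i < dim; i++) pooled[i] += vec[i];
  }
  return pooled.map((v) => v / tokenVectors.length);
}

// Cosine similarity between two sentence vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Two toy "sentences", each represented by two token vectors.
const sentenceA = meanPool([[1, 0], [0, 1]]); // [0.5, 0.5]
const sentenceB = meanPool([[2, 2], [4, 4]]); // [3, 3]
const similarity = cosineSimilarity(sentenceA, sentenceB); // same direction, so close to 1
```

Unlike TF‑IDF, embeddings can place differently worded but similar statements near each other, which is the robustness to synonyms and phrasing that this mode aims for.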
## Development Workflow
Start the CLI in development mode with `tsx`:
```bash
npm run dev -- run --help
```
Run TypeScript compilation checks:
```bash
npm run build
```
Execute unit tests (if present, with Jest):
```bash
npm test
```
> **Note:** Initial dependency installation may require network access to fetch packages such as `@types/jest`.
## Project Structure
```
ai-comprehension-test/
├── src/
│ ├── cli/ # CLI entry point and command wiring
│ ├── core/ # AI agent client, test generation, and runners
│ ├── reporter/ # Console and JSON reporters
│ ├── scanner/ # TypeScript AST scanner and type definitions
│ ├── templates/ # Prompt templates for each test type
│ └── types/ # Shared TypeScript interfaces
├── jest.config.js # Jest configuration for generated tests
├── package.json # Dependencies and scripts
├── tsconfig.json # TypeScript configuration
└── README.md # Project documentation
```
## Troubleshooting
- **AI command not found:** Ensure `claude` or `codex` is installed locally and available on your `PATH`.
- **Timeouts during AI calls:** Increase `AI_COMP_TEST_TIMEOUT` or verify that the agent is responsive.
- **Jest test failures:** Review the generated test files in the output directory and rerun with `--verbose` for additional logs.
## License
This project is licensed under the MIT License. See [LICENSE](LICENSE) for details.