https://github.com/nuxt/nuxt-evals

Evals for Nuxt to test AI model competency at Nuxt.
Host: GitHub
URL: https://github.com/nuxt/nuxt-evals
Owner: nuxt
License: mit
Created: 2025-11-05T20:49:02.000Z (7 months ago)
Default Branch: main
Last Pushed: 2026-03-03T13:29:30.000Z (3 months ago)
Last Synced: 2026-03-05T01:47:46.482Z (3 months ago)
Topics: ai, evals, nuxt
Language: TypeScript
Homepage: https://nuxt.com/evals
Size: 658 KB
Stars: 37
Watchers: 1
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# Nuxt Evals

Agent evaluations for Nuxt coding tasks, powered by [`@vercel/agent-eval`](https://www.npmjs.com/package/@vercel/agent-eval).

## Setup

```bash
pnpm install
cp .env.example .env   # requires VERCEL_OIDC_TOKEN and AI_GATEWAY_API_KEY
```

## Scripts

### `pnpm run eval`

Runs agent evaluations.

```bash
pnpm run eval                          # Run all experiments
pnpm run eval -- claude-opus-4.6       # Run a specific experiment
pnpm run eval:smoke                    # Run smoke test (1 eval per experiment)
pnpm run eval:dry                      # Preview what would run
```

### `pnpm run export-results`

Exports clean results to `agent-results.json`.

```bash
pnpm run export-results                          # Export from all experiments
pnpm run export-results -- claude-opus-4.6       # Export specific experiment
```

## Models

| Experiment | Agent | Model |
|------------|-------|-------|
| `claude-opus-4.6` | `claude-code` | `claude-opus-4-6` |
| `claude-sonnet-4.5` | `claude-code` | `claude-sonnet-4-5` |
| `claude-sonnet-4.6` | `claude-code` | `claude-sonnet-4-6` |
| `cursor-composer-1.5` | `cursor` | `composer-1.5` |
| `gemini-3-pro-preview` | `gemini` | `gemini-3-pro-preview` |
| `gemini-3.1-pro-preview` | `gemini` | `gemini-3.1-pro-preview` |
| `devstral-2` | `opencode` | `vercel/mistral/devstral-2` |
| `gpt-5.3-codex-xhigh` | `codex` | `gpt-5.3-codex-api-preview?reasoningEffort=xhigh` |

## Eval structure

Each eval is a self-contained Nuxt project in `evals/`. Most evals provide broken or suboptimal starter code that the agent must fix. The prompt describes a symptom without revealing the solution.

```
evals/nuxt-000-fix-data-fetching/
├── PROMPT.md          # task given to the agent
├── EVAL.ts            # vitest assertions (withheld from the agent)
├── package.json       # Nuxt project manifest
├── nuxt.config.ts
├── tsconfig.json
├── eslint.config.mjs
├── server/
│   └── api/
│       └── greeting.ts
└── app/
    ├── app.vue
    └── pages/
        └── index.vue  # broken starter code the agent must fix
```

| File | Purpose |
|------|---------|
| `PROMPT.md` | The task prompt sent to the agent |
| `EVAL.ts` | Test file run after the agent finishes (withheld from agent) |
| `package.json` | Must have `"type": "module"` and a `"build"` script |
| Everything else | Source files the agent can see and modify |
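
To make the role of `EVAL.ts` more concrete, here is a minimal sketch of the kind of source-level check such a file might perform for `nuxt-000-fix-data-fetching`. The helper `checkDataFetchingFix`, its regular expressions, and the two sample sources are assumptions made for illustration. They are not the repository's actual assertions or starter code. In a real `EVAL.ts`, the resulting booleans would presumably be asserted with vitest's `expect`.

```typescript
// Hypothetical helper, not taken from the repository. It inspects the text of
// app/pages/index.vue and reports whether the expected fix or the anti-pattern
// appears in it.
interface FixReport {
  usesUseFetch: boolean;       // the expected fix (useFetch) is present
  usesOnMountedFetch: boolean; // the anti-pattern (onMounted + $fetch) is present
}

function checkDataFetchingFix(source: string): FixReport {
  return {
    usesUseFetch: /\buseFetch\s*\(/.test(source),
    usesOnMountedFetch:
      /\bonMounted\s*\(/.test(source) && /\$fetch\s*\(/.test(source),
  };
}

// Assumed shape of the broken starter code: data is fetched only on the client.
const broken = `
<script setup lang="ts">
const greeting = ref('')
onMounted(async () => {
  greeting.value = await $fetch('/api/greeting')
})
</script>`;

// Assumed shape of an accepted fix: useFetch replaces onMounted + $fetch.
const fixed = `
<script setup lang="ts">
const { data: greeting } = await useFetch('/api/greeting')
</script>`;

console.log(checkDataFetchingFix(broken)); // anti-pattern found, fix missing
console.log(checkDataFetchingFix(fixed));  // fix found, anti-pattern gone
```

Checking both directions (the fix is present and the anti-pattern is gone) mirrors the guidance that assertions should check for the correct fix and reject anti-patterns.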

## Adding a new eval

1. Create a directory under `evals/` (e.g., `evals/nuxt-015-my-eval/`)

2. Add `PROMPT.md` with a vague, symptom-based task description (don't reveal the solution)

3. Add broken or suboptimal starter code in `app/` for the agent to fix

4. Add `EVAL.ts` with vitest assertions that check for the correct fix and reject anti-patterns

5. Add `package.json` with `"type": "module"` and `"build": "nuxt build"`

6. Run `pnpm run eval`, which automatically runs the new eval for all models
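
The `package.json` requirements from step 5 can be sketched as a small check. The function `validateEvalManifest` and its messages are hypothetical and not part of the repository. The sketch only encodes the two stated rules: `"type": "module"` and a `"build"` script.

```typescript
// Hypothetical validator for an eval's package.json (illustration only).
interface PackageManifest {
  type?: string;
  scripts?: Record<string, string>;
}

// Returns a list of problems; an empty list means both rules are satisfied.
function validateEvalManifest(pkg: PackageManifest): string[] {
  const problems: string[] = [];
  if (pkg.type !== "module") {
    problems.push('"type" must be "module"');
  }
  if (!pkg.scripts?.build) {
    problems.push('a "build" script is required (e.g. "nuxt build")');
  }
  return problems;
}

const ok = validateEvalManifest({
  type: "module",
  scripts: { build: "nuxt build" },
});
const bad = validateEvalManifest({ scripts: { dev: "nuxt dev" } });

console.log(ok);         // []
console.log(bad.length); // 2
```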

## Adding a new model

1. Create a config in `experiments/` (e.g., `experiments/my-model.ts`)

2. Add the display name to `MODEL_NAMES` in `scripts/export-results.ts`

3. Run `pnpm run eval`, which automatically runs all evals for the new model
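
As a rough sketch of step 2, `MODEL_NAMES` could be a plain lookup from experiment name to display name. The actual shape in `scripts/export-results.ts` is not shown here, so the record type, the display strings, and the `displayName` helper below are all assumptions.

```typescript
// Assumed shape: experiment name -> human-readable display name.
// The display strings are illustrative placeholders.
const MODEL_NAMES: Record<string, string> = {
  "claude-opus-4.6": "Claude Opus 4.6",
  "my-model": "My Model", // entry added for the new experiments/my-model.ts
};

// Hypothetical helper: falls back to the raw experiment name when no
// display name has been registered.
function displayName(experiment: string): string {
  const name: string | undefined = MODEL_NAMES[experiment];
  return name ?? experiment;
}

console.log(displayName("my-model"));     // My Model
console.log(displayName("unregistered")); // unregistered
```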

## Current evals

### Nuxt (15)

| Eval | Type | Tests |
|------|------|-------|
| nuxt-000-fix-data-fetching | fix | Replace onMounted + $fetch with useFetch |
| nuxt-001-prefer-nuxt-link | fix | Replace `<a>` with `<NuxtLink>` |
| nuxt-002-state-composables | build | State management with useState composable |
| nuxt-003-page-meta | build | Page meta, useHead, and custom layouts |
| nuxt-004-error-handling | build | Error handling with NuxtErrorBoundary |
| nuxt-005-fix-seo-meta | fix | Replace useHead meta arrays with useSeoMeta |
| nuxt-006-runtime-config | build | Runtime config with public vs private keys |
| nuxt-007-avoid-redundant-ref | fix | Replace ref + watch with computed for derived state |
| nuxt-008-fix-exposed-secret | fix | Move private runtimeConfig access to server API route |
| nuxt-009-cache-api-response | fix | Replace defineEventHandler with defineCachedEventHandler |
| nuxt-010-fix-watch-fetch | fix | Replace watch + $fetch with useFetch reactive URL |
| nuxt-011-fix-sequential-fetching | fix | Parallelize sequential await useFetch with Promise.all |
| nuxt-012-nuxt3-to-nuxt4-migration | fix | Migrate Nuxt 3 directory structure to Nuxt 4 |
| nuxt-013-prefer-nuxt-image | fix | Replace raw `<img>` with NuxtImg + @nuxt/image |
| nuxt-014-prefer-use-cookie | fix | Replace document.cookie with useCookie composable |


### Nuxt Content (2)

| Eval | Type | Tests |
|------|------|-------|
| nuxt-content-000-navigation | build | Documentation site with queryCollectionNavigation sidebar |
| nuxt-content-001-data-collection | build | Data collection (type "data") with JSON files |

### Nuxt UI (8)

| Eval | Type | Tests |
|------|------|-------|
| nuxt-ui-000-theming | build | Theming with app.config.ts colors and semantic utilities |
| nuxt-ui-001-fix-raw-html-page | fix | Replace raw HTML with UHeader/UFooter/UPageHero/UPageSection |
| nuxt-ui-002-dashboard-layout | build | Dashboard with UDashboardGroup/Sidebar/Panel |
| nuxt-ui-003-fix-raw-form | fix | Replace raw form with UForm + Zod validation |
| nuxt-ui-004-table | build | Data table with UTable, columns, and search |
| nuxt-ui-005-modal | build | Modal overlay with UModal and v-model:open |
| nuxt-ui-006-command-palette | build | Command palette with UCommandPalette and keyboard shortcuts |
| nuxt-ui-007-dropdown-menu | build | Dropdown menu with grouped items, icons, and onSelect |

## License

See [LICENSE](LICENSE).
