https://github.com/nuxt/nuxt-evals
Evals for Nuxt to test AI model competency at Nuxt.
- Host: GitHub
- URL: https://github.com/nuxt/nuxt-evals
- Owner: nuxt
- License: mit
- Created: 2025-11-05T20:49:02.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2026-03-03T13:29:30.000Z (3 months ago)
- Last Synced: 2026-03-05T01:47:46.482Z (3 months ago)
- Topics: ai, evals, nuxt
- Language: TypeScript
- Homepage: https://nuxt.com/evals
- Size: 658 KB
- Stars: 37
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Nuxt Evals
Agent evaluations for Nuxt coding tasks, powered by [`@vercel/agent-eval`](https://www.npmjs.com/package/@vercel/agent-eval).
## Setup
```bash
pnpm install
cp .env.example .env # requires VERCEL_OIDC_TOKEN and AI_GATEWAY_API_KEY
```
## Scripts
### `pnpm run eval`
Runs agent evaluations.
```bash
pnpm run eval # Run all experiments
pnpm run eval -- claude-opus-4.6 # Run a specific experiment
pnpm run eval:smoke # Run smoke test (1 eval per experiment)
pnpm run eval:dry # Preview what would run
```
### `pnpm run export-results`
Exports clean results to `agent-results.json`.
```bash
pnpm run export-results # Export from all experiments
pnpm run export-results -- claude-opus-4.6 # Export specific experiment
```
## Models
| Experiment | Agent | Model |
|------------|-------|-------|
| `claude-opus-4.6` | `claude-code` | `claude-opus-4-6` |
| `claude-sonnet-4.5` | `claude-code` | `claude-sonnet-4-5` |
| `claude-sonnet-4.6` | `claude-code` | `claude-sonnet-4-6` |
| `cursor-composer-1.5` | `cursor` | `composer-1.5` |
| `gemini-3-pro-preview` | `gemini` | `gemini-3-pro-preview` |
| `gemini-3.1-pro-preview` | `gemini` | `gemini-3.1-pro-preview` |
| `devstral-2` | `opencode` | `vercel/mistral/devstral-2` |
| `gpt-5.3-codex-xhigh` | `codex` | `gpt-5.3-codex-api-preview?reasoningEffort=xhigh` |
## Eval structure
Each eval is a self-contained Nuxt project in `evals/`. Most evals provide broken or suboptimal starter code that the agent must fix — the prompt describes a symptom without revealing the solution.
```
evals/nuxt-000-fix-data-fetching/
├── PROMPT.md # task given to the agent
├── EVAL.ts # vitest assertions (withheld from the agent)
├── package.json # Nuxt project manifest
├── nuxt.config.ts
├── tsconfig.json
├── eslint.config.mjs
├── server/
│   └── api/
│       └── greeting.ts
└── app/
    ├── app.vue
    └── pages/
        └── index.vue # broken starter code the agent must fix
```
| File | Purpose |
|------|---------|
| `PROMPT.md` | The task prompt sent to the agent |
| `EVAL.ts` | Test file run after the agent finishes (withheld from agent) |
| `package.json` | Must have `"type": "module"` and a `"build"` script |
| Everything else | Source files the agent can see and modify |
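
As a rough illustration of what an `EVAL.ts` check for `nuxt-000-fix-data-fetching` might verify, the sketch below inspects page source as plain text. The function name `checkDataFetching`, the regular expressions, and the sample sources are assumptions made for illustration, not the repository's actual assertions. A real `EVAL.ts` would wrap checks like these in vitest `test` and `expect` calls.

```typescript
// Hypothetical source-level check; not taken from the repository.
interface CheckResult {
  usesUseFetch: boolean;       // the expected fix is present
  usesOnMountedFetch: boolean; // the anti-pattern is still present
  passed: boolean;
}

function checkDataFetching(source: string): CheckResult {
  const usesUseFetch = /\buseFetch\s*\(/.test(source);
  // Coarse heuristic: an onMounted(...) call followed later by a $fetch(...) call.
  const usesOnMountedFetch = /\bonMounted\s*\([\s\S]*?\$fetch\s*\(/.test(source);
  return {
    usesUseFetch,
    usesOnMountedFetch,
    passed: usesUseFetch && !usesOnMountedFetch,
  };
}

// Illustrative starter code: fetches on the client only, after mount.
const broken = `
<script setup lang="ts">
const greeting = ref('')
onMounted(async () => {
  greeting.value = await $fetch('/api/greeting')
})
</script>`;

// Illustrative fix: useFetch runs during server rendering as well.
const fixed = `
<script setup lang="ts">
const { data: greeting } = await useFetch('/api/greeting')
</script>`;

console.log(checkDataFetching(broken).passed); // false
console.log(checkDataFetching(fixed).passed);  // true
```

Checking for the anti-pattern as well as the fix mirrors the guidance to "check for the correct fix and reject anti-patterns"; text matching is only one possible strategy.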
## Adding a new eval
1. Create a directory under `evals/` (e.g., `evals/nuxt-015-my-eval/`)
2. Add `PROMPT.md` with a vague, symptom-based task description (don't reveal the solution)
3. Add broken or suboptimal starter code in `app/` for the agent to fix
4. Add `EVAL.ts` with vitest assertions that check for the correct fix and reject anti-patterns
5. Add `package.json` with `"type": "module"` and `"build": "nuxt build"`
6. Run `pnpm run eval` — it will automatically run the new eval for all models
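
The `package.json` requirement in step 5 can be checked before running an eval. The sketch below is a hypothetical pre-flight helper (the name `validateEvalManifest` and its messages are not part of the repository); it only encodes the two conditions stated above.

```typescript
// Hypothetical pre-flight check; names and messages are illustrative.
interface PackageManifest {
  type?: string;
  scripts?: Record<string, string>;
}

// Returns a list of problems; an empty list means the manifest meets both requirements.
function validateEvalManifest(manifest: PackageManifest): string[] {
  const problems: string[] = [];
  if (manifest.type !== "module") {
    problems.push('"type" must be "module"');
  }
  if (!manifest.scripts?.build) {
    problems.push('a "build" script is required');
  }
  return problems;
}

// A manifest matching step 5.
const manifest: PackageManifest = JSON.parse(`{
  "type": "module",
  "scripts": { "build": "nuxt build" }
}`);

console.log(validateEvalManifest(manifest));        // []
console.log(validateEvalManifest({ scripts: {} })); // reports both problems
```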
## Adding a new model
1. Create a config in `experiments/` (e.g., `experiments/my-model.ts`)
2. Add the display name to `MODEL_NAMES` in `scripts/export-results.ts`
3. Run `pnpm run eval` — it will automatically run all evals for the new model
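
Step 2 can be pictured as adding one entry to a lookup table. The sketch below assumes `MODEL_NAMES` is a plain map from experiment name to display name; the real structure in `scripts/export-results.ts` may differ, and both the display strings and the `displayName` helper are hypothetical.

```typescript
// Assumed shape: experiment name -> human-readable display name.
// The real MODEL_NAMES in scripts/export-results.ts may be structured differently.
const MODEL_NAMES: Record<string, string> = {
  "claude-opus-4.6": "Claude Opus 4.6", // illustrative display name
  "my-model": "My Model",               // entry added for experiments/my-model.ts
};

// Hypothetical helper: fall back to the experiment name when no display name is registered.
function displayName(experiment: string): string {
  return MODEL_NAMES[experiment] ?? experiment;
}

console.log(displayName("my-model"));           // My Model
console.log(displayName("unregistered-model")); // unregistered-model
```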
## Current evals
### Nuxt (15)
| Eval | Type | Tests |
|------|------|-------|
| nuxt-000-fix-data-fetching | fix | Replace onMounted + $fetch with useFetch |
| nuxt-001-prefer-nuxt-link | fix | Replace `<a>` with `<NuxtLink>` |
| nuxt-002-state-composables | build | State management with useState composable |
| nuxt-003-page-meta | build | Page meta, useHead, and custom layouts |
| nuxt-004-error-handling | build | Error handling with NuxtErrorBoundary |
| nuxt-005-fix-seo-meta | fix | Replace useHead meta arrays with useSeoMeta |
| nuxt-006-runtime-config | build | Runtime config with public vs private keys |
| nuxt-007-avoid-redundant-ref | fix | Replace ref + watch with computed for derived state |
| nuxt-008-fix-exposed-secret | fix | Move private runtimeConfig access to server API route |
| nuxt-009-cache-api-response | fix | Replace defineEventHandler with defineCachedEventHandler |
| nuxt-010-fix-watch-fetch | fix | Replace watch + $fetch with useFetch reactive URL |
| nuxt-011-fix-sequential-fetching | fix | Parallelize sequential await useFetch with Promise.all |
| nuxt-012-nuxt3-to-nuxt4-migration | fix | Migrate Nuxt 3 directory structure to Nuxt 4 |
| nuxt-013-prefer-nuxt-image | fix | Replace raw `<img>` with NuxtImg + @nuxt/image |
| nuxt-014-prefer-use-cookie | fix | Replace document.cookie with useCookie composable |
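
To make the `nuxt-011-fix-sequential-fetching` row concrete, here is a framework-free sketch of the sequential versus parallel pattern. `fakeFetch` is a hypothetical stand-in for `useFetch`, and the in-flight counter exists only to show that both requests overlap; none of this is taken from the eval itself.

```typescript
// Stand-in for a network request; in a Nuxt page this role would be played by useFetch.
let inFlight = 0;
let maxInFlight = 0;

async function fakeFetch<T>(value: T, delayMs: number): Promise<T> {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  inFlight--;
  return value;
}

// Sequential: the second request starts only after the first resolves.
async function loadSequential(): Promise<[string, number]> {
  const user = await fakeFetch("ada", 20);
  const posts = await fakeFetch(3, 20);
  return [user, posts];
}

// Parallel: both requests start immediately and are awaited together.
async function loadParallel(): Promise<[string, number]> {
  return Promise.all([fakeFetch("ada", 20), fakeFetch(3, 20)]);
}

// Records the peak number of concurrent requests for each strategy.
async function demo(): Promise<{ sequentialPeak: number; parallelPeak: number }> {
  maxInFlight = 0;
  await loadSequential();
  const sequentialPeak = maxInFlight;
  maxInFlight = 0;
  await loadParallel();
  return { sequentialPeak, parallelPeak: maxInFlight };
}

const peaks = demo();
peaks.then((result) => console.log(result)); // { sequentialPeak: 1, parallelPeak: 2 }
```

With two independent requests of similar latency, the parallel version takes roughly the time of the slower request instead of the sum of both.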
### Nuxt Content (2)
| Eval | Type | Tests |
|------|------|-------|
| nuxt-content-000-navigation | build | Documentation site with queryCollectionNavigation sidebar |
| nuxt-content-001-data-collection | build | Data collection (type "data") with JSON files |
### Nuxt UI (8)
| Eval | Type | Tests |
|------|------|-------|
| nuxt-ui-000-theming | build | Theming with app.config.ts colors and semantic utilities |
| nuxt-ui-001-fix-raw-html-page | fix | Replace raw HTML with UHeader/UFooter/UPageHero/UPageSection |
| nuxt-ui-002-dashboard-layout | build | Dashboard with UDashboardGroup/Sidebar/Panel |
| nuxt-ui-003-fix-raw-form | fix | Replace raw form with UForm + Zod validation |
| nuxt-ui-004-table | build | Data table with UTable, columns, and search |
| nuxt-ui-005-modal | build | Modal overlay with UModal and v-model:open |
| nuxt-ui-006-command-palette | build | Command palette with UCommandPalette and keyboard shortcuts |
| nuxt-ui-007-dropdown-menu | build | Dropdown menu with grouped items, icons, and onSelect |
## License
See [LICENSE](LICENSE).