https://github.com/ihatenodejs/llm-tests

My personal, web-dev focused LLM tests
https://github.com/ihatenodejs/llm-tests

llm llm-testing

Last synced: 4 months ago
JSON representation

My personal, web-dev focused LLM tests

Host: GitHub
URL: https://github.com/ihatenodejs/llm-tests
Owner: ihatenodejs
License: cc0-1.0
Created: 2025-08-10T16:23:39.000Z (10 months ago)
Default Branch: main
Last Pushed: 2025-08-10T18:43:01.000Z (10 months ago)
Last Synced: 2026-01-14T01:21:33.292Z (5 months ago)
Topics: llm, llm-testing
Language: HTML
Homepage:
Size: 61.5 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# llm-tests

My personal, web-dev focused LLM tests

## How tests are performed

I do all testing on my MacBook Pro M4, using LM Studio. I am only able to test models up to 64GB, due to memory limits. The setup may change in the future. I aim to create a consistent testing environment, by doing the following:

- Max context length is always used.
- The same prompt is used for every model.
- Prompts are written to be **unique**, and different from the samples provided by AI companies.
- If the LLM provided a structure, it is always followed. No files are combined/split to follow a certain format. Instructions (finding images/manual work) for humans are not followed.
- I never ask the model to fix it's work. If it's broken, it's used anyway. These tests are performed in one shot. If the model refuses or the output is not a website, the prompt may be re-tried.

## The prompts

For testing, I prefer to write short, unique, and entertaining prompts. This means the model is not given explicit instructions or help. I believe this makes the outputs more interesting!

I also do not prefer using LLMs for writing. This means the prompts are never AI generated (including this README!), which I believe yields better results.

## How I judge outputs

In `/test-dir/rank.md`, you will notice I may have ranked the models. But, ranking is subjective. Here's how I judge:

- Did the model fulfill all my requirements? Did it refuse all/some of the prompt?
- If I was to visit the website as an actual user, would I understand/get everything I needed?
- I prefer seeing results that are *complete*. This means a header, content, and a footer, when appropriate.

The ranking files are purely my own opinion, and no points or scoring is done.

## Test directory structure

```
test-name/
model-name/ # May contain sub directories for things like reasoning effort
raw.txt # The raw output from the model
test.xml # Test environment details
content/ # The root directory for the LLM's outputted project
[...]
```

## The inspiration

This repo is inspired by [Bijan Bowen](https://www.youtube.com/@Bijanbowen)'s videos, along with prompt ideas from his videos, my own imagination, and articles I've read.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/ihatenodejs/llm-tests

Awesome Lists containing this project

README