https://github.com/yuankai619/llm-generated-web-and-playwright-e2e-testing

Experiment about using LLM to generate web pages that meet the requirements and generate Playwright E2E test scripts.

llm playwright prompt-engineering

Host: GitHub
URL: https://github.com/yuankai619/llm-generated-web-and-playwright-e2e-testing
Owner: Yuankai619
Created: 2024-09-11T18:34:07.000Z (10 months ago)
Default Branch: main
Last Pushed: 2024-09-23T17:38:54.000Z (10 months ago)
Last Synced: 2025-02-02T03:14:35.076Z (5 months ago)
Topics: llm, playwright, prompt-engineering
Language: TypeScript
Homepage: https://yuankai619.github.io/LLM-Generated-web-and-Playwright-E2E-Testing/
Size: 34 MB
Stars: 5
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

# LLM-Generated-web-and-Playwright-E2E-Testing

## Introduction

This project is an experiment in prompting LLMs (ChatGPT-4, Claude) to generate HTML, CSS, and JavaScript code that meets user requirements.

The experimenters first write Playwright test scripts by hand. The LLMs are then prompted to generate Playwright test scripts that cover all user requirements.

The hand-written and LLM-generated Playwright scripts are compared, and the results are used to assess how feasible it is to have LLMs generate purely front-end web pages and their E2E test scripts automatically.

## Experimental setup

There will be a total of **four independent experiments** (test1~test4), with each test conducted by a different developer. Each experiment will have a different task, i.e., user requirements, consisting of web-related tasks that can be solved using HTML, CSS, and JavaScript. The tasks are provided in a PDF file located in each test folder.

The experiment will be divided into the following **three tasks**:

### 1. LLM-Generated Full Webpage Prompt Testing

**Task Description:**

Two LLMs (ChatGPT-4, Claude) are tested, and each experiment is assigned one of them.
Prompts will be given to the assigned LLM, and these prompts need to include all the requirements of the task, such as the UI design, interaction patterns, and functionality as described in the task. The prompts should follow a narrative structure similar to a user story.

Each significant improvement in results and the prompts used should be documented in the `prompts_record.md` file for later analysis. Testing will continue until the LLM-generated code meets all the requirements of the task (initially judged manually).

**Deliverables:**
- A markdown file (`prompts_record.md`) documenting each significant improvement and the prompts used.
- Webpage code (HTML, CSS, JavaScript) that meets the task's requirements.

### 2. Writing Playwright Test Scripts

**Task Description:**

Using the Playwright framework, write an E2E test for the given task in TypeScript. This test script should verify that the code generated by the LLM meets the requirements. The LLM must also be instructed to add `data-testid` and `aria-label` attributes to the relevant HTML elements, and the test should select elements by these attributes rather than by CSS selectors. The main goal of this task is to familiarize the participants with using Playwright for automated testing.

**Deliverables:**
- TypeScript E2E test script written using Playwright.
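
As an illustration of selecting by `data-testid` and `aria-label` instead of CSS selectors, here is a minimal sketch. It is not taken from the test folders: the to-do page, the attribute values, and the `addTodo` helper are all hypothetical. The helper is typed against a small local interface so that it can be read and run without a browser. Playwright's `page.getByTestId()` and `page.getByLabel()` locators provide the methods used here, so the `page` fixture of a Playwright test can be passed in directly.

```typescript
// Minimal structural view of the Playwright API used below.
// Playwright's real `Page` and `Locator` offer these methods
// with compatible signatures.
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
  textContent(): Promise<string | null>;
}

interface PageLike {
  getByTestId(testId: string): LocatorLike; // matches data-testid="..."
  getByLabel(text: string): LocatorLike;    // matches aria-label="..." or a <label>
}

// Hypothetical attribute values; the real ones come from the generated page.
const NEW_TODO_LABEL = 'New todo';       // aria-label on the <input>
const ADD_BUTTON_ID = 'add-todo-button'; // data-testid on the <button>
const TODO_LIST_ID = 'todo-list';        // data-testid on the <ul>

// Types a title, clicks the add button, and returns the list's text.
// Avoided on purpose: structure-dependent CSS such as
//   page.locator('#todo-form > button.primary')
async function addTodo(page: PageLike, title: string): Promise<string> {
  await page.getByLabel(NEW_TODO_LABEL).fill(title);
  await page.getByTestId(ADD_BUTTON_ID).click();
  return (await page.getByTestId(TODO_LIST_ID).textContent()) ?? '';
}

// In a Playwright spec (assumes @playwright/test is installed and the
// page is served at the configured base URL):
//
//   test('adds a todo', async ({ page }) => {
//     await page.goto('/');
//     expect(await addTodo(page, 'Buy milk')).toContain('Buy milk');
//   });
```

Selecting by test id and label keeps the test independent of the markup structure and class names the LLM happens to generate, which can change between prompt iterations.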

### 3. LLM-Generated Playwright Test Script

**Task Description:**

Prompts will be given to the assigned LLM to generate an E2E test script in TypeScript using Playwright. This test script should be able to verify that all the task's requirements are met. Similar to the first task, each significant improvement and the prompts used should be documented in `prompts_record.md` for analysis.

**Deliverables:**
- A markdown file (`prompts_record.md`) documenting each significant improvement and the prompts used.
- LLM-generated TypeScript E2E test script that meets the task's requirements.

## Experiment testers
[Yuankai619](https://github.com/Yuankai619) for `test1`

[owen0806](https://github.com/owen0806) for `test2`

[zihan0221](https://github.com/zihan0221) for `test3`

[deeveer](https://github.com/deeveer) for `test4`

## Experimental Results

Snapshots of each Playwright test script: https://yuankai619.github.io/LLM-Generated-web-and-Playwright-E2E-Testing/

| test | Does the web page meet the requirements? | Does the test script fully cover the requirements? |
| -------- | :--------: | :--------: |
| test1 | ✅ | ✅ |
| test2 | ✅ | ✅ |
| test3 | ✅ | ✅ |
| test4 | ✅ | ✅ |

| test | Prompt iterations needed to generate the web page | Prompt iterations needed to generate the test script |
| -------- | :--------: | :--------: |
| test1 | 4 | 5 |
| test2 | 3 | 2 |
| test3 | 3 | 5 |
| test4 | 2 | 1 |

## Conclusion

Based on the results of the four tests, current LLMs can iteratively generate web code that meets the requirements when supervised by a human. They can also generate corresponding Playwright E2E test scripts from the given requirements and the specified framework. However, consistent human oversight and iteration are essential to achieve satisfactory results.

`test1` is a good example. The task required randomness, which made it hard for the LLM to generate tests that handle it effectively. In addition, when intentional delays were written into the JavaScript, the generated tests often failed to account for them correctly.
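
One way to approach these two difficulties is to assert properties of a random result rather than an exact value, and to poll for a delayed state instead of checking it once. The sketch below is not taken from the repository: `rollDie`, `isValidRoll`, and `waitUntil` are hypothetical names, and the six-sided die is an assumed stand-in for any random output. In Playwright itself, web-first assertions such as `expect(locator).toHaveText(...)` already retry until a timeout, which serves the same purpose as `waitUntil`.

```typescript
// Hypothetical page logic: a random integer from 1 to 6.
// The random source is injectable so a test can make it deterministic.
function rollDie(random: () => number = Math.random): number {
  return Math.floor(random() * 6) + 1;
}

// Property check: true for every valid roll, whatever the random value was.
function isValidRoll(value: number): boolean {
  return Math.floor(value) === value && value >= 1 && value <= 6;
}

// Polls `condition` until it returns true or `timeoutMs` elapses.
// Resolves to true on success and false on timeout.
async function waitUntil(
  condition: () => boolean,
  timeoutMs = 2000,
  intervalMs = 50,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (condition()) return true;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  return condition(); // one final check at the deadline
}
```

Injecting `random` keeps the logic deterministic in a unit test, while `isValidRoll` still applies when the real `Math.random` is used. Polling with a timeout tolerates an intentional delay without hard-coding its exact length into the test.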

Therefore, if the goal is to have LLMs automatically generate tests based on source code and requirements, one must also consider the challenge of distinguishing errors in the tests from errors in the code itself. This is an area that requires further exploration.
