Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/yuankai619/llm-generated-web-and-playwright-e2e-testing
Experiment about using LLM to generate web pages that meet the requirements and generate Playwright E2E test scripts.
llm playwright prompt-engineering selenium
- Host: GitHub
- URL: https://github.com/yuankai619/llm-generated-web-and-playwright-e2e-testing
- Owner: Yuankai619
- Created: 2024-09-11T18:34:07.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2024-09-23T17:38:54.000Z (about 1 month ago)
- Last Synced: 2024-10-12T01:22:33.471Z (25 days ago)
- Topics: llm, playwright, prompt-engineering, selenium
- Language: TypeScript
- Homepage: https://yuankai619.github.io/LLM-Generated-web-and-Playwright-E2E-Testing/
- Size: 34 MB
- Stars: 4
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# LLM-Generated-web-and-Playwright-E2E-Testing
## Introduction
This project is an experiment in prompting LLMs (ChatGPT-4, Claude) to generate HTML, CSS, and JavaScript code that meets a set of user requirements.
The experimenters first write Playwright test scripts by hand, and the LLMs are then prompted to generate Playwright test scripts that cover all of the user requirements.
The hand-written and generated Playwright scripts are compared, and the results are used to assess how feasible it is to have LLMs generate purely front-end web pages and the E2E test scripts for them.
## Experimental setup
There will be a total of **four independent experiments** (`test1` to `test4`), each conducted by a different developer. Each experiment will have a different task, i.e., a set of user requirements for a web page that can be built with HTML, CSS, and JavaScript. The task is provided as a PDF file in the corresponding test folder.
The experiment will be divided into the following **three tasks**:
### 1. LLM-Generated Full Webpage Prompt Testing
**Task Description:**
Two LLMs (ChatGPT-4 and Claude) will be tested, and each experiment will be assigned one of them.
Prompts will be given to the assigned LLM. These prompts need to include all the requirements of the task, such as the UI design, interaction patterns, and functionality described in the task, and should follow a narrative structure similar to a user story. Each significant improvement in the results, together with the prompts that produced it, should be documented in the `prompts_record.md` file for later analysis. Testing will continue until the LLM-generated code meets all the requirements of the task (initially judged manually).
**Deliverables:**
- A markdown file (`prompts_record.md`) documenting each significant improvement and the prompts used.
- Webpage code (HTML, CSS, JavaScript) that meets the task's requirements.
### 2. Writing Playwright Test Scripts
**Task Description:**
Using the Playwright framework, write an E2E test for the given task in TypeScript. This test script should verify that the code generated by the LLM meets the requirements. In addition, the LLM must be instructed to add `data-testid` and `aria-label` attributes to the relevant HTML tags, and the test should use these attributes as selectors instead of CSS selectors (the main goal is to familiarize the participants with using Playwright for automated testing).
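Before writing the test, it can help to confirm that the LLM actually added the requested attributes. The sketch below is a hypothetical helper, not part of this repository or of Playwright: `extractTestIds`, `findMissingTestIds`, and the sample markup are all invented for illustration, and the regular expression only handles quoted attribute values.

```typescript
// Hypothetical helper: not part of the repository or of Playwright.
// Collects every data-testid value that appears in an HTML string.
function extractTestIds(html: string): Set<string> {
  const ids = new Set<string>();
  // Matches data-testid="value" or data-testid='value' (quoted values only).
  const pattern = /data-testid\s*=\s*(["'])(.*?)\1/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    const value = match[2];
    if (value !== undefined) {
      ids.add(value);
    }
  }
  return ids;
}

// Returns the required test ids that the generated page does not contain.
function findMissingTestIds(html: string, required: readonly string[]): string[] {
  const present = extractTestIds(html);
  return required.filter((id) => !present.has(id));
}

// Example markup, invented for illustration.
const generatedHtml = `
  <button data-testid="start-button" aria-label="Start">Start</button>
  <p data-testid='result-text'></p>
`;

// Logs the single missing id, reset-button.
console.log(
  findMissingTestIds(generatedHtml, ["start-button", "result-text", "reset-button"]),
);
```

In the Playwright test itself, elements carrying these attributes would typically be located with `page.getByTestId(...)` and `page.getByLabel(...)` rather than with CSS selectors.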
**Deliverables:**
- TypeScript E2E test script written using Playwright.
### 3. LLM-Generated Playwright Test Script
**Task Description:**
Prompts will be given to the assigned LLM to generate an E2E test script in TypeScript using Playwright. This test script should be able to verify that all the task's requirements are met. Similar to the first task, each significant improvement and the prompts used should be documented in `prompts_record.md` for analysis.
**Deliverables:**
- A markdown file (`prompts_record.md`) documenting each significant improvement and the prompts used.
- LLM-generated TypeScript E2E test script that meets the task's requirements.
## Experiment tester
- [Yuankai619](https://github.com/Yuankai619) for `test1`
- [owen0806](https://github.com/owen0806) for `test2`
- [zihan0221](https://github.com/zihan0221) for `test3`
- [deeveer](https://github.com/deeveer) for `test4`
## Experimental Result
Snapshots of each Playwright test script: https://yuankai619.github.io/LLM-Generated-web-and-Playwright-E2E-Testing/
| test | Web page meets the requirements | Test script fully covers the requirements |
| -------- | :--------: | :--------: |
| test1 | ✅ | ✅ |
| test2 | ✅ | ✅ |
| test3 | ✅ | ✅ |
| test4 | ✅ | ✅ |

| test | Prompt iterations to generate the web page | Prompt iterations to generate the test script |
| -------- | :--------: | :--------: |
| test1 | 4 | 5 |
| test2 | 3 | 2 |
| test3 | 3 | 5 |
| test4 | 2 | 1 |
## Conclusion
Based on the results of the four tests, we can conclude that current LLMs are capable of iteratively generating web code that meets the requirements under human supervision. They can also generate the corresponding Playwright E2E test scripts from the given requirements and the specified framework. However, consistent human oversight and iteration are essential to achieve satisfactory results.
`test1` is a prime example. Its task required randomness, which made it hard for the LLM to generate tests that handle nondeterministic output. Furthermore, when intentional delays were written into the JavaScript, the generated tests often failed to account for them correctly.
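As an illustration of the randomness problem, the sketch below shows two common ways to make random behaviour testable: fixing the random source with a seed, and asserting a property that holds for every outcome. It is not taken from `test1`. `pickRandom`, `pickMany`, and the colour list are hypothetical names, and `mulberry32` is a small, well-known non-cryptographic generator used here only as a stand-in.

```typescript
// Hypothetical sketch: none of these names come from the repository.

// mulberry32: a small non-cryptographic PRNG. The same seed always
// yields the same sequence of numbers in the range [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Page-side logic with the random source injected instead of hard-coded.
function pickRandom<T>(items: readonly T[], random: () => number = Math.random): T {
  if (items.length === 0) {
    throw new Error("items must not be empty");
  }
  const index = Math.floor(random() * items.length);
  return items[index] as T;
}

const colors: readonly string[] = ["red", "green", "blue"];

// Draws `count` picks from a generator seeded with `seed`.
function pickMany(seed: number, count: number): string[] {
  const random = mulberry32(seed);
  const picks: string[] = [];
  for (let i = 0; i < count; i++) {
    picks.push(pickRandom(colors, random));
  }
  return picks;
}

// Strategy 1: fix the seed so that two runs see the same picks.
const picksA = pickMany(42, 5);
const picksB = pickMany(42, 5);
console.log(picksA.join(",") === picksB.join(",")); // true

// Strategy 2: assert a property that holds for any random outcome.
const unseeded = pickRandom(colors);
console.log(colors.indexOf(unseeded) !== -1); // true
```

In a Playwright test, the first strategy would require replacing the page's random source before its scripts run, for example by overriding `Math.random` through `page.addInitScript`. The second strategy needs no changes to the page but checks less.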
Therefore, if the goal is to have LLMs automatically generate tests based on source code and requirements, one must also consider the challenge of distinguishing errors in the tests from errors in the code itself. This is an area that requires further exploration.