{"id":26275101,"url":"https://github.com/onekq/WebApp1k","last_synced_at":"2025-03-14T10:01:22.099Z","repository":{"id":249583928,"uuid":"831777058","full_name":"onekq/WebApp1k","owner":"onekq","description":"WebApp1k benchmark","archived":false,"fork":false,"pushed_at":"2025-03-06T17:40:54.000Z","size":648,"stargazers_count":8,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-06T18:35:55.654Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/onekq.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-21T15:45:57.000Z","updated_at":"2025-03-06T17:40:57.000Z","dependencies_parsed_at":null,"dependency_job_id":"e0d404c4-fce5-42f8-86dc-4ea11f511176","html_url":"https://github.com/onekq/WebApp1k","commit_stats":null,"previous_names":["onekq/webapp1k"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/onekq%2FWebApp1k","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/onekq%2FWebApp1k/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/onekq%2FWebApp1k/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/onekq%2FWebApp1k/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/onekq","download_url":"https://codeload.github.com/onekq/WebApp1k/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://gi
thub.com","kind":"github","repositories_count":243558470,"owners_count":20310573,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-14T10:01:20.512Z","updated_at":"2025-03-14T10:01:22.002Z","avatar_url":"https://github.com/onekq.png","language":"JavaScript","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"readme":"[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/insights-from-benchmarking-frontier-language/code-generation-on-webapp1k-react)](https://paperswithcode.com/sota/code-generation-on-webapp1k-react?p=insights-from-benchmarking-frontier-language)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-case-study-of-web-app-coding-with-openai/code-generation-on-webapp1k-duo-react)](https://paperswithcode.com/sota/code-generation-on-webapp1k-duo-react?p=a-case-study-of-web-app-coding-with-openai)\n# WebApp1K Benchmark\n\nWebApp1K is a coding benchmark that evaluates how well LLMs can develop real-world web applications.\n\n## Getting Started\nWebApp1K is designed to run in any environment that has Python and Node.js.\n\n### System Requirements\nEnsure your environment has the following installed:\n- Python (version 3.8 or higher)\n- Node.js (version 14 or higher)\n\n### Setup\n1. Clone the GitHub repository:\n   ```bash\n   git clone https://github.com/onekq/WebApp1K.git\n   cd WebApp1K\n   ```\n2. 
Optional: Create a virtual environment:\n   ```bash\n   python -m venv venv\n   source venv/bin/activate  # Linux/macOS\n   venv\\Scripts\\activate # Windows\n   ```\n3. Install Python dependencies:\n   ```bash\n   pip install -r requirements.txt\n   ```\n4. Install Node.js dependencies:\n   ```bash\n   cd staging\n   npm install\n   cd ..\n   ```\n## For LLM Researchers and Benchmark Users\nIf you want deeper insight into the coding performance of a leading LLM available through a public token-based API, you can run this benchmark to obtain all raw results, such as generated code and error logs.\n\n### Get your API keys:\n| Vendor    | API key name      | Code Generator | Models                       |\n|-----------|-------------------|----------------|------------------------------|\n| OpenAI    | OPENAI_API_KEY    | gpt            | GPT-4o, GPT-4o-mini, etc.    |\n| Anthropic | ANTHROPIC_API_KEY | claude         | Claude-3.5-Sonnet, etc.      |\n| Google    | GEMINI_API_KEY    | gemini         | Gemini-1.5-pro, etc.         |\n| Mistral   | MISTRAL_API_KEY   | mistral        | mixtral-8x22b-instruct, etc. |\n| Fireworks | FIREWORKS_API_KEY | fireworks      | open source models           |\n\nThe table above shows the latest popular LLMs and their hosting services. Depending on which LLMs interest you, you need to purchase the corresponding API keys and save them as environment variables.\n\nUsing OpenAI as an example:\n```bash\nexport OPENAI_API_KEY='your-openai-api-key' # For Linux/macOS\nset OPENAI_API_KEY=your-openai-api-key      # For Windows\n```\n### Run the benchmark\nMake sure you have followed all the steps in the [Setup section](#setup) and that you are in the `WebApp1K` directory. Then run the following script.\n   ```bash\n   python run_eval.py -g \u003ccode_generator\u003e -m \u003cmodel\u003e\n   ```\nThe code generator name can be found in the table above. 
Using OpenAI as an example, if you want to evaluate the GPT-4o model, the command is as follows.\n   ```bash\n   python run_eval.py -g gpt -m gpt-4o\n   ```\nAfter the evaluation is finished, all the generated code can be found under a new `models` subdirectory. If a JavaScript file has the suffix `success.js`, the code passed the tests. If the file has the suffix `failure.js`, the code failed the tests, and a companion `failure.log` file of the same name logs the test errors.\n\n\u003e **Note:** This script **takes hours to finish**. It is highly likely to crash due to various service-related exceptions\nsuch as rate limiting, server overload, or unsafe content. We purposely let it crash and let you decide what to do. A common scenario is that you reach the daily token limit for model inference, and the simplest solution is to wait until the next day. Resuming the run is easy: simply run the same script again. By examining the files in the `staging` and `models` folders, the script locates the point where it stopped and continues the evaluation from there.\n\n## For LLM Trainers\nIf you pretrain or fine-tune your own LLM and would like to use this benchmark to evaluate your checkpoints, you should still follow the above process, but make the following changes.\n\n1. **You won't need any API keys**, and the majority of the code in `generate_code.py` won't be relevant to you.\n\n2. **Modify the `generate_implementation` method** to run inference against your own checkpoint.\n\n## Contact Us\nFor questions, inquiries, or suggestions, please reach out to info@onekq.ai.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fonekq%2FWebApp1k","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fonekq%2FWebApp1k","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fonekq%2FWebApp1k/lists"}