{"id":48422514,"url":"https://github.com/microsoft/testexplora","last_synced_at":"2026-04-06T09:01:27.264Z","repository":{"id":342446504,"uuid":"1103128802","full_name":"microsoft/TestExplora","owner":"microsoft","description":"This is an official code for the paper: TestExplora: Benchmarking LLMs for Proactive Bug Discovery via Repository-Level Test Generation","archived":false,"fork":false,"pushed_at":"2026-03-26T14:44:46.000Z","size":72,"stargazers_count":9,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-06T07:50:55.666Z","etag":null,"topics":["codellm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/microsoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-11-24T13:17:14.000Z","updated_at":"2026-03-20T02:23:20.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/microsoft/TestExplora","commit_stats":null,"previous_names":["microsoft/testexplora"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/microsoft/TestExplora","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FTestExplora","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FTestExplora/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FTestExplora/releases","manife
sts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FTestExplora/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/microsoft","download_url":"https://codeload.github.com/microsoft/TestExplora/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/microsoft%2FTestExplora/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31466228,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-06T08:36:52.050Z","status":"ssl_error","status_checked_at":"2026-04-06T08:36:51.267Z","response_time":112,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["codellm"],"created_at":"2026-04-06T09:00:45.164Z","updated_at":"2026-04-06T09:01:27.248Z","avatar_url":"https://github.com/microsoft.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TestExplora\n\nThis repository is the official implementation of the paper \"TestExplora: Benchmarking LLMs for Proactive Bug Discovery via Repository-Level Test Generation\". It can be used to run baseline evaluations with the prompts described in the paper.\n\n## Table of Contents\n\n- [What is TestExplora](#what-is-testexplora)\n- [Setup](#setup)\n- [How to Deploy TestExplora](#how-to-deploy-testexplora)\n  - [Test Generation (Inference)](#test-generation-inference)\n  - [Supported 
Models](#supported-models)\n- [Build Benchmark](#build-benchmark)\n- [Contributing](#contributing)\n- [Trademarks](#trademarks)\n\n## What is TestExplora\n\nTestExplora is a systematic, repository-level benchmark designed to evaluate the capability of Large Language Models to proactively discover latent software defects by generating tests.\n\nThe dataset is built from real-world GitHub pull requests and contains 2,389 test-generation tasks sourced from 1,552 PRs across 482 repositories. Each task requires the model to write test cases that fail on the buggy version and pass on the repaired version (a Fail-to-Pass transition), reflecting true defect detection rather than passive confirmation. The benchmark further includes automatically generated documentation for test entry points to enable scalable evaluation.\n\n## Setup\n\n### Prerequisites\n\n- Python 3.10+\n- Docker (for local test evaluation)\n- Git\n\n### Installation\n\n```bash\ngit clone https://github.com/microsoft/TestExplora.git\ncd TestExplora\n```\n\nInstall core dependencies:\n\n```bash\npip install -r requirements.txt\n```\n\n## How to Deploy TestExplora\n\n### Test Generation (Inference)\n\nThe main entry point is `testexplora/harness/inference.py`. 
Given the benchmark dataset (JSON format), it drives the target LLM to generate test cases for each task and saves the results as test patches.\n\n```bash\npython testexplora/harness/inference.py \\\n  --data_path \u003cpath_to_data.json\u003e \\\n  --repo_testbed_dir \u003coutput_directory\u003e \\\n  --model \u003cmodel_name\u003e \\\n  --test_type \u003cwhitebox|graybox|blackbox\u003e\n```\n\n#### Output\n\n- `test_patches.json` — Generated test patches per repository and PR.\n- `config.yaml` — Experiment configuration for reproducibility.\n- `generation.log` — Detailed execution log.\n- `trajectory/` — Agent trajectory files (for agent-based models).\n\n### Supported Models\n\nThe benchmark supports evaluation across a broad set of LLMs and coding agents. To reproduce or customize results for a specific model, modify the corresponding call file under `testexplora/harness/call_pipeline/`.\n\n**API-based Models (Direct LLM Call)**\n\n| Model Key | Call File |\n|---|---|\n| `gpt-4o`, `o3-mini`, `o4-mini`, `gpt-5-mini`, `gpt-5`, `r1` | `call_gpt.py` |\n| `claude_sonnet` | `call_gpt.py` (Anthropic via Azure) |\n| `gemini-2.5-pro`, `gemini-2.5-flash` | `call_gemini.py` |\n| `Codellama-34B`, `Qwen3-Coder-30B` | `call_vllm.py` |\n\n**Agent-based Models (Agentic Code Exploration)**\n\n| Model Key | Call File |\n|---|---|\n| `sweagent-*` | `call_sweagent.py` |\n| `traeagent-*` | `call_traeagent.py` |\n\n\u003e **Note:** Agent-based models support only the `whitebox` test type.\n\n## Build Benchmark\n\nTo construct a benchmark dataset similar to TestExplora from your own set of GitHub repositories, use `testexplora/build_benchmark/process_data.py`. It automates the end-to-end pipeline:\n\n1. **Clone repositories** and iterate over closed pull requests.\n2. **Check out the base commit** (pre-PR state) and extract code structure \u0026 dependency graphs.\n3. **Apply the PR patch**, then re-extract code structure to obtain the post-PR state.\n4. 
**Identify changed functions/methods** by mapping diff line ranges to AST-level code elements.\n\n```bash\npython testexplora/build_benchmark/process_data.py\n```\n\n\u003e Before running, update the paths at the bottom of `process_data.py` to point to your repository data JSON directory and a local directory for cloning repos.\n\nThe script relies on two helper modules under the same directory:\n\n- **`parse_repo.py`** — AST-based extraction of classes, functions, methods, and their metadata from a Python repository.\n- **`build_dependency_graph.py`** — Builds inter-function dependency graphs using NetworkX, including cross-file import resolution.\n\n## Contributing\n\nThis project welcomes contributions and suggestions.  Most contributions require you to agree to a\nContributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us\nthe rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.\n\nWhen you submit a pull request, a CLA bot will automatically determine whether you need to provide\na CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions\nprovided by the bot. You will only need to do this once across all repos using our CLA.\n\nThis project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).\nFor more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or\ncontact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.\n\n## Trademarks\n\nThis project may contain trademarks or logos for projects, products, or services. 
Authorized use of Microsoft \ntrademarks or logos is subject to and must follow \n[Microsoft's Trademark \u0026 Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).\nUse of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.\nAny use of third-party trademarks or logos are subject to those third-party's policies.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2Ftestexplora","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmicrosoft%2Ftestexplora","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmicrosoft%2Ftestexplora/lists"}