{"id":24935526,"url":"https://github.com/engineeringsoftware/exlong","last_synced_at":"2025-04-09T23:41:15.526Z","repository":{"id":274537076,"uuid":"897538971","full_name":"EngineeringSoftware/exLong","owner":"EngineeringSoftware","description":"exLong: Generating Exceptional Behavior Tests with Large Language Models","archived":false,"fork":false,"pushed_at":"2025-02-15T22:19:43.000Z","size":119,"stargazers_count":7,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-24T01:35:17.027Z","etag":null,"topics":["exceptions","exlong","generation","ml","nlp","testing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EngineeringSoftware.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-02T20:00:50.000Z","updated_at":"2025-02-15T22:19:46.000Z","dependencies_parsed_at":"2025-01-27T22:34:45.548Z","dependency_job_id":"a80be435-2140-42a1-9549-640f0dab342c","html_url":"https://github.com/EngineeringSoftware/exLong","commit_stats":null,"previous_names":["engineeringsoftware/exlong"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EngineeringSoftware%2FexLong","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EngineeringSoftware%2FexLong/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EngineeringSoftware%2FexLong/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/EngineeringSoftware%2FexLong/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EngineeringSoftware","download_url":"https://codeload.github.com/EngineeringSoftware/exLong/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248131467,"owners_count":21052819,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["exceptions","exlong","generation","ml","nlp","testing"],"created_at":"2025-02-02T15:38:42.621Z","updated_at":"2025-04-09T23:41:15.515Z","avatar_url":"https://github.com/EngineeringSoftware.png","language":"Python","readme":"# 🐲🔨 exLong: Generating Exceptional Behavior Tests with Large Language Models \nexLong is a large language model instruction-tuned from CodeLlama and embeds reasoning about \n- **traces** that lead to throw statements\n- **conditional expressions** that guard throw statements\n- **non-exceptional behavior tests** that execute similar traces\n\n# About\nThis repo hosts the code and data for the following ICSE 2025 paper:\n\nTitle: [exLong: Generating Exceptional Behavior Tests with Large Language Models](https://arxiv.org/abs/2405.14619)\n\nAuthors: [Jiyang Zhang](https://jiyangzhang.github.io/), [Yu Liu](https://sweetstreet.github.io/), [Pengyu Nie](https://pengyunie.github.io/), [Junyi Jessy Li](https://jessyli.com/), [Milos Gligoric](http://users.ece.utexas.edu/~gligoric/)\n\n\n\n```bibtex\n@inproceedings{ZhangETAL25exLong,\n  author = {Zhang, Jiyang and Liu, Yu and Nie, Pengyu and Li, Junyi Jessy and Gligoric, Milos},\n  title = {exLong: Generating 
Exceptional Behavior Tests with Large Language Models},\n  booktitle = {International Conference on Software Engineering},\n  year = {2025},\n}\n```\n\n# Table of Contents\n1. [Quick Start][sec-hf] 🤗\n2. [Set Up][sec-setup] :rocket:\n3. [Experiments][sec-exp] :construction_worker:\n4. [Artifacts][sec-artifacts] :star:\n\n\n# Quick Start\n \n[sec-hf]: #quick-start\n\n- The exLong dataset is on Hugging Face [🤗](https://huggingface.co/datasets/EngineeringSoftware/exLong-dataset)!\n```python\nfrom datasets import load_dataset\n\nwith_name_ds = load_dataset(\"EngineeringSoftware/exLong-dataset\", \"with-EBT-name\")\nno_name_ds = load_dataset(\"EngineeringSoftware/exLong-dataset\", \"no-EBT-name\")\n```\n\n- The exLong model is on Hugging Face [🤗](https://huggingface.co/EngineeringSoftware/exLong)!\n\n```bash\npip install transformers accelerate bitsandbytes peft\n```\n\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nfrom peft import PeftModel, PeftConfig\n\n# Load the base model\nbase_model_name = \"codellama/CodeLlama-7b-Instruct-hf\"\nbase_model = AutoModelForCausalLM.from_pretrained(base_model_name)\n\n# Load the LoRA configuration\npeft_model_id = \"EngineeringSoftware/exLong\"\nconfig = PeftConfig.from_pretrained(peft_model_id, revision=\"with-etest-name\")  # set revision to \"no-etest-name\" for no EBT name\n\n# Load the LoRA model\nmodel = PeftModel.from_pretrained(base_model, peft_model_id)\ntokenizer = AutoTokenizer.from_pretrained(base_model_name)\n\nprompt = \"\"\"\u003cs\u003e[INST] \u003c\u003cSYS\u003e\u003e\nYou are a helpful programming assistant and an expert Java programmer. 
You are helping a user writing exceptional-behavior tests for their Java code.\n\u003c\u003c/SYS\u003e\u003e\n\nPlease complete an exceptional behavior test method in Java to test the method 'factorial' for the exception 'IllegalArgumentException'.\nThe method to be tested is defined as:\n```java\npublic static long factorial(int n) {\n    if (n \u003c 0) {\n        throw new IllegalArgumentException(\"Number must be non-negative.\");\n    }\n    long result = 1;\n    for (int i = 1; i \u003c= n; i++) {\n        result *= i;\n    }\n    return result;\n}\n` ` `\nPlease only give the new exceptional-behavior test method to complete the following test class. Do NOT use extra libraries or define new helper methods. Return **only** the code in the completion:\n```java\npublic class FactorialTest {\n}\n` ` `\n\"\"\"\n\ninput_ids = tokenizer(prompt, return_tensors=\"pt\").input_ids\n\n# Generate code\noutput = model.generate(\n    input_ids=input_ids,\n    max_new_tokens=100,\n    temperature=0.2,      # Sampling temperature (lower is more deterministic)\n    top_p=0.95,           # Top-p (nucleus) sampling\n    do_sample=True        # Enable sampling\n)\n\n# Decode and print the generated code\ngenerated_code = tokenizer.decode(output[0], skip_special_tokens=True)\nprint(\"Generated Code:\")\nprint(generated_code)\n```\n\n# Set Up\n[sec-setup]: #set-up\n\n## Dependencies Set Up\n[sec-dep]: #dependencies-set-up\n\n1. Create conda environment\n```bash\nconda create -n exlong python=3.9\nconda activate exlong\npip install -r requirements.txt\n```\n\n2. We used [axolotl](https://github.com/axolotl-ai-cloud/axolotl) to fine-tune the CodeLlama model. 
If you want to train your own model, install the extra dependencies\n```bash\n# we used an older version of axolotl to train the models\ngit clone git@github.com:JiyangZhang/axolotl-exlong.git\ncd axolotl-exlong/\nconda activate exlong\npip install packaging\n# set CUDA_HOME\nexport CUDA_HOME=/opt/apps/cuda/12.0/\npip3 install -e '.[flash-attn,deepspeed]'\n```\n## Experiments Set Up\n[sec-setupexp]: #experiments-set-up\n\n1. Download raw dataset\n```bash\nmkdir -p _work/data/\nmkdir -p _work/exp/\nmkdir -p _work/setup/\n\nwget -L https://utexas.box.com/shared/static/hfcp4za3j9vp8lh5u8iviadixuxu8080.gz -O raw-data.tar.gz\ntar -xzf raw-data.tar.gz -C _work/data/\nmv _work/data/etestgen-raw-data-12k _work/data/ne2e\n\nwget -L https://utexas.box.com/shared/static/4m7mntp0ix18dkl1ikkspcmpuvybfs1f.gz -O ne2e-test.tar.gz\ntar -xzf ne2e-test.tar.gz -C _work/data/\n\nwget -L https://utexas.box.com/shared/static/y4e52k5x8vk8vcr59lg33gebcg2m1caw.gz -O rq2.tar.gz\ntar -xzf rq2.tar.gz -C _work/data/\n\n# netest-diversity\nwget -L https://utexas.box.com/shared/static/j417e93j1rdvdqz2yobttygfhucfbkjm.gz -O netest-diversity.tar.gz\ntar -xzf netest-diversity.tar.gz -C _work/data/\n\n```\nYou should see  `_work/data/ne2e`, `_work/data/rq1-eval`, `_work/data/rq2` and `_work/data/netest-diversity`.\n\n2. 
Prepare the datasets and put them in the `_work/setup` directory\n-  exLong \u0026\u0026 exlong sample (Table IV \u0026 V)\n```bash\n# exlong\ninv -e data.setup-model-data --setup-name conditionnestack2e-with-name-ft\ninv -e data.setup-model-data --setup-name conditionnestack2e-no-name-ft\n\n# exlong sample\ninv -e data.setup-model-data --setup-name conditionnestack2e-all-with-name-ft\ninv -e data.setup-model-data --setup-name conditionnestack2e-all-no-name-ft\n```\nYou should see `_work/setup/conditionnestack2e-with-name-ft/`, `_work/setup/conditionnestack2e-no-name-ft/`, `_work/setup/conditionnestack2e-all-with-name-ft/`, `_work/setup/conditionnestack2e-all-no-name-ft/` directories.\n\n\n3. Construct prompts for exLong developer-view\n- exLong\n```bash\ninv -e data.process-codellama-data --setup-name conditionnestack2e-with-name-ft\ninv -e data.process-codellama-data --setup-name conditionnestack2e-no-name-ft\n```\n\n4. Construct prompts for exLong machine-view\n```bash\nmkdir _work/setup/conditionnestack2e-all-no-name-ft/eval/ -p\ncp -r _work/data/rq2/ _work/setup/conditionnestack2e-all-no-name-ft/eval/\npython -m etestgen.codellama.realDataProcessor --config_file configs/eval-codellama-7b-machine-view-conditionnestack2e-all-no-name.yaml process_test_data\n```\nYou will see `_work/setup/conditionnestack2e-all-no-name-ft/eval/rq2/test-conditionnestack2e-all-no-name-ft.jsonl`.\n\n# Experiments\n\n[sec-exp]: #experiments\n\n## Training\n\n1. Training exLong w. EBT name\n\n**Note**: conditionnestack2e is the setup name for exLong \n```bash\ncd python/\naccelerate launch -m axolotl.cli.train configs/axolotl/axolotl-conditionnestack2e-with-name-7b.yaml\n```\nYou will see checkpoints in directory `_work/exp/conditionnestack2e-with-name-ft/lora-codellama-7b/`\n\n2. Training exLong w.o. 
EBT name\n```bash\ncd python/\naccelerate launch -m axolotl.cli.train configs/axolotl/axolotl-conditionnestack2e-no-name-7b.yaml\n# script to run on TACC\nsbatch axolotl-lora-codellama-7b-conditionnestack2e-no-name.sh\n```\nYou will see checkpoints in directory `_work/exp/conditionnestack2e-no-name-ft/lora-codellama-7b/`\n\n3. Running inference exLong for developer-view\n```bash\ncd python/\n# Run evaluation on the selected 434 examples in the test set\npython -m etestgen.codellama.CodeLLaMA --config_file configs/codellama-7b-conditionnestack2e-with-name-ft.yaml run_gen --split real-test\n```\nYou will see checkpoints, model outputs in directory `_work/exp/conditionnestack2e-with-name-ft/lora-codellama-7b/real-test-set-model-outputs.jsonl`\n\n4. Running inference exLong for machine-view\n```bash\ncd python/\npython -m etestgen.codellama.CodeLLaMA --config_file configs/eval-codellama-7b-machine-view-conditionnestack2e-all-no-name.yaml run_gen\n# Evaluation1: all covered projects\npython -m etestgen.llm.eval --config_file configs/eval-codellama-7b-machine-view-conditionnestack2e-all-no-name.yaml eval_runtime_metrics\n# You will see eval results in `results/model-results/conditionnestack2e-all-no-name-ft-lora-codellama-7b-eval-rq2-runtime-metrics.json`\n# Evaluation2: intersection projects\npython -m etestgen.llm.eval --eval_set rq2 --config_file configs/eval-codellama-7b-machine-view-conditionnestack2e-all-no-name.yaml eval_subset_llm_results --subset_id_file ../results/tool-results/intersect-ids.json\n# You will see eval results in `results/model-results/conditionnestack2e-all-no-name-ft-lora-codellama-7b-eval-rq2-intersect-runtime-metrics.json`\n```\nYou will see model generations in directory `_work/exp/conditionnestack2e-all-no-name-ft/lora-codellama-7b/rq2-model-outputs.jsonl`\n\n\n## Evaluation: compute metrics\n\nGiven test cases generated by exLong, this step will evaluate them with metrics like BLEU, CodeBLEU, Test Coverage, etc.\n\n- Input/Output info\n    
- Dataset used for evaluation is expected at `_work/{test_data}`\n    - Processed LLM prediction is expected at `_work/exp/{setup}/{model_name}/test-results`\n    - Similarity metrics result will be written to `_work/exp/{setup}/{model_name}/test-out/similarity_metrics_summary.json` and `results/model-results/{setup}-{exp}-{eval_set}-sim-metrics.json`\n    - Runtime Metrics will be written to `results/model-results/{setup}-{exp}-{eval_set}-runtime-metrics.json` and individual result will be at `_work/exp/{setup}/{model_name}/test-results/metrics.jsonl`\n\n\n- To run evaluation on similarity metrics\n  - Run on an individual experiment\n\n    ```bash\n    python -m etestgen.llm.eval --eval_set test --config_file [/path/to/config/file] eval_llm_sim\n    ```\n\n- To run evaluation on runtime metrics\n  - Run on an individual experiment\n\n    ```bash\n    python -m etestgen.llm.eval --eval_set test --config_file [/path/to/config/file] eval_runtime_metrics\n    ```\n\n## Ablations on exLong's context\n#### Diversity of the nEBTs\n1. Prepare Dataset\n```bash\nmkdir -p _work/setup/diversity-conditionnestack2e-sample-with-name-ft/real-eval/test/\nmkdir -p _work/setup/diversity-conditionnestack2e-all-with-name-ft/real-eval/test/\ncp -r _work/data/netest-diversity/* _work/setup/diversity-conditionnestack2e-sample-with-name-ft/real-eval/test/\ncp -r _work/data/netest-diversity/* _work/setup/diversity-conditionnestack2e-all-with-name-ft/real-eval/test/\ncd python/\npython -m etestgen.codellama.DataProcessor --config_file configs/codellama-7b-diversity-conditionnestack2e-sample-with-name-ft.yaml process_real_test_data\npython -m etestgen.codellama.DataProcessor --config_file configs/codellama-7b-diversity-conditionnestack2e-all-with-name-ft.yaml process_real_test_data\n```\nYou will see processed data in `_work/setup/diversity-conditionnestack2e-all-with-name-ft/real-eval/test/` and `_work/setup/diversity-conditionnestack2e-sample-with-name-ft/real-eval/test/`.\n\n2. 
Running Inference\n```bash\n# [2nd row] use the same exLong ckpt but try prompting with the same nEBT multiple times\npython -m etestgen.codellama.CodeLLaMA --config_file configs/codellama-7b-diversity-conditionnestack2e-sample-with-name-ft.yaml  run_gen --split real-test --target_ckpt ../_work/exp/conditionnestack2e-with-name-ft/lora-codellama-7b/\n# [3rd row] use the same exLong ckpt but try prompting with different nEBTs\npython -m etestgen.codellama.CodeLLaMA --config_file configs/codellama-7b-diversity-conditionnestack2e-all-with-name-ft.yaml  run_gen --split real-test --target_ckpt ../_work/exp/conditionnestack2e-with-name-ft/lora-codellama-7b/\n```\nYou will see model outputs in directory `_work/exp/diversity-conditionnestack2e-sample-with-name-ft/lora-codellama-7b/` and `_work/exp/diversity-conditionnestack2e-all-with-name-ft/lora-codellama-7b/`\n\n#### exLong w.o. stack trace\n\n1. Prepare Dataset\n```bash\ninv -e data.setup-model-data --setup-name conditionne2e-with-name-ft\ninv -e data.process-codellama-data --setup-name conditionne2e-with-name-ft\n```\n2. Running Inference\n```bash\npython -m etestgen.codellama.CodeLLaMA --config_file python/configs/eval/codellama-7b-conditionne2e-with-name-ft.yaml run_gen\n```\n\n#### exLong w.o. stack trace \u0026 guard expression\n\n1. Prepare Dataset\n```bash\ninv -e data.setup-model-data --setup-name ne2e-with-name-ft\ninv -e data.process-codellama-data --setup-name ne2e-with-name-ft\n```\n\n2. Running Inference\n```bash\npython -m etestgen.codellama.CodeLLaMA --config_file configs/eval/codellama-7b-ne2e-with-name-ft.yaml run_gen\n```\n\n#### exLong w.o. stack trace \u0026 guard expression \u0026 nEBT\n\n1. Prepare Dataset\n```bash\ninv -e data.setup-model-data --setup-name mut2e-with-name-ft\ninv -e data.process-codellama-data --setup-name mut2e-with-name-ft\n```\n\n2. 
Running Inference\n```bash\npython -m etestgen.codellama.CodeLLaMA --config_file configs/eval/codellama-7b-mut2e-with-name-ft.yaml run_gen\n```\n\n\n# Artifacts:\n\n[sec-artifacts]: #artifacts\n\n### Model Checkpoints:\n- [exLong-with-name (7B and 13B)](https://utexas.box.com/s/u20ya44oq8eon8aaot479iynpa90erog): exLong models in Table IV, Table VI and Table VIII.\n- [exLong-no-name (7B)](https://utexas.box.com/s/9oo0fcbnhi8b6tggb273otjt5bzw8u0j): exLong models in Table V.\n- [exLong-with-name w.o. stack trace (7B)](https://utexas.box.com/s/qikt46jxnf3g3pvqmznruf9bjd8yi17q): exLong no stack trace model in Table VI.\n- [exLong-with-name w.o. stack trace \u0026 guard expr (7B)](https://utexas.box.com/shared/static/mxls9c8580igtbnbt2kw29lxl1nhdsv7.tar): exLong no stack trace \u0026 no guard expr model in Table VI.\n- [exLong-with-name w.o. stack trace \u0026 guard expr \u0026 EBT (7B)](https://utexas.box.com/s/p7bcffw0vxelkrp5a2d70anzkekat2rv): exLong no stack trace \u0026 no guard expr \u0026 no EBT model in Table VI.\n- [exLong-with-name w.o. stack trace \u0026 guard expr \u0026 EBT (13B)](https://utexas.box.com/s/uaunxdgzql5m6qqt8113ks288xmk0gh1): exLong 13B no stack trace \u0026 no guard expr \u0026 no EBT model in Table VIII.\n\n### Dataset:\n- [repos.tar.gz](https://utexas.box.com/s/5f9ogvbe3nnz2fijplu1zs2ohygmgsv2): The repository list from which we collected the dataset.\n- [raw-data.tar.gz](https://utexas.box.com/shared/static/hfcp4za3j9vp8lh5u8iviadixuxu8080.gz): The raw collected data from the open-source repositories. `etestgen-raw-data-12k/`\n- [ne2e-test.tar.gz](https://utexas.box.com/shared/static/4m7mntp0ix18dkl1ikkspcmpuvybfs1f.gz): The collected dataset for eval in **developer-view**. `rq1-eval/`\n- [machine-view.tar.gz](https://utexas.box.com/shared/static/y4e52k5x8vk8vcr59lg33gebcg2m1caw.gz): The collected dataset for eval in **machine-view**. 
`rq2/`\n- [netest-diversity.tar.gz](https://utexas.box.com/shared/static/j417e93j1rdvdqz2yobttygfhucfbkjm.gz): The collected dataset we use to study how the different nEBTs affect model's performance (Table VII). `netest-diversity/`\n- [processed dataset](https://utexas.box.com/s/dwxneqvx1m1zw2t68tcugxmk1c7gitjm): The processed dataset (prompts) to train the exLong models.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fengineeringsoftware%2Fexlong","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fengineeringsoftware%2Fexlong","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fengineeringsoftware%2Fexlong/lists"}