{"id":29704302,"url":"https://github.com/eth-sri/insec","last_synced_at":"2025-07-23T14:10:04.748Z","repository":{"id":299380377,"uuid":"992482122","full_name":"eth-sri/insec","owner":"eth-sri","description":"Reproduction Package for \"Black-Box Adversarial Attacks on LLM-Based Code Completion\" [ICML 2025]","archived":false,"fork":false,"pushed_at":"2025-06-16T08:22:38.000Z","size":1950,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-21T22:10:46.783Z","etag":null,"topics":["adverserial-attack","code-completion","llm","security"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eth-sri.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-29T08:18:44.000Z","updated_at":"2025-07-14T22:19:06.000Z","dependencies_parsed_at":"2025-06-16T09:36:00.332Z","dependency_job_id":null,"html_url":"https://github.com/eth-sri/insec","commit_stats":null,"previous_names":["eth-sri/insec"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/eth-sri/insec","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eth-sri%2Finsec","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eth-sri%2Finsec/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eth-sri%2Finsec/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eth-sri%2Finsec/manifests","owner_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/owners/eth-sri","download_url":"https://codeload.github.com/eth-sri/insec/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eth-sri%2Finsec/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266691580,"owners_count":23969182,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-23T02:00:09.312Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["adverserial-attack","code-completion","llm","security"],"created_at":"2025-07-23T14:10:03.981Z","updated_at":"2025-07-23T14:10:04.719Z","avatar_url":"https://github.com/eth-sri.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Black-Box Adversarial Attacks on LLM-Based Code Completion\n[![arXiv](https://img.shields.io/badge/arXiv-2408.02509-b31b1b.svg)](https://arxiv.org/abs/2408.02509)\n[![Hugging Face](https://img.shields.io/badge/Hugging%20Face-dataset-FF9D00?logo=huggingface\u0026logoColor=white)](https://huggingface.co/datasets/eth-sri/insec-vulnerability/)\n\nThis is the reproduction package for our INSEC attack (**IN**jecting **S**ecurity-**E**vading **C**omments), presented in the paper \"Black-Box Adversarial Attacks on LLM-Based Code Completion\" by Jenko, Mündler, et al., *ICML 2025*.\n
It includes instructions for installing the required dependencies, running the code, and reproducing the results from the paper.\n\n## Installation\n\nWe provide extensive installation instructions in the [INSTALL.md](INSTALL.md) file.\n\n## Running the code\n\nBelow is an example of how to obtain the attack strings for [StarCoder 3B](https://huggingface.co/bigcode/starcoderbase-3b).\n\n```\ncd scripts\npython3 generic_launch.py --config fig3_main/main_scb3/config.json --save_dir ../results/example\n```\n\nThe naming convention is `\u003csave-dir\u003e/\u003clistparam\u003e/\u003ctimestamp\u003e/\u003celem\u003e`, where\n- `save_dir` is the `--save_dir` parameter passed to `generic_launch.py`\n- `listparam` is the one parameter that is stored as a list\n- `timestamp` is the timestamp parameter in the config file\n- `elem` is one of the elements of `listparam`\n\nIn this case, the results are stored in `data/example/model_dir/final/starcoderbase-3b/starcoderbase-3b/`.\n\n### Reproducing Figures\n\nWe provide the configurations used to generate data for each figure in `scripts/fig*`. They can be run as described above.\n\n## Dataset\n\n\u003e Note: You can find the vulnerability dataset on [Hugging Face](https://huggingface.co/datasets/eth-sri/insec-vulnerability/).\n\nThe training and validation sets for the vulnerability dataset are in the folder [`data_train_val`](data_train_val), and the test set is in [`data_test`](data_test). Each directory contains subdirectories for the respective CWEs. 
The CWE directories contain JSONL lists of objects (`train.jsonl`, `val.jsonl`, and `test.jsonl`) with the following attributes:\n\n- `pre_tt`: Text preceding the line of the vulnerability\n- `post_tt`: Text preceding the vulnerable tokens in the line of the vulnerability\n- `suffix_pre`: Text following the vulnerable tokens in the line of the vulnerability\n- `suffix_post`: Remainder of the file after the line of the vulnerability\n- `lang`: Language of the vulnerable code snippet (e.g., `py` or `cpp`)\n- `key`: Key character sequences that were used to substitute CodeQL queries during training. Present only in the train split.\n- `info`: A metadata object containing the CodeQL query used to check the snippet for vulnerabilities and the source of the code snippet.\n\n\nIn particular, the prefix for model infilling is `pre_tt + post_tt`, whereas the suffix is `suffix_pre + suffix_post`.\n\nThe functionality datasets are in the subfolders of [`multipl-e`](multipl-e): [`multiple_fim`](multipl-e/multiple_fim) is the functionality dataset for the main evaluation, based on MultiPL-E; [`humaneval-x_fim`](multipl-e/humaneval-x_fim) is our confirmation dataset, based on HumanEval-X; and [`repobench_fim`](multipl-e/repobench_fim) is our repository-level completion dataset, based on RepoBench.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feth-sri%2Finsec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feth-sri%2Finsec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feth-sri%2Finsec/lists"}