{"id":20880240,"url":"https://github.com/betarixm/csed499ii","last_synced_at":"2026-05-26T09:31:05.521Z","repository":{"id":215069471,"uuid":"738015515","full_name":"betarixm/CSED499II","owner":"betarixm","description":"POSTECH: Research Project II (Fall 2023)","archived":false,"fork":false,"pushed_at":"2024-01-02T08:28:47.000Z","size":4231,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-26T20:57:39.224Z","etag":null,"topics":["codellama","huggingface","llm","poetry","postech","python","torch","vulnerability"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/betarixm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2024-01-02T08:01:51.000Z","updated_at":"2024-01-02T08:33:05.000Z","dependencies_parsed_at":"2024-01-02T10:58:54.526Z","dependency_job_id":"db8346bf-ec7c-4235-a41a-7e9b3e24af2b","html_url":"https://github.com/betarixm/CSED499II","commit_stats":null,"previous_names":["betarixm/csed499ii"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/betarixm/CSED499II","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/betarixm%2FCSED499II","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/betarixm%2FCSED499II/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/betarixm%2FCSED499II/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/betarixm%2FCSED499II/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/owners/betarixm","download_url":"https://codeload.github.com/betarixm/CSED499II/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/betarixm%2FCSED499II/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33513839,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T03:12:49.672Z","status":"ssl_error","status_checked_at":"2026-05-26T03:12:47.976Z","response_time":63,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["codellama","huggingface","llm","poetry","postech","python","torch","vulnerability"],"created_at":"2024-11-18T07:19:35.810Z","updated_at":"2026-05-26T09:31:05.501Z","avatar_url":"https://github.com/betarixm.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Learning to Detect Vulnerable Code Using Differentiable Line Probability of Large Code Models\n\n## Motivation\n\n### Problem Description\n\n- Mission: Automate vulnerability detection to reduce manual effort.\n- Objectives:\n  - Automation: Reduce manual efforts in detection.\n  - Efficiency: Quickly identify vulnerabilities in code.\n  - Accuracy: Minimize false positives and false negatives.\n  - Scalability. 
Easily integrated with multiple languages.\n\n## Method\n\n### Line Probability\n\n- Objectives:\n  - Must quantify vulnerability likelihood at the line level.\n  - Must be differentiable for accurate training.\n- *Line probability* is the sum of the token probabilities in a code line and indicates how likely that line is to be vulnerable.\n- *Token probability* is derived from a distribution of large code model outputs, where the input is a convex combination of prior distributions and some previous code lines.\n\n### Prompt Tuning\n\n- Objectives:\n  - Must optimize the performance of line probability.\n  - Must be tuned in terms of line probabilities.\n- *Prompt tuning* is a mechanism for learning soft prompts, enabling models to perform specific downstream tasks.[1]\n- The *Large Code Model*, a tuned version of a large language model, uses line probabilities to estimate how likely a line is to be vulnerable.\n\n## Results\n\n### Model and Dataset\n\n- Base Model: CodeLlama 7B\n- Vulnerable Lines: 336\n- Benign Lines: 3,207\n- Validation Split: 0.2\n\n### Parameters\n\n- Length of Soft Prompt: 64\n- Epochs: 8\n- Batch Size: 16\n- Learning Rate: 0.0005\n\n### Observations\n\n- *Line probability* emerges as a potent metric for vulnerability detection.\n- *Prompt tuning* makes the model more sensitive to vulnerable lines.\n\n### Discussion\n\n1. High Cost of Calculating Line Probabilities.\n   - The convex-combined token embedding vector is constructed from all available tokens of the base model.\n   - A line probability must be calculated sequentially. Together, these two factors make the calculation very slow.\n   - Exploring ways to calculate line probabilities in parallel could be beneficial.\n2. 
Lack of Data for Training and Testing.\n   - Our dataset is constructed from CodeQL's public repository and a sample of DARPA's challenges.\n   - Some data did not fit well during tuning.\n   - The dataset was too small for tuning.\n   - Self-supervised learning methods are worth exploring: they do not require labeled data, but they do require designing auxiliary tasks useful for finding vulnerabilities.\n\n---\n\n[1] Lester, Brian, Rami Al-Rfou, and Noah Constant. \"The power of scale for parameter-efficient prompt tuning.\" arXiv preprint arXiv:2104.08691 (2021).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbetarixm%2Fcsed499ii","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbetarixm%2Fcsed499ii","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbetarixm%2Fcsed499ii/lists"}