{"id":49223291,"url":"https://github.com/sungmoon2/hlf_chaincode_vulndetect_locallm","last_synced_at":"2026-04-24T05:04:22.351Z","repository":{"id":339939305,"uuid":"1163915335","full_name":"sungmoon2/HLF_Chaincode_VulnDetect_LocalLM","owner":"sungmoon2","description":"Local sLM-based vulnerability detection for Hyperledger Fabric chaincode (Go). AMLDS 2026.","archived":false,"fork":false,"pushed_at":"2026-03-30T03:22:50.000Z","size":3377,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-30T05:55:01.045Z","etag":null,"topics":["chaincode","golang","hyperledger-fabric","llm","privacy-preserving","small-language-model","smart-contract-security","vulnerability-detection"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sungmoon2.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-22T11:01:54.000Z","updated_at":"2026-03-30T03:22:57.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/sungmoon2/HLF_Chaincode_VulnDetect_LocalLM","commit_stats":null,"previous_names":["sungmoon2/hlf_chaincode_vulndetect_locallm"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sungmoon2/HLF_Chaincode_VulnDetect_LocalLM","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sungmoon2%2FHLF_Chaincode_VulnDetect_LocalLM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sungmoon2%2FHLF_Chaincode_VulnDetect_LocalLM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sungmoon2%2FHLF_Chaincode_VulnDetect_LocalLM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sungmoon2%2FHLF_Chaincode_VulnDetect_LocalLM/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sungmoon2","download_url":"https://codeload.github.com/sungmoon2/HLF_Chaincode_VulnDetect_LocalLM/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sungmoon2%2FHLF_Chaincode_VulnDetect_LocalLM/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32209897,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-24T03:15:14.334Z","status":"ssl_error","status_checked_at":"2026-04-24T03:15:11.608Z","response_time":64,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chaincode","golang","hyperledger-fabric","llm","privacy-preserving","small-language-model","smart-contract-security","vulnerability-detection"],"created_at":"2026-04-24T05:04:01.457Z","updated_at":"2026-04-24T05:04:22.341Z","avatar_url":"https://github.com/sungmoon2.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HLF Chaincode Vulnerability Detection with Local sLM\n\n**Privacy-Preserving Anomaly Detection in Hyperledger Fabric Chaincode Using Compact Local Transformer Models**\n\nSubmitted to **AANN 2026** (6th International Conference on Advanced Algorithms and Neural Networks, Qingdao, China, August 7-9, 2026) | Paper No: M7VNBFDSWP\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)\n\n---\n\n## Overview\n\nThis repository contains experiment artifacts for detecting **endorsement-nondeterminism vulnerabilities** in Hyperledger Fabric (HLF) Go chaincode using a locally deployed compact transformer model (Qwen2.5-Coder-7B, 4-bit quantized) alongside custom Semgrep rules. The 464-file benchmark is derived from the GoLiSA corpus with content-hash deduplication and dual-annotator verification.\n\n## Key Results (464-File GoLiSA Benchmark)\n\n| Method | TPR | TNR | Prec. | F1 |\n|:-------|:----|:----|:------|:---|\n| Qwen2.5-Coder-7B (det.) | 80.6% (25/31) | 48.6% (210/432) | 10.1% | 18.0% |\n| Semgrep (5 custom rules) | 77.4% (24/31) | 99.3% (429/432) | 88.9% | 82.8% |\n| Majority vote (5 seeds) | 90.6% (29/32) | 45.8% (198/432) | 11.0% | 19.7% |\n| OR-Union | 96.8% (30/31) | 48.4% (209/432) | 11.9% | 21.1% |\n\n*Deterministic results on 463 common files (31V, 432S). Majority on 464 files (32V, 432S). Union on 463 common files.*\n\n### Supplementary: 15-File Diagnostic Benchmark\n\n| Model | Type | TPR (9 vuln) | TNR (6 safe) | Avg Time/File |\n|:------|:-----|:-------------|:-------------|:--------------|\n| **Qwen2.5-Coder-7B** | Local (7B) | 9/9 (100%) | 6/6 (100%) | 3.94s |\n| Llama-3.1-8B | Local (8B) | 9/9 (100%) | 1/6 (17%) | 10.09s |\n| Claude Haiku 4.5 | Cloud | 9/9 (100%) | 5/6 (83%) | 12.89s |\n| Gemini 2.5 Pro | Cloud | 9/9 (100%) | 0/6 (0%) | 19.63s |\n\n## Repository Structure\n\n```\n.\n├── scripts/                        # 29 experiment scripts\n│   ├── 01_download_models.py       # Model download (HuggingFace)\n│   ├── 02_run_audit_v3.py          # Multi-prompt, multi-model audit\n│   ├── 03_obfuscate_dataset.py     # Identifier obfuscation (459 replacements)\n│   ├── 04~06_*.py                  # Cloud API audits (Claude, Gemini)\n│   ├── 07~09_*.py                  # GoLiSA validation, reclassification\n│   ├── 10~19_*.py                  # Microbenchmark, repeat, CoT experiments\n│   ├── 20_run_addon_validation.py  # Addon dataset validation (D1/D2)\n│   ├── 21_run_annotation_ablation.py # Annotation ablation study\n│   ├── 22_mine_golisa_candidates.py  # GoLiSA positive/negative mining\n│   ├── 23_prompt_dev_sanity_check.py # Prompt development sanity check\n│   ├── 24_run_golisa_labeling.py   # 464-file annotation pipeline\n│   ├── 25_run_second_annotation.py # Independent second annotation\n│   ├── 26_run_main_experiment.py   # Phase 6: deterministic evaluation\n│   ├── 27_run_robustness.py        # Phase 7: 5-seed robustness (v2.0)\n│   └── strip_go_comments.go        # Go comment stripper (source)\n│\n├── 02_resources/\n│   ├── dataset/                    # 15 Go chaincodes (vuln 9 + safe 6)\n│   ├── dataset_obfuscated/         # 15 obfuscated Go files\n│   ├── models/                     # .gguf files (excluded via .gitignore)\n│   └── golisa_benchmark/           # 657 Go files from 326 GitHub repos\n│\n├── 06_addon_validation/            # 464-file benchmark pipeline\n│   ├── benchmark/                  # BENCHMARK_FREEZE.json (ground truth)\n│   │                               # INFERENCE_CONTRACT.md (parameters)\n│   ├── dataset/                    # 17 addon .go files\n│   ├── dataset_d1_clean/           # 15 annotation-stripped .go files\n│   ├── dataset_ablation_*/         # Ablation datasets (ann/abl)\n│   ├── golisa_mining/              # Candidate mining data\n│   ├── labeling/                   # Primary + secondary annotation data\n│   │   ├── run_260422_2142/        # Primary annotation (per-file JSON)\n│   │   ├── second_260423_*/        # Second annotation runs\n│   │   └── verification/           # Manual verification results\n│   ├── experiment/\n│   │   ├── main_260423_0047/       # Phase 6 deterministic (463 per-file)\n│   │   └── robustness_260423_0341/ # Phase 7 robustness (464 x 5 seeds)\n│   └── results/                    # Summary CSVs and reports\n│\n├── 03_artifacts/raw_results/       # CSV audit results + meta.json\n├── 04_feedback/                    # Issue tracking\n├── 01_contexts/                    # Session tracking, references\n│\n├── rules/hlf_consensus.yml         # 5 custom Semgrep rules for HLF\n├── PROMPTS.md                      # Prompt templates (P1-P4) verbatim\n├── CLASSIFIER.md                   # Classifier v1/v2/JSON logic\n├── LABELING_CRITERIA.md            # Ground truth labels + criteria\n├── PIPELINE_WORKFLOW.md            # Experiment pipeline description\n├── REPRODUCTION.md                 # Step-by-step reproduction guide\n├── requirements.txt                # Python dependencies (version-pinned)\n├── CITATION.cff                    # Citation metadata\n├── LICENSE                         # MIT License\n└── .gitignore                      # Excludes models (9GB), VM images (22GB)\n```\n\n## Benchmark Construction (464-File)\n\nThe benchmark is derived from the GoLiSA corpus (657 files, 326 repos):\n1. Remove files with insufficient chaincode structure: 657 -\u003e 618\n2. Content-hash (SHA-256) deduplication: 618 -\u003e 464 (154 duplicates removed)\n3. Primary annotation (Claude Sonnet 4.5): 46 initial positives -\u003e 33 after manual verification\n4. Second annotation (Claude Opus 4.5): Cohen's kappa = 0.766 (substantial)\n5. Final benchmark: 32 vulnerable, 432 safe\n\nGround truth labels: [`06_addon_validation/benchmark/BENCHMARK_FREEZE.json`](06_addon_validation/benchmark/BENCHMARK_FREEZE.json)\n\n## Reproducibility\n\n### Prompt Strategies\n\nFour prompt strategies are documented in [`PROMPTS.md`](PROMPTS.md):\n\n| Prompt | Description |\n|:-------|:------------|\n| P1: Zero-shot | 6 vulnerability categories, structured output |\n| P2: Few-shot | P1 + 2 examples (vulnerable vs. safe `time.Now()` usage) |\n| P3: Chain-of-Thought | 6-step reasoning: PutState backward tracing |\n| P4: JSON mode | Structured JSON output with `is_vulnerable` boolean |\n\n### Classification Logic\n\nThree classifiers are documented in [`CLASSIFIER.md`](CLASSIFIER.md):\n\n| Classifier | Key Feature |\n|:-----------|:------------|\n| v1 (original) | Safe-phrase early return with contradiction check |\n| v2 (improved) | Self-contradiction detection: structured evidence overrides safe phrase |\n| JSON parser | Parses `is_vulnerable` field, falls back to v2 |\n\n### Six Targeted Vulnerability Classes\n\n| Class | Description |\n|:------|:------------|\n| C1 | Nondeterministic timestamps (`time.Now()`) |\n| C2 | Goroutine concurrency |\n| C3 | Map-iteration randomness |\n| C4 | Phantom reads (`GetQueryResult`) |\n| C5 | Iterator resource leaks (auxiliary) |\n| C6 | Global mutable state |\n\n## Hardware\n\n| Component | Specification |\n|:----------|:-------------|\n| GPU | NVIDIA GeForce RTX 3090 Ti (24564 MiB VRAM) |\n| CUDA | 13.0 (V13.0.88) |\n| Python | 3.11.9 |\n| llama-cpp-python | 0.3.16 (CUDA build) |\n| Semgrep | 1.151.0 |\n\n## Models (not included in repo)\n\n| Model | File | Size | Source |\n|:------|:-----|:-----|:-------|\n| Qwen2.5-Coder-7B-Instruct | Q4_K_M.gguf | 4.4 GB | HuggingFace |\n| Meta-Llama-3.1-8B-Instruct | Q4_K_M.gguf | 4.6 GB | HuggingFace |\n\nDownload via `scripts/01_download_models.py`.\n\n## Reproduction\n\nSee [`REPRODUCTION.md`](REPRODUCTION.md) for a step-by-step guide.\n\n## Citation\n\n```bibtex\n@inproceedings{park2026privacy,\n  title={Privacy-Preserving Anomaly Detection in Hyperledger Fabric Chaincode Using Compact Local Transformer Models},\n  author={Park, Sungmoon and Yang, Jinhong},\n  booktitle={Proceedings of the 6th International Conference on Advanced Algorithms and Neural Networks (AANN 2026)},\n  year={2026},\n  publisher={IEEE},\n  address={Qingdao, China}\n}\n```\n\n## License\n\nThis project is licensed under the [MIT License](LICENSE). The GoLiSA benchmark files in `02_resources/golisa_benchmark/` are sourced from the GoLiSA project (Olivieri et al., ECOOP 2023) and retain their original licensing.\n\n## Acknowledgments\n\nThis work was supported by the Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea government (Ministry of Trade, Industry and Energy) through the International Cooperation in Industrial Technology program (Project Number: P0026190).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsungmoon2%2Fhlf_chaincode_vulndetect_locallm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsungmoon2%2Fhlf_chaincode_vulndetect_locallm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsungmoon2%2Fhlf_chaincode_vulndetect_locallm/lists"}