{"id":34502734,"url":"https://github.com/laugiov/code-safety","last_synced_at":"2026-04-25T03:37:35.330Z","repository":{"id":329831133,"uuid":"1117851833","full_name":"laugiov/code-safety","owner":"laugiov","description":"Security Engineering reference: taint analysis benchmark comparing Pysa, CodeQL \u0026 Semgrep on a controlled Django app (16 OWASP Top 10 cases). Includes CI/CD integration with SARIF, ground truth validation, and enterprise scaling patterns.","archived":false,"fork":false,"pushed_at":"2025-12-22T10:21:46.000Z","size":436,"stargazers_count":0,"open_issues_count":16,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-12-23T10:05:40.061Z","etag":null,"topics":["appsec","benchmark","cicd-security","codeql","devsecops","django","owasp","pysa","python","sarif","sast","security","semgrep","static-analysis","taint-analysis","vulnerability-detection"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/laugiov.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-16T22:58:20.000Z","updated_at":"2025-12-22T10:21:49.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/laugiov/code-safety","commit_stats":null,"previous_names":["laugiov/code-safety"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/laugiov/code-safety","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laugiov%2Fcode-safety","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laugiov%2Fcode-safety/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laugiov%2Fcode-safety/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laugiov%2Fcode-safety/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/laugiov","download_url":"https://codeload.github.com/laugiov/code-safety/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/laugiov%2Fcode-safety/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32249383,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-25T03:17:44.950Z","status":"ssl_error","status_checked_at":"2026-04-25T03:16:45.208Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["appsec","benchmark","cicd-security","codeql","devsecops","django","owasp","pysa","python","sarif","sast","security","semgrep","static-analysis","taint-analysis","vulnerability-detection"],"created_at":"2025-12-24T02:19:41.293Z","updated_at":"2026-04-25T03:37:35.303Z","avatar_url":"https://github.com/laugiov.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Code Safety: Taint Analysis \u0026 SAST Benchmark\n\n[![CI](https://github.com/laugiov/code-safety/actions/workflows/ci.yml/badge.svg)](https://github.com/laugiov/code-safety/actions/workflows/ci.yml)\n[![Semgrep](https://github.com/laugiov/code-safety/actions/workflows/semgrep-analysis.yml/badge.svg)](https://github.com/laugiov/code-safety/actions/workflows/semgrep-analysis.yml)\n[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)\n[![Django 4.2](https://img.shields.io/badge/django-4.2-green.svg)](https://www.djangoproject.com/)\n[![License: MIT](https://img.shields.io/badge/license-MIT-yellow.svg)](LICENSE)\n\n\u003e **Lab only:** VulnShop is intentionally vulnerable. Do not deploy to production.\n\n## Context\n\nI built this project to explore how modern SAST tools handle taint analysis in practice. The idea was simple: create a controlled Django app with known vulnerabilities, then run Pysa, CodeQL, and Semgrep against it to see what they catch and what they miss.\n\nThe result is a benchmark with 16 OWASP Top 10 cases, each with documented taint flows (source, sink, sanitizer). This lets me measure detection rates objectively and understand where each tool shines or falls short.\n\n## What I wanted to demonstrate\n\nThis repo reflects work I do in AppSec and Security Engineering: writing custom taint-analysis rules, integrating SAST into CI pipelines with SARIF output, and thinking about false positive management at scale. The `docs/` folder covers enterprise patterns like governance and rollout strategies.\n\nIf you're using the project, start with the Semgrep rules in `analysis/semgrep/rules/` and the ground truth definitions in `benchmarks/ground-truth/`. Run `make analyze-semgrep` to see the full detection pipeline.\n\n## The benchmark app (VulnShop)\n\nVulnShop is a Django e-commerce app I wrote specifically for this benchmark. It has 16 security cases:\n\n| # | Case | CWE | Location |\n|---|------|-----|----------|\n| 1 | SQL Injection (Auth) | CWE-89 | `authentication/views.py` |\n| 2 | SQL Injection (Search) | CWE-89 | `catalog/views.py` |\n| 3 | XSS Reflected | CWE-79 | `catalog/views.py` |\n| 4 | XSS Stored | CWE-79 | `reviews/views.py` |\n| 5 | Command Injection | CWE-78 | `admin_panel/views.py` |\n| 6 | Path Traversal | CWE-22 | `admin_panel/views.py` |\n| 7 | IDOR | CWE-639 | `profile/views.py` |\n| 8 | Mass Assignment | CWE-915 | `profile/views.py` |\n| 9 | SSRF | CWE-918 | `webhooks/views.py` |\n| 10 | Insecure Deserialization | CWE-502 | `cart/views.py` |\n| 11 | SSTI | CWE-1336 | `notifications/views.py` |\n| 12 | Hardcoded Secrets | CWE-798 | `settings.py` |\n| 13 | Vulnerable Dependencies | CWE-1035 | `requirements.txt` |\n| 14 | Sensitive Data Logging | CWE-532 | `middleware/logging.py` |\n| 15 | XXE | CWE-611 | `api/views.py` |\n| 16 | Missing Rate Limiting | CWE-307 | `authentication/views.py` |\n\nEach vulnerability has a documented taint flow showing how user input reaches a dangerous sink, and what sanitization would prevent it.\n\n## Tools and results\n\nI tested three tools with different approaches: Pysa (Meta) does deep taint tracking, CodeQL (GitHub) offers semantic analysis with its own query language, and Semgrep is fast pattern matching suited for CI gates.\n\n| Tool | Analysis type | Speed | Best for |\n|------|--------------|-------|----------|\n| Pysa | Taint tracking | Medium | Complex data flows |\n| CodeQL | Semantic queries | Slow | Deep analysis |\n| Semgrep | Pattern matching | Fast | CI/CD integration |\n\nSemgrep is fully validated: 92 custom rules, 226 findings, 81.25% detection rate on the 16 cases. It catches SQLi, XSS, Command Injection, Path Traversal, IDOR, SSRF, Deserialization, SSTI, and XXE reliably. Mass Assignment is partial. Rate limiting and dependency checks are out of scope for SAST.\n\nPysa and CodeQL configs are ready but need their respective runtimes (Pyre and CodeQL CLI). I also included CVE reproductions for CVE-2023-36414 and CVE-2022-34265 (Django SQL injection patterns).\n\n## Running it\n\n```bash\ngit clone https://github.com/laugiov/code-safety.git\ncd code-safety\n\n# Docker: VulnShop on :8000, docs on :8080\ndocker-compose up -d\n\n# Or run Semgrep directly\nmake analyze-semgrep\n```\n\n## Project structure\n\nThe repo is organized around three main areas: the vulnerable app itself, the analysis configurations, and the benchmark data with ground truth.\n\n```\nvulnerable-app/     # VulnShop Django app\nanalysis/           # Pysa, CodeQL, Semgrep configs\nbenchmarks/         # Ground truth and results\ndocs/               # MkDocs documentation (40+ pages)\n.github/workflows/  # CI templates with SARIF\n```\n\nThe CI workflows upload SARIF to GitHub's Security tab for centralized tracking.\n\n## Documentation\n\nRun `pip install mkdocs-material \u0026\u0026 mkdocs serve` to browse locally. The docs cover taint analysis theory, tool-specific guides, and enterprise topics like scaling SAST and managing false positives.\n\n## License\n\nMIT. Contributions welcome, see [CONTRIBUTING.md](CONTRIBUTING.md).\n\n## Author\n\nLaurent Giovannoni - [@laugiov](https://github.com/laugiov)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flaugiov%2Fcode-safety","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flaugiov%2Fcode-safety","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flaugiov%2Fcode-safety/lists"}