{"id":51035207,"url":"https://github.com/scthornton/securecode","last_synced_at":"2026-06-22T05:01:15.614Z","repository":{"id":362577521,"uuid":"1154126447","full_name":"scthornton/securecode","owner":"scthornton","description":"Unified security training dataset (2,185 examples) covering OWASP Top 10 2021 and OWASP LLM Top 10 2025","archived":false,"fork":false,"pushed_at":"2026-03-25T14:36:41.000Z","size":6,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-04T23:23:00.134Z","etag":null,"topics":["ai-security","huggingface","owasp","secure-coding","security-dataset","training-data","web-security"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/scthornton.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-10T03:23:28.000Z","updated_at":"2026-04-23T18:21:37.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/scthornton/securecode","commit_stats":null,"previous_names":["scthornton/securecode"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/scthornton/securecode","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scthornton%2Fsecurecode","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scthornton%2Fsecurecode/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scthornton%2Fsecurecode/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scthornton%2Fsecurecode/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/scthornton","download_url":"https://codeload.github.com/scthornton/securecode/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scthornton%2Fsecurecode/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34635038,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-22T02:00:06.391Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-security","huggingface","owasp","secure-coding","security-dataset","training-data","web-security"],"created_at":"2026-06-22T05:01:14.760Z","updated_at":"2026-06-22T05:01:15.599Z","avatar_url":"https://github.com/scthornton.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SecureCode\n\n**Comprehensive security training dataset for AI coding assistants — 2,185 examples covering both traditional web security and AI/ML security.**\n\nBuilt by [perfecXion.ai](https://perfecxion.ai).\n\n## Dataset Family\n\n| Dataset | Examples | Focus | HuggingFace | GitHub |\n|---------|----------|-------|-------------|--------|\n| **SecureCode** | 2,185 | Unified (web + AI/ML) | [scthornton/securecode](https://huggingface.co/datasets/scthornton/securecode) | This repo |\n| SecureCode v2 | 1,435 | Web security (OWASP Top 10 2021) | [scthornton/securecode-v2](https://huggingface.co/datasets/scthornton/securecode-v2) | [securecode-v2](https://github.com/scthornton/securecode-v2) |\n| SecureCode AI/ML | 750 | AI/ML security (OWASP LLM Top 10 2025) | [scthornton/securecode-aiml](https://huggingface.co/datasets/scthornton/securecode-aiml) | [securecode-aiml](https://github.com/scthornton/securecode-aiml) |\n\n## Quick Start\n\n```python\nfrom datasets import load_dataset\n\n# Load everything (2,185 examples)\ndataset = load_dataset(\"scthornton/securecode\")\n\n# Load only web security (1,435 examples)\nweb = load_dataset(\"scthornton/securecode\", \"web\")\n\n# Load only AI/ML security (750 examples)\naiml = load_dataset(\"scthornton/securecode\", \"aiml\")\n```\n\n## What's In It\n\nEvery example is a 4-turn conversation between a developer and an AI coding assistant. The developer asks how to build something, and the assistant provides a vulnerable implementation, explains why it's dangerous, shows a secure alternative with 5+ defense layers, and then covers testing, monitoring, and common mistakes.\n\n**Web Security (1,435 examples):** SQL injection, XSS, authentication bypass, SSRF, cryptographic failures, and more across 12 programming languages and 9 web frameworks (Express.js, Django, Spring Boot, Flask, Rails, Laravel, ASP.NET Core, FastAPI, NestJS).\n\n**AI/ML Security (750 examples):** Prompt injection, model poisoning, embedding manipulation, system prompt leakage, excessive agent autonomy, and more across 30+ AI/ML frameworks (LangChain, OpenAI, Anthropic, HuggingFace, LlamaIndex, ChromaDB, vLLM, CrewAI, AutoGen, etc.).\n\n## Unified Schema\n\nAll conversations use a normalized `{role, content}` format:\n\n```json\n{\n  \"id\": \"example-id\",\n  \"metadata\": { \"category\": \"...\", \"severity\": \"CRITICAL\", \"cwe\": \"CWE-79\", \"lang\": \"python\" },\n  \"context\": { \"description\": \"...\", \"impact\": \"...\" },\n  \"conversations\": [\n    {\"role\": \"human\", \"content\": \"How do I build secure JWT auth?\"},\n    {\"role\": \"assistant\", \"content\": \"Here's the vulnerable version... here's the secure version...\"},\n    {\"role\": \"human\", \"content\": \"How do I test this?\"},\n    {\"role\": \"assistant\", \"content\": \"Here's how to test, monitor, and avoid common mistakes...\"}\n  ],\n  \"quality_score\": null,\n  \"security_assertions\": [],\n  \"references\": []\n}\n```\n\n## Building the Unified Dataset\n\nThe unified dataset is built from the two source datasets using a normalization script that converts v2.x conversations from `{turn, from, value}` to `{role, content}` format.\n\n```bash\npython3 scripts/build_unified_dataset.py\n```\n\nThis generates `unified-data/data/web/` (1,435 files) and `unified-data/data/aiml/` (750 files), ready to push to HuggingFace.\n\n## Configs\n\n| Config | Examples | OWASP Standard |\n|--------|----------|----------------|\n| `default` | 2,185 | Both |\n| `web` | 1,435 | OWASP Top 10 2021 |\n| `aiml` | 750 | OWASP LLM Top 10 2025 |\n\n## Citation\n\n```bibtex\n@misc{thornton2025securecode,\n  title={SecureCode v2.0: A Production-Grade Dataset for Training Security-Aware Code Generation Models},\n  author={Thornton, Scott},\n  year={2025},\n  publisher={perfecXion.ai},\n  url={https://huggingface.co/datasets/scthornton/securecode-v2},\n  note={arXiv:2512.18542}\n}\n\n@dataset{thornton2026securecodeaiml,\n  title={SecureCode AI/ML: AI/ML Security Training Dataset for the OWASP LLM Top 10 2025},\n  author={Thornton, Scott},\n  year={2026},\n  publisher={perfecXion.ai},\n  url={https://huggingface.co/datasets/scthornton/securecode-aiml}\n}\n```\n\n## License\n\n- **Web examples:** CC BY-NC-SA 4.0\n- **AI/ML examples:** MIT\n- **Unified dataset:** CC BY-NC-SA 4.0 (the more restrictive of the two)\n\n---\n\n## Contact\n\n**Scott Thornton** — AI Security Researcher\n\n- Website: [perfecxion.ai](https://perfecxion.ai/)\n- Email: [scott@perfecxion.ai](mailto:scott@perfecxion.ai)\n- LinkedIn: [linkedin.com/in/scthornton](https://www.linkedin.com/in/scthornton)\n- ORCID: [0009-0008-0491-0032](https://orcid.org/0009-0008-0491-0032)\n- GitHub: [@scthornton](https://github.com/scthornton)\n\n**Security Issues**: Please report via [SECURITY.md](SECURITY.md)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscthornton%2Fsecurecode","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscthornton%2Fsecurecode","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscthornton%2Fsecurecode/lists"}