{"id":18850732,"url":"https://github.com/typeerror/vuln-data-science","last_synced_at":"2025-04-14T09:40:22.172Z","repository":{"id":245736542,"uuid":"808947869","full_name":"TypeError/vuln-data-science","owner":"TypeError","description":"Advanced vulnerability management and analysis through data science techniques","archived":false,"fork":false,"pushed_at":"2025-02-09T13:05:14.000Z","size":2889,"stargazers_count":8,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-27T22:51:17.543Z","etag":null,"topics":["cybersecurity","exploit-prediction","risk-management","security-analysis","vulnerability-analysis","vulnerability-management","vulnerability-research"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TypeError.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-01T08:41:11.000Z","updated_at":"2025-02-09T13:05:17.000Z","dependencies_parsed_at":"2024-06-28T11:27:43.014Z","dependency_job_id":"fd3c2f6e-62ef-4e79-948f-36e49890c3de","html_url":"https://github.com/TypeError/vuln-data-science","commit_stats":null,"previous_names":["cak/vulnerability-data-science","cak/vuln-data-science","typeerror/vuln-data-science"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TypeError%2Fvuln-data-science","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TypeError%2Fvuln-data-science/tags","releases_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TypeError%2Fvuln-data-science/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TypeError%2Fvuln-data-science/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TypeError","download_url":"https://codeload.github.com/TypeError/vuln-data-science/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248855480,"owners_count":21172568,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cybersecurity","exploit-prediction","risk-management","security-analysis","vulnerability-analysis","vulnerability-management","vulnerability-research"],"created_at":"2024-11-08T03:30:57.534Z","updated_at":"2025-04-14T09:40:22.165Z","avatar_url":"https://github.com/TypeError.png","language":"Jupyter Notebook","readme":"# vuln-data-science\n\n![MIT License](https://img.shields.io/badge/License-MIT-yellow.svg)\n![Python Version](https://img.shields.io/badge/Python-3.11%2B-blue.svg)\n\nWelcome to the vuln-data-science repository! This project focuses on applying data science techniques to vulnerability\nmanagement and analysis. 
Our goal is to explore, analyze, and share insights on vulnerabilities using data science\nmethodologies.\n\n## Table of Contents\n\n- [Introduction](#introduction)\n- [Motivation](#motivation)\n- [Features](#features)\n- [Getting Started](#getting-started)\n    - [Prerequisites](#prerequisites)\n    - [Installation](#installation)\n- [Usage](#usage)\n- [Project Structure](#project-structure)\n- [Contributing](#contributing)\n- [License](#license)\n- [Contact](#contact)\n- [Future Work](#future-work)\n- [Acknowledgments](#acknowledgments)\n\n## Introduction\n\nIn the modern cybersecurity landscape, vulnerability management is crucial. By leveraging data science, we can gain\ndeeper insights into vulnerabilities, predict trends, and enhance our overall security posture. This repository contains\ndata, Jupyter notebooks, and analysis scripts aimed at advancing our understanding of vulnerabilities across various\ndomains, including software and network vulnerabilities. We utilize data from trusted sources such as:\n\n- [CISA Known Exploited Vulnerabilities (KEV)](https://www.cisa.gov/known-exploited-vulnerabilities-catalog)\n- [Exploit Prediction Scoring System (EPSS)](https://www.first.org/epss/)\n- [Microsoft Security Update Guide](https://msrc.microsoft.com/update-guide)\n- [NIST National Vulnerability Database (NVD)](https://nvd.nist.gov/)\n\n## Motivation\n\nEffective vulnerability management is essential for maintaining a strong security posture. 
This project demonstrates how\ndata science can be used to identify patterns, predict vulnerabilities, and provide actionable insights to security\nprofessionals.\n\n## Features\n\n- **Data Collection**: Automated scripts for collecting vulnerability data from various sources.\n- **Data Cleaning**: Techniques to preprocess and clean the data for analysis.\n- **Exploratory Data Analysis**: Visualizations and insights into vulnerability trends.\n- **Predictive Analysis**: Models to predict future vulnerabilities and their potential impact.\n- **Tools \u0026 Libraries**: Utilization of tools like Pandas, Matplotlib, Seaborn, and Scikit-learn for data processing and\n  analysis.\n\n## Getting Started\n\n### Prerequisites\n\nBefore you begin, ensure you have the following software installed:\n\n- Python 3.11 or higher\n\n### Installation\n\n1. Clone the repository:\n\n   ```bash\n   git clone https://github.com/typeerror/vuln-data-science.git\n   ```\n\n2. Navigate to the project directory:\n\n   ```bash\n   cd vuln-data-science\n   ```\n\n3. Create a virtual environment:\n\n   ```bash\n   python -m venv .venv\n   ```\n\n4. Activate the virtual environment:\n\n    - On Windows:\n      ```bash\n      .venv\\Scripts\\activate\n      ```\n    - On macOS and Linux:\n      ```bash\n      source .venv/bin/activate\n      ```\n\n5. Install the required dependencies:\n\n   ```bash\n   pip install .\n   ```\n\n   Alternatively, if you use Hatch, you can set up the environment with:\n\n   ```bash\n   hatch env create\n   hatch shell\n   ```\n\n## Usage\n\nTo start exploring the data and running the analyses, open the Jupyter notebooks in the `notebooks` directory. 
Each\nnotebook focuses on a different aspect of the data pipeline.\n\nYou can launch Jupyter Notebook with the following command:\n\n```bash\njupyter notebook\n```\n\nNavigate to the `notebooks` directory and open any notebook to get started.\n\n## Project Structure\n\n```\nvuln-data-science/\n├── data/\n├── notebooks/\n├── scripts/\n│   └── nb_to_md.py\n├── README.md\n└── LICENSE\n```\n\n## Contributing\n\nWe welcome contributions! If you have ideas or find issues, please open a GitHub issue or submit a pull request.\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.\n\n## Contact\n\nFor questions or suggestions, reach out via GitHub issues, email\nat [projects@typeerror.com](mailto:projects@typeerror.com), or connect with Caleb\non [LinkedIn](https://linkedin.com/in/calebk).\n\n## Future Work\n\nWe plan to expand the project with the following features:\n\n- **Additional Data Sources**: Integration with more vulnerability databases and threat intelligence feeds.\n- **Advanced Analytics**: Machine learning models for predicting vulnerability exploitation likelihood.\n- **Visualization Dashboards**: Interactive dashboards for visualizing trends and insights.\n\n### Data Usage and Attribution\n\nThis project uses data from various publicly available sources. 
Please ensure compliance with their respective usage\nagreements and attribution requirements if you use or redistribute the data.\n\n#### **NIST National Vulnerability Database (NVD)**\n\n- Website: [NVD Developers - Terms of Use](https://nvd.nist.gov/developers/terms-of-use)\n- **Attribution Requirement**:\n    - Services utilizing the NVD API must display the following notice prominently:\n      \u003e \"This product uses the NVD API but is not endorsed or certified by the NVD.\"\n    - The NVD name may only be used to identify the source of API content and may not imply endorsement of any product\n      or service.\n\n#### **CISA Known Exploited Vulnerabilities (KEV)**\n\n- Website: [CISA KEV License](https://www.cisa.gov/sites/default/files/licenses/kev/license.txt)\n- **License**:\n    - The KEV database is distributed under the **Creative Commons 0 1.0 License**.\n    - You may use this data in any legal manner, but note:\n        - Information provided at any 3rd-party links included in the KEV database is bound by the policies and licenses\n          of those third-party websites.\n        - Use of the information does not authorize you to use the **CISA Logo** or **DHS Seal**, nor should such use be\n          interpreted as an endorsement by CISA or DHS.\n\n#### **Exploit Prediction Scoring System (EPSS)**\n\n- Website: [EPSS - FIRST.org](https://www.first.org/epss)\n- **Usage Agreement**:\n    - EPSS scores are freely available for public use.\n    - **Attribution Requirement**:\n      \u003e \"See EPSS at https://www.first.org/epss\"  \n      \u003e or  \n      \u003e \"Jay Jacobs, Sasha Romanosky, Benjamin Edwards, Michael Roytman, Idris Adjerid, (2021), Exploit Prediction\n      Scoring System, Digital Threats Research and Practice, 2(3).\"\n\n---\n\n### Acknowledgments\n\nWe would like to acknowledge the work of researchers and contributors who are advancing the field of vulnerability data\nscience. 
Their insights and tools have been instrumental in shaping this project. This project also draws inspiration\nfrom the broader cybersecurity and data science communities, whose collective efforts improve security practices and\npromote knowledge sharing.\n\n- **[Jay Jacobs](https://www.linkedin.com/in/jayjacobs1/)**  \n  Co-founder of the Cyentia Institute, focusing on security metrics and data-driven decision-making in vulnerability\n  management and risk assessment.\n\n- **[Jerry Gamblin](https://www.linkedin.com/in/jgamblin/)** / [GitHub](https://github.com/jgamblin)  \n  Security researcher and advocate, contributing to vulnerability analysis, remediation strategies, and the development\n  of security tools.\n\n- **[Patrick Garrity](https://www.linkedin.com/in/patrickmgarrity/)**  \n  Acclaimed security researcher with deep expertise in vulnerabilities, exploitation, and threat actor analysis, focused\n  on transforming complex vulnerability data into clear, actionable visualizations.\n\n- **[Wade Baker](https://www.linkedin.com/in/drwadebaker/)**  \n  Co-founder of the Cyentia Institute and co-creator of the Verizon Data Breach Investigations Report (DBIR),\n  specializing in security data analytics and risk management.\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftypeerror%2Fvuln-data-science","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftypeerror%2Fvuln-data-science","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftypeerror%2Fvuln-data-science/lists"}