https://github.com/jlee9503/defense-risk-prediction
Build a machine learning pipeline that ingests defense procurement data, identifies high-risk contracts, and visualizes the results in an interactive dashboard.
- Host: GitHub
- URL: https://github.com/jlee9503/defense-risk-prediction
- Owner: jlee9503
- License: mit
- Created: 2025-05-16T21:16:17.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-03T18:49:36.000Z (about 1 year ago)
- Last Synced: 2025-06-12T02:47:21.546Z (about 1 year ago)
- Topics: data-analysis, data-visualization, exploratory-data-analysis, python
- Language: Jupyter Notebook
- Homepage:
- Size: 1 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Defense Procurement Risk Forecasting
This project simulates a data science pipeline for identifying risk in defense acquisition contracts. Inspired by real-world use cases in government and defense analytics, it demonstrates skills in data engineering, statistical modeling, automation, and dashboard design.
---
## Problem Statement
Defense procurement often suffers from budget overruns, delays, and vendor risks. This project aims to use simulated acquisition data to build a machine learning pipeline that flags high-risk contracts based on past performance, supplier patterns, and contract size/timing.
---
## 🧰 Tools Used
- Python (pandas, SQLAlchemy, pathlib)
- SQL (MS SQL Server)
- Matplotlib / Seaborn (for internal visualizations)
- Git + VS Code
- Docker (SQL Server container)
---
## Pipeline Overview
1. **Data Ingestion:**
   - Loads multiple data sources (CSV, JSON) simulating contract records, supplier history, and known delays.
2. **Data Cleaning + Feature Engineering:**
   - Identify missing values and duplicates
   - Merge datasets using `supplier_id` and `contract_id`
   - Risk feature encoding (contract age, contract value per month, risk score)
3. **Exploratory Data Analysis:**
   - Univariate
     - Categorical: `contract_type`, `compliance_issues`
     - Numeric: `contract_value_million`, `expected_duration_months`, `average_delay_days`, `delay_days`, `risk_score`, `value_per_month`, `contract_age_days`
   - Bivariate
     - Relationship between target variable (`risk_score`) and each feature
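
Steps 1 and 2 can be sketched roughly as follows, assuming pandas and small in-memory stand-ins for the simulated files. The sample values, the `start_date` column, the `as_of` reference date, and the function names `load_sources` and `build_features` are illustrative assumptions, not taken from the repository. The README does not define how `risk_score` is computed, so it is left out here. Left joins are used so that a contract with no recorded delay is kept rather than dropped.

```python
import io
import json

import pandas as pd

# Made-up sample data standing in for the simulated CSV/JSON sources.
CONTRACTS_CSV = """contract_id,supplier_id,contract_type,contract_value_million,expected_duration_months,start_date
C001,S01,Fixed-Price,12.0,24,2023-01-15
C002,S02,Cost-Plus,4.5,9,2024-03-01
C002,S02,Cost-Plus,4.5,9,2024-03-01
C003,S01,Fixed-Price,30.0,,2022-06-30
"""
SUPPLIERS_JSON = """[
  {"supplier_id": "S01", "average_delay_days": 12.5, "compliance_issues": "No"},
  {"supplier_id": "S02", "average_delay_days": 40.0, "compliance_issues": "Yes"}
]"""
DELAYS_CSV = """contract_id,delay_days
C001,30
C002,0
"""


def load_sources():
    """Read the three sources into DataFrames (hypothetical helper)."""
    contracts = pd.read_csv(io.StringIO(CONTRACTS_CSV), parse_dates=["start_date"])
    suppliers = pd.DataFrame(json.loads(SUPPLIERS_JSON))
    delays = pd.read_csv(io.StringIO(DELAYS_CSV))
    return contracts, suppliers, delays


def build_features(contracts, suppliers, delays, as_of="2025-01-01"):
    """Deduplicate, merge on the README's keys, and derive two features."""
    contracts = contracts.drop_duplicates()

    # Left joins keep every contract even when supplier or delay data is missing.
    df = contracts.merge(suppliers, on="supplier_id", how="left", validate="many_to_one")
    df = df.merge(delays, on="contract_id", how="left", validate="one_to_one")

    # A missing duration propagates as NaN instead of raising.
    df["value_per_month"] = df["contract_value_million"] / df["expected_duration_months"]
    # as_of is an assumed reference date for measuring contract age.
    df["contract_age_days"] = (pd.Timestamp(as_of) - df["start_date"]).dt.days
    return df


features = build_features(*load_sources())
print(features.isna().sum())  # missing values per column after the merge
```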
---
## Project Status
✅ Completed: Set up database, explore and clean dataset, exploratory data analysis (EDA)
🚧 In Progress: Model training and evaluation
---
## 🧾 License
MIT License