https://github.com/jlee9503/defense-risk-prediction

Build a machine learning pipeline that ingests defense procurement data, identifies high-risk contracts, and visualizes the results in an interactive dashboard.

# Defense Procurement Risk Forecasting

This project simulates a data science pipeline for identifying risk in defense acquisition contracts. Inspired by real-world use cases in government and defense analytics, it demonstrates skills in data engineering, statistical modeling, automation, and dashboard design.

---

## 🔍 Problem Statement

Defense procurement often suffers from budget overruns, delays, and vendor risks. This project aims to use simulated acquisition data to build a machine learning pipeline that flags high-risk contracts based on past performance, supplier patterns, and contract size/timing.

---

## 🧰 Tools Used

- Python (pandas, SQLAlchemy, pathlib)
- SQL (MS SQL Server)
- Matplotlib / Seaborn (for internal visualizations)
- Git + VS Code
- Docker (SQL Server container)

---

## 📈 Pipeline Overview

1. **Data Ingestion:**
   - Load multiple data sources (CSV, JSON) simulating contract records, supplier history, and known delays

2. **Data Cleaning + Feature Engineering:**
   - Identify missing values and duplicates
   - Merge datasets using `supplier_id` and `contract_id`
   - Engineer risk features (contract age, contract value per month, risk score)

3. **Exploratory Data Analysis:**
   - Univariate
     - Categorical: `contract_type`, `compliance_issues`
     - Numeric: `contract_value_million`, `expected_duration_months`, `average_delay_days`, `delay_days`, `risk_score`, `value_per_month`, `contract_age_days`
   - Bivariate
     - Relationship between the target variable (`risk_score`) and each feature
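The ingestion, cleaning, and feature engineering steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the inline sample records, the `start_date` column, the `as_of` reference date, and the `build_features` helper are all assumptions made for the example. Only the column names listed above come from this README. The `risk_score` calculation is left out because its definition is not given here.

```python
import io
import json

import pandas as pd

# Hypothetical sample data standing in for the project's CSV/JSON sources.
CONTRACTS_CSV = """contract_id,supplier_id,contract_type,contract_value_million,expected_duration_months,start_date
C1,S1,Fixed-Price,12.0,24,2023-01-15
C2,S2,Cost-Plus,30.0,12,2022-06-01
C2,S2,Cost-Plus,30.0,12,2022-06-01
C3,S1,Fixed-Price,5.0,10,2024-03-10
"""
SUPPLIERS_JSON = """[
  {"supplier_id": "S1", "average_delay_days": 4.5, "compliance_issues": "No"},
  {"supplier_id": "S2", "average_delay_days": 21.0, "compliance_issues": "Yes"}
]"""
DELAYS_CSV = """contract_id,delay_days
C1,0
C2,45
"""


def build_features(contracts, suppliers, delays, as_of):
    """Deduplicate, merge the three sources, and derive two risk features."""
    # Drop repeated contract records, keeping the first occurrence.
    contracts = contracts.drop_duplicates(subset="contract_id")

    # Left joins keep every contract even when supplier or delay data is missing.
    merged = contracts.merge(suppliers, on="supplier_id", how="left")
    merged = merged.merge(delays, on="contract_id", how="left")

    # Contract value spread over its expected duration (million per month).
    merged["value_per_month"] = (
        merged["contract_value_million"] / merged["expected_duration_months"]
    )
    # Age in days relative to an assumed reference date.
    merged["contract_age_days"] = (as_of - merged["start_date"]).dt.days
    return merged


contracts = pd.read_csv(io.StringIO(CONTRACTS_CSV), parse_dates=["start_date"])
suppliers = pd.DataFrame(json.loads(SUPPLIERS_JSON))
delays = pd.read_csv(io.StringIO(DELAYS_CSV))

features = build_features(contracts, suppliers, delays, as_of=pd.Timestamp("2025-01-01"))

# Missing values per column, as a first cleaning check.
print(features.isna().sum())
print(features[["contract_id", "value_per_month", "contract_age_days"]])
```

Left joins are used here so that contracts without a matching delay record stay in the table with a missing `delay_days`, which the missing-value check then surfaces. Whether such gaps should be filled or dropped depends on how the real data is defined.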

---

## Project Status

✅ Completed: Set up database, explore and clean dataset, exploratory data analysis (EDA)

🚧 In Progress: Model training and evaluation

---

## 🧾 License

MIT License