https://github.com/adirbella37/safety-analytics-project
Final project in Safety Management: analytics and predictive modeling for occupational incidents. Includes EDA, logistic regression, Poisson/Negative Binomial with overdispersion checks, ROC/AUC, and prediction exercises.
classification data-visualization drunk-and-drive eda logistic-regression matplotlib negative-binomial numpy occupational-safety overdispersion pandas poisson-regression python road-safety roc-auc scikit-learn seaborn statmodels
Last synced: 19 days ago
- Host: GitHub
- URL: https://github.com/adirbella37/safety-analytics-project
- Owner: adirbella37
- Created: 2025-09-11T12:02:55.000Z (26 days ago)
- Default Branch: main
- Last Pushed: 2025-09-11T12:17:47.000Z (26 days ago)
- Last Synced: 2025-09-11T15:23:48.767Z (26 days ago)
- Topics: classification, data-visualization, drunk-and-drive, eda, logistic-regression, matplotlib, negative-binomial, numpy, occupational-safety, overdispersion, pandas, poisson-regression, python, road-safety, roc-auc, scikit-learn, seaborn, statmodels
- Language: Jupyter Notebook
- Homepage:
- Size: 1 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Safety Analytics – Final Project
Final project for a Safety Management course, covering two domains:
1) **Occupational incidents** in toy factories in China (binary outcome: accident vs. no-accident).
2) **Drunk-driving counts** on road segments (count outcome by segment).

The work includes EDA, feature exploration, logistic regression, Poisson vs. Negative Binomial modeling with overdispersion diagnosis, and performance evaluation (ROC/AUC, sensitivity thresholds). All answers are organized by **Q1, Q2, …** matching the assignment.

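
As an illustration of the sensitivity thresholds mentioned above, here is a minimal, library-free sketch of choosing a probability cutoff that reaches a target sensitivity. It is not taken from the notebook: the function names, labels, and scores are hypothetical and only show the idea.

```python
import math


def threshold_for_sensitivity(y_true, scores, target_sensitivity):
    """Return the highest cutoff whose sensitivity (recall) meets the target.

    A case is predicted positive when its score is >= the cutoff.
    """
    positives = sorted((s for s, y in zip(scores, y_true) if y == 1), reverse=True)
    if not positives:
        raise ValueError("y_true contains no positive cases")
    # Number of true positives that must be caught to reach the target.
    needed = max(math.ceil(target_sensitivity * len(positives)), 1)
    return positives[needed - 1]


def sensitivity_specificity(y_true, scores, threshold):
    """Compute (sensitivity, specificity) at a given cutoff."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if y == 1:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels (1 = accident) and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.3, 0.1, 0.05]

cutoff = threshold_for_sensitivity(y_true, scores, target_sensitivity=0.75)
sens, spec = sensitivity_specificity(y_true, scores, cutoff)
print(f"cutoff={cutoff}, sensitivity={sens}, specificity={spec}")
```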
---
## 📓 Notebooks
- `project.ipynb` – Assignment instructions (reference).
- `safety_analytics_project.ipynb` – My full solutions (answers by Q1, Q2, …).

---
## ⚙️ Main Techniques
- **EDA & Visualization:** histograms, scatter matrices, bar charts.
- **Classification (Part 1):** Logistic regression, coefficient interpretation, baseline probabilities, ROC/AUC, sensitivity-driven thresholding.
- **Counts (Part 2):** Linear regression sanity checks → **Poisson GLM** → **Negative Binomial** when overdispersion is detected.
- **Model Diagnostics:** constant-variance checks, residual patterns, AIC/BIC and log-likelihood comparison, overdispersion parameter, feature removal tests (AUC impact).
- **Interpretability:** odds and odds ratios, incidence rate ratios (IRR), top drivers by feature (area, time-of-day, categories).

---
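
The overdispersion check in the list above can be sketched without any modeling library. The example below assumes an intercept-only Poisson model, whose fitted mean is simply the sample mean, and computes the Pearson dispersion statistic (Pearson chi-square divided by residual degrees of freedom). The counts are made up for illustration and the helper name is hypothetical; the notebook itself may use a different diagnostic.

```python
from statistics import mean


def pearson_dispersion(counts, fitted_means, n_params):
    """Pearson chi-square divided by the residual degrees of freedom.

    Under a well-specified Poisson model this ratio is close to 1;
    values well above 1 point to overdispersion.
    """
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted_means))
    return chi2 / (len(counts) - n_params)


# Hypothetical drunk-driving counts per road segment (not the project data).
counts = [0, 0, 1, 0, 7, 0, 2, 0, 12, 0, 1, 9]

# Intercept-only Poisson fit: the maximum-likelihood mean is the sample mean.
mu_hat = mean(counts)
fitted = [mu_hat] * len(counts)

dispersion = pearson_dispersion(counts, fitted, n_params=1)
print(f"mean={mu_hat:.3f}, dispersion={dispersion:.3f}")
```

When the ratio is well above 1, the variance exceeds what the Poisson assumption allows, which is the situation where a Negative Binomial model is the usual next step.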
## 📂 Project Structure
| File/Folder | Description |
|-------------------------------|-------------|
| `project.ipynb` | Official assignment instructions (reference notebook). |
| `safety_analytics_project.ipynb` | My full solution notebook with answers (Q1, Q2, …). |
| `df_task_1_group_25.pkl` | Dataset for Part 1 – toy factories accidents (binary classification). |
| `drunk_driver_grpoup_25.pkl` | Dataset for Part 2 – drunk-driving counts (count regression). |
| `README.md` | Project documentation and overview. |

---
## ▶️ How to Run
You can get this project in two ways:
**Option 1 – Using Git**
```bash
git clone https://github.com/adirbella37/safety-analytics-project.git
cd safety-analytics-project
```

**Option 2 – Download as ZIP**
1. Click the green Code button at the top of this repository
2. Select Download ZIP
3. Extract the ZIP file on your computer

## 📜 License
This project is licensed under the MIT License.