https://github.com/adirbella37/safety-analytics-project

Final project in Safety Management: analytics and predictive modeling for occupational incidents. Includes EDA, logistic regression, Poisson/Negative Binomial with overdispersion checks, ROC/AUC, and prediction exercises.

classification data-visualization drunk-and-drive eda logistic-regression matplotlib negative-binomial numpy occupational-safety overdispersion pandas poisson-regression python road-safety roc-auc scikit-learn seaborn statmodels


# Safety Analytics – Final Project

Final project for a Safety Management course, covering two domains:
1) **Occupational incidents** in toy factories in China (binary outcome: accident vs. no-accident).
2) **Drunk-driving counts** on road segments (count outcome by segment).

The work includes EDA, feature exploration, logistic regression, Poisson vs. Negative Binomial modeling with overdispersion diagnosis, and performance evaluation (ROC/AUC, sensitivity thresholds). All answers are organized by **Q1, Q2, …** matching the assignment.
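To illustrate the sensitivity-driven thresholding mentioned above, here is a minimal sketch on synthetic data. It is not the notebook's code: the three features, the simulated coefficients, and the 0.90 sensitivity target are all assumptions made for illustration, and the model is evaluated in-sample for brevity.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

# Synthetic stand-in for the accident data (hypothetical features and effects).
rng = np.random.default_rng(42)
n = 1000
X = rng.normal(size=(n, 3))
logits = -1.0 + 1.2 * X[:, 0] - 0.8 * X[:, 1]
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))

# Fit a logistic regression and score the same rows (in-sample, for brevity).
model = LogisticRegression().fit(X, y)
proba = model.predict_proba(X)[:, 1]

# ROC/AUC; keep every candidate threshold so the search below is exact.
auc = roc_auc_score(y, proba)
fpr, tpr, thresholds = roc_curve(y, proba, drop_intermediate=False)

# Thresholds are sorted from high to low, so the first index that reaches
# the target is the highest threshold with sensitivity >= target.
target_sensitivity = 0.90  # assumed target, not taken from the assignment
idx = np.argmax(tpr >= target_sensitivity)
threshold = thresholds[idx]

# Interpretation helpers: odds ratios and the baseline probability
# (predicted probability when every feature equals zero).
odds_ratios = np.exp(model.coef_[0])
baseline_prob = 1 / (1 + np.exp(-model.intercept_[0]))

print(f"AUC={auc:.3f}, threshold={threshold:.3f}, "
      f"sensitivity={tpr[idx]:.3f}, false positive rate={fpr[idx]:.3f}")
```

Lowering the threshold to reach a sensitivity target raises the false positive rate, which is why the sketch prints both.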

---

## 📓 Notebooks
- `project.ipynb` – Assignment instructions (reference).
- `safety_analytics_project.ipynb` – My full solutions (answers by Q1, Q2, …).

---

## ⚙️ Main Techniques
- **EDA & Visualization:** histograms, scatter matrices, bar charts.
- **Classification (Part 1):** Logistic regression, coefficient interpretation, baseline probabilities, ROC/AUC, sensitivity-driven thresholding.
- **Counts (Part 2):** Linear regression sanity checks → **Poisson GLM** → **Negative Binomial** when overdispersion is detected.
- **Model Diagnostics:** constant-variance checks, residual patterns, AIC/BIC and log-likelihood comparison, overdispersion parameter, feature-removal tests (impact on AUC).
- **Interpretability:** odds and odds ratios, incidence rate ratios (IRR), and the top drivers by feature (area, time of day, categories).

---

## 📂 Project Structure

| File/Folder | Description |
|-------------------------------|-------------|
| `project.ipynb` | Official assignment instructions (reference notebook). |
| `safety_analytics_project.ipynb` | My full solution notebook with answers (Q1, Q2, …). |
| `df_task_1_group_25.pkl` | Dataset for Part 1 – toy-factory accidents (binary classification). |
| `drunk_driver_grpoup_25.pkl` | Dataset for Part 2 – drunk-driving counts (count regression). |
| `README.md` | Project documentation and overview. |
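
A quick way to inspect the two datasets is sketched below. It assumes each `.pkl` file holds a pandas DataFrame and that the script runs from the repository root; neither is confirmed by this README, and `load_dataset` is a hypothetical helper. Since unpickling can execute arbitrary code, only load pickle files from sources you trust.

```python
from pathlib import Path

import pandas as pd

# File names as listed in the table above.
DATA_FILES = ["df_task_1_group_25.pkl", "drunk_driver_grpoup_25.pkl"]


def load_dataset(path):
    """Load a pickled object and check that it is a DataFrame (assumed format)."""
    obj = pd.read_pickle(path)
    if not isinstance(obj, pd.DataFrame):
        raise TypeError(f"{path} holds a {type(obj).__name__}, not a DataFrame")
    return obj


for name in DATA_FILES:
    path = Path(name)
    if path.exists():
        df = load_dataset(path)
        print(name, df.shape)
        print(df.dtypes)
    else:
        print(f"{name} not found; run this from the repository root")
```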

---

## ▶️ How to Run

You can get this project in two ways:

**Option 1 – Using Git**

```bash
git clone https://github.com/adirbella37/safety-analytics-project.git
cd safety-analytics-project
```

**Option 2 – Download as ZIP**

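1. Click the green **Code** button at the top of the repository page.
2. Select **Download ZIP**.
3. Extract the ZIP file on your computer.

After either option, open `safety_analytics_project.ipynb` in Jupyter (or another notebook environment) and run the cells in order.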

## 📜 License
This project is licensed under the MIT License.