{"id":31133021,"url":"https://github.com/adirbella37/safety-analytics-project","last_synced_at":"2026-04-09T17:39:14.200Z","repository":{"id":314271792,"uuid":"1054841025","full_name":"adirbella37/Safety-Analytics-Project","owner":"adirbella37","description":"Final project in Safety Management: analytics and predictive modeling for occupational incidents. Includes EDA, logistic regression, Poisson/Negative Binomial with overdispersion checks, ROC/AUC, and prediction exercises.","archived":false,"fork":false,"pushed_at":"2025-09-11T12:17:47.000Z","size":1050,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-11T15:23:48.767Z","etag":null,"topics":["classification","data-visualization","drunk-and-drive","eda","logistic-regression","matplotlib","negative-binomial","numpy","occupational-safety","overdispersion","pandas","poisson-regression","python","road-safety","roc-auc","scikit-learn","seaborn","statmodels"],"latest_commit_sha":null,"homepage":"","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/adirbella37.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-11T12:02:55.000Z","updated_at":"2025-09-11T12:17:50.000Z","dependencies_parsed_at":"2025-09-11T15:24:01.276Z","dependency_job_id":"a7959668-4dd9-49f1-8aee-65dee01d3a2e","html_url":"https://github.com/adirbella37/Safety-Analytics-Project","commit_stats":null,"previous_names":["adirbella37/safety-analytics-project.ipynb"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/adirbella37/Safety-Analytics-Project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adirbella37%2FSafety-Analytics-Project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adirbella37%2FSafety-Analytics-Project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adirbella37%2FSafety-Analytics-Project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adirbella37%2FSafety-Analytics-Project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/adirbella37","download_url":"https://codeload.github.com/adirbella37/Safety-Analytics-Project/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adirbella37%2FSafety-Analytics-Project/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275712384
,"owners_count":25514205,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-18T02:00:09.552Z","response_time":77,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","data-visualization","drunk-and-drive","eda","logistic-regression","matplotlib","negative-binomial","numpy","occupational-safety","overdispersion","pandas","poisson-regression","python","road-safety","roc-auc","scikit-learn","seaborn","statmodels"],"created_at":"2025-09-18T05:02:52.693Z","updated_at":"2025-09-18T05:04:10.512Z","avatar_url":"https://github.com/adirbella37.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Safety Analytics – Final Project\n\nFinal project for a Safety Management course, covering two domains:\n1) **Occupational incidents** in toy factories in China (binary outcome: accident vs. no-accident).\n2) **Drunk-driving counts** on road segments (count outcome by segment).\n\nThe work includes EDA, feature exploration, logistic regression, Poisson vs. Negative Binomial modeling with overdispersion diagnosis, and performance evaluation (ROC/AUC, sensitivity thresholds). 
All answers are organized by **Q1, Q2, …** matching the assignment.\n\n---\n\n## 📓 Notebooks\n- `project.ipynb` – Assignment instructions (reference).\n- `safety_analytics_project.ipynb` – My full solutions (answers by Q1, Q2, …).\n\n---\n\n## ⚙️ Main Techniques\n- **EDA \u0026 Visualization:** histograms, scatter matrices, bar charts.\n- **Classification (Part 1):** logistic regression, coefficient interpretation, baseline probabilities, ROC/AUC, sensitivity-driven thresholding.\n- **Counts (Part 2):** linear regression sanity checks → **Poisson GLM** → **Negative Binomial** when overdispersion is detected.\n- **Model Diagnostics:** constant-variance checks, residual patterns, AIC/BIC and log-likelihood comparison, overdispersion parameter, feature-removal tests (AUC impact).\n- **Interpretability:** odds and odds ratios, incidence rate ratios (IRR), top drivers by feature (area, time of day, categories).\n\n---\n\n## 📂 Project Structure\n\n| File/Folder                  | Description |\n|-------------------------------|-------------|\n| `project.ipynb`              | Official assignment instructions (reference notebook). |\n| `safety_analytics_project.ipynb` | My full solution notebook with answers (Q1, Q2, …). |\n| `df_task_1_group_25.pkl`     | Dataset for Part 1 – toy-factory accidents (binary classification). |\n| `drunk_driver_grpoup_25.pkl` | Dataset for Part 2 – drunk-driving counts (count regression). |\n| `README.md`                  | Project documentation and overview. |\n\n---\n\n## ▶️ How to Run\n\nYou can get this project in two ways:\n\n**Option 1 – Using Git**\n\n```bash\ngit clone https://github.com/adirbella37/safety-analytics-project.git\ncd safety-analytics-project\n```\n\n**Option 2 – Download as ZIP**\n\n  1. Click the green Code button at the top of this repository\n  2. Select Download ZIP\n  3. 
Extract the ZIP file on your computer\n\n## 📜 License\nThis project is licensed under the MIT License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadirbella37%2Fsafety-analytics-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadirbella37%2Fsafety-analytics-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadirbella37%2Fsafety-analytics-project/lists"}