https://github.com/alunera-data/animal-shelter-ml
edX Capstone Project: ML-based prediction of shelter animal outcomes using R and Random Forest
https://github.com/alunera-data/animal-shelter-ml
animal-shelter capstone caret classification eda edx machine-learning ml r randomforest tidyverse visualization
Last synced: 12 days ago
JSON representation
edX Capstone Project: ML-based prediction of shelter animal outcomes using R and Random Forest
- Host: GitHub
- URL: https://github.com/alunera-data/animal-shelter-ml
- Owner: alunera-data
- License: mit
- Created: 2025-04-11T14:17:14.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-17T09:31:38.000Z (about 1 year ago)
- Last Synced: 2025-04-18T00:21:47.264Z (about 1 year ago)
- Topics: animal-shelter, capstone, caret, classification, eda, edx, machine-learning, ml, r, randomforest, tidyverse, visualization
- Language: HTML
- Homepage:
- Size: 1.08 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# 🐾 Animal Shelter Outcome Prediction (edX Choose Your Own)
This project was developed as part of the final Capstone for the [HarvardX Data Science Professional Certificate](https://online-learning.harvard.edu/series/data-science).
It explores the prediction of shelter animal outcomes (e.g. Adoption, Transfer, Euthanasia)
at the **Austin Animal Center**, based on intake-related features.
---
## 🎯 Goal
The objective is to predict shelter animal outcomes based on intake data such as
animal type, condition, and intake type.
Model performance is evaluated using **accuracy**, **confusion matrix**, and **feature importance**.
All steps follow the [edX Honor Code](https://learning.edx.org/honor).
---
## 📚 Data Source
The dataset is provided by the [Austin Animal Center](https://www.kaggle.com/datasets/aaronschlegel/austin-animal-center-shelter-intakes-and-outcomes),
hosted on [Kaggle](https://www.kaggle.com/). It contains detailed records of animal intakes and outcomes.
⚠️ **Note:** The dataset is **not included** in this repository.
Please download it manually from Kaggle and use it locally for testing and report generation.
👉 https://www.kaggle.com/datasets/aaronschlegel/austin-animal-center-shelter-intakes-and-outcomes
---
## 🗂️ Project Structure
| File | Description |
|----------------------------------|---------------------------------------------------------------|
| `01_load_data.R` | Load and inspect the shelter dataset |
| `02_explore_data.R` | Exploratory data analysis (EDA): distributions, NA overview |
| `03_model_baseline.R` | Baseline model: predict most frequent outcome ("Adoption") |
| `04_model_randomforest.R` | Random Forest model with 5-fold CV and feature importance |
| `05_compare_models.R` | Comparison of baseline vs. Random Forest (accuracy & plots) |
| `06_final_model.R` | Final model application without CV, final evaluation |
| `07_final_pipeline.R` | Complete pipeline with all steps and explanatory comments |
| `chooseyourproject_report.Rmd` | Final R Markdown report (edX-compliant) |
| `chooseyourproject_report.pdf` | Rendered PDF version for submission |
| `chooseyourproject_report.html` | Rendered HTML version |
| `LICENSE` | MIT License for reuse |
| `.gitignore` | Excludes data files and system folders |
| `README.md` | This project overview |
---
## 🔎 Final Results
| Metric | Value |
|---------------------------|-----------|
| Baseline Accuracy | 42.18 % |
| Random Forest Accuracy | 58.09 % |
| Absolute Improvement | +15.91 pp |
| Relative Improvement | +37.7 % |
| Final Model Trees | 500 |
Random Forest classifier clearly outperformed the naive baseline.
Most important predictors: `intake_type`, `sex_upon_intake`, `intake_condition`.
---
## 💻 Requirements
- **R 4.x or newer**
- **RStudio**
- Required R packages:
- `tidyverse`
- `caret`
- `randomForest`
- `scales`
- `tidytext` *(for variable importance visualization)*
---
## 📄 Report Access
The final report submitted for edX is available in two formats:
- [chooseyourproject_report.pdf](chooseyourproject_report.pdf)
- [chooseyourproject_report.html](chooseyourproject_report.html)
It includes all modeling steps, evaluations, plots, and interpretations.
---
## 👩💻 Author and License
This project was created by **Yvonne Kirschler**
and is licensed under the [MIT License](LICENSE).
If you reuse code from this repository, please provide proper attribution.
GitHub profile: [@alunera-data](https://github.com/alunera-data)
LinkedIn: [Yvonne Kirschler](https://www.linkedin.com/in/yvonne-kirschler-719224188/)
> _This project was developed independently.
> ChatGPT (OpenAI) was used to support structure, planning and phrasing.
> All modeling, evaluation and reporting were performed and reviewed by the author._