{"id":20711644,"url":"https://github.com/sralter/classifire","last_synced_at":"2026-05-02T08:31:44.053Z","repository":{"id":149993428,"uuid":"608934599","full_name":"sralter/classifire","owner":"sralter","description":"Wildfire Prediction Model: Samuel Alter's BrainStation 2023 Data Science Capstone Project","archived":false,"fork":false,"pushed_at":"2024-09-24T20:32:51.000Z","size":48913,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-11T06:27:03.318Z","etag":null,"topics":["qgis","scikit-learn","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sralter.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-03T02:46:35.000Z","updated_at":"2024-09-26T16:03:08.000Z","dependencies_parsed_at":null,"dependency_job_id":"f9923cba-e5fb-42c6-a080-8d3f7259f07d","html_url":"https://github.com/sralter/classifire","commit_stats":null,"previous_names":["sralter/classifire"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sralter/classifire","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sralter%2Fclassifire","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sralter%2Fclassifire/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sralter%2Fclassifire/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sralter%2Fclassifire/manifests","owner_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sralter","download_url":"https://codeload.github.com/sralter/classifire/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sralter%2Fclassifire/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32528167,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-02T01:12:54.858Z","status":"online","status_checked_at":"2026-05-02T02:00:05.923Z","response_time":132,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["qgis","scikit-learn","tensorflow"],"created_at":"2024-11-17T02:16:25.442Z","updated_at":"2026-05-02T08:31:44.022Z","avatar_url":"https://github.com/sralter.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Classi**FIRE**\n_Samuel Alter's BrainStation 2023 Data Science Capstone Project, Spring 2023_\n\n## Motivation\n- Wildfires, or the uncontrolled burning of vegetation, cause enormous damage and loss of life worldwide every year.[^1]\n- **33 people died** and **\u003e4.3 million acres burned** in California in 2020 alone. 
This is **equivalent to the total combined area of Puerto Rico and Rhode Island**.\n- Although they are a natural component of some forest ecosystems, **wildfire incidents are projected to increase** in our warming world.[^2]\n- While society transitions to a greener future, **there is a clear need to predict where wildfires are likely to occur** so that communities can protect themselves and evacuate when necessary.\n\n## Introduction\n- For this project, I collected a combination of **satellite imagery**, **topographic data** (aspect, elevation, and slope), and **historical fire boundaries** to train models that would predict wildfire risk (**Figure 1**). \n- I first used [QGIS](https://qgis.org/en/site/), an open-source mapping application, to set up the relationships between the datasets.\n- I then used **Python** to process and **model the data**.\n- I fed the processed data into machine learning and deep learning models to make my predictions.\n- I chose to focus on the **Santa Monica Mountains in Ventura and Los Angeles Counties** for their relatively manageable **size**, their **proximity to populated areas**, and their **east-west trend**, which makes the topographic analysis easier.\n\n\u003cimg width=\"696\" alt=\"Figure 1: Data ingestion pipeline\" src=\"https://github.com/sralter/brainstation_2023_ds_capstone/assets/25013680/03704211-5734-49b8-bb13-8cc88e2a8d65\"\u003e\n\n## Datasets Used\n- **Satellite imagery** and **topographic data** from USGS’ EarthExplorer[^3]\n- Historical records of **wildfire burn boundaries** from the National Interagency Fire Center[^4]\n- File formats used: .tif and .jpg (imagery); .geojson and .csv (historical wildfire data).\n\n## Mapping, Data Cleaning, and Preprocessing\n- In order to train a model that integrates spatial information with historical fire boundaries, I had to find a way to quantify the continuous spatial information. 
I settled on **building an evenly spaced point layer** in which each point carries the topographic and fire boundary data found at its location (**Figure 2**).\n- Since the satellite photos were from 2018, I removed all fire polygons from after 2018 and **merged the rest into one shape**.\n- Using the point layer, **an overlap test could determine whether a part of the landscape had experienced a fire (“fire” area) or not (“nofire” area)**. The resultant file would serve as the “topographic” data in the project.\n- For the geographic data, I focused on **aspect** (i.e., the direction the land is facing), **elevation** (meters above sea level), and **slope** (degrees). It is a straightforward operation to extract these data from an elevation raster and append them to the point layer.\n- I **tiled the satellite imagery into 128x128-pixel tiles** that would be fed into a **TensorFlow image model**, and saved them into two folders (i.e., fire and nofire).\n- At the end of the data collection step, I had a **table of geographic data** and **two image folders** (fire areas and non-fire areas).\n\n\u003cimg width=\"735\" alt=\"Figure 2: Mapping flow from QGIS to modeling\" src=\"https://github.com/sralter/brainstation_2023_ds_capstone/assets/25013680/1c198858-da48-47c0-9860-ea417e3471bd\"\u003e\n\n## Modeling, Results, and Insights\n- Since the data come in two forms (point-based topographic information and satellite image tiles), I trained two models. 
After finding an optimal model, I created a metamodel, which used the predictions from the topographic and image analysis models as features for a new logistic regression (**Figure 3**).\n\n\u003cimg width=\"704\" alt=\"Figure 3: Modeling pipelines using statsmodels, sklearn, TensorFlow (VGG19), and finally a metamodel with sklearn's logistic regression.\" src=\"https://github.com/sralter/brainstation_2023_ds_capstone/assets/25013680/504c806a-4aeb-42bf-b566-9c3253ee4bba\"\u003e\n\n  - _Topographic Data_  \n    - The **areas affected by fire** were predominantly **located in mountainous regions with higher mean elevation** compared to fire-free areas (**Table 1**).\n    - Correlations were observed between related topographic features: categorical aspect with continuous aspect, and categorical slope and elevation with their continuous counterparts.\n    - `LogisticRegression` **was chosen as the modeling approach** for classifying fire incidence based on topographic data, revealing the relative importance of different features.\n    - Various models were tested, including the Bernoulli and Gaussian `naive_bayes` classifiers, `XGBClassifier`, `RandomForestClassifier`, `AdaBoost`, and `GradientBoostingClassifier`, with `GradientBoostingClassifier` **achieving the highest accuracy**.\n    - Grid search was used to tune the `GradientBoostingClassifier`, but the best accuracy did not improve.\n\n\u003cimg width=\"696\" alt=\"Table 1: Summary statistics by fire/nofire areas\" src=\"https://github.com/sralter/brainstation_2023_ds_capstone/assets/25013680/918a9941-a27b-4c09-8536-cb4b0ddac3e4\"\u003e\n\n  -  _Satellite Imagery_  \n     -  BigEarthNet[^5] suggested that the **pre-trained VGG19 model** would be sufficient, so I used it with **TensorFlow-Keras** and **added a final dense layer with two output nodes** to represent the fire/nofire categories required by this project.\n     -  With **20,025,410 total parameters**, I deemed the 
model more than sufficient for the project’s needs.\n     -  After training, **the model achieved an accuracy of 95.6%** on classifying all the images.\n     -  **Figure 4** shows typical images in the set as well as the locations of the two areas (fire and non-fire) that I used to feed the models.\n\n\u003cimg width=\"623\" alt=\"Figure 4: examples of fire and nofire satellite images\" src=\"https://github.com/sralter/brainstation_2023_ds_capstone/assets/25013680/ed8122e2-a2be-4545-bee6-56b5a3ba570a\"\u003e\n\u003cimg width=\"1303\" alt=\"Figure 3: Map of the sections within the study area used to feed the models\" src=\"https://github.com/sralter/brainstation_2023_ds_capstone/assets/25013680/d1121556-5d5f-4fbf-a64e-065752f546f7\"\u003e\n\n  -  _Metamodel_\n     -  To construct the metamodel, I **extracted the predictions from the topographic and imagery datasets, used them as features**, and ran the two through a **scikit-learn `LogisticRegression`, which achieved over 99% accuracy**. This concluded the modeling portion of the project.\n\n## Findings and Conclusions\n- The statsmodels `Logit` model reveals that **higher elevations and west- and north-facing hillslopes correlate with wildfires, potentially due to dominant wind directions in the area**.\n- The project serves as a **proof-of-concept**, demonstrating the capabilities of a machine-learning model using remotely-sensed data for predicting wildfire risk. 
**Future iterations can expand on this by incorporating weather data and time series analysis** and **testing the model on different landscapes**.\n- Despite limitations, such as the lack of high-resolution weather data, **the project shows promise for developing a robust wildfire risk prediction model** with broader data and automated GIS workflows.\n- **Publishing the model in a more accessible manner** could provide communities with **valuable insights into their wildfire risk**.\n\n## Postscript: Commentary on why the accuracy is so high\n- The accuracy is almost 100%, which is suspiciously high.\n- Since the model was trained only on areas that were either 100% burned or 100% unburned, it knows only the clearly delineated cases.\n- Furthermore, when tested, the model was given only 100% burned or 100% unburned areas.\n- The unburned area was a city center. There would never be a wildfire in a concrete jungle.\n- The opposite case is similarly clear-cut: in the mountains, away from all civilization, fires are extremely likely and are much harder to combat.\n\n## Extra goodies\n- Correlation matrix of the geographic data:  \n\u003cimg width=\"727\" alt=\"Correlation matrix of the geographic data\" src=\"https://github.com/sralter/brainstation_2023_ds_capstone/assets/25013680/d88c52a4-bb47-42c6-a990-91650ba1fa95\"\u003e\n\n- Summary of models and results for TopoData:  \n\u003cimg width=\"1255\" alt=\"Summary of models and results for TopoData\" src=\"https://github.com/sralter/brainstation_2023_ds_capstone/assets/25013680/ee414012-b4b3-4a66-81f0-be9cbf02e35e\"\u003e\n\n- Summary of satellite imagery model:  \n\u003cimg width=\"726\" alt=\"Summary of satellite imagery model\" src=\"https://github.com/sralter/brainstation_2023_ds_capstone/assets/25013680/11557b3f-2952-4ae1-95cc-4ca9d158ed11\"\u003e\n\n- Summary of metamodel:  \n\u003cimg width=\"726\" alt=\"Summary of metamodel\" 
src=\"https://github.com/sralter/brainstation_2023_ds_capstone/assets/25013680/fb34dc97-acd4-4e0a-a8e7-891363b49f2e\"\u003e\n\n- Satellite image and historic wildfire boundaries (up to 2018):  \n\u003cimg width=\"1319\" alt=\"Satellite image and historic wildfire boundaries (up to 2018)\" src=\"https://github.com/sralter/brainstation_2023_ds_capstone/assets/25013680/3dce0beb-fbaf-4b8a-abad-da2250486b8c\"\u003e\n\n- Satellite image and hillshade of study area:  \n\u003cimg width=\"1320\" alt=\"Satellite image and hillshade of study area\" src=\"https://github.com/sralter/brainstation_2023_ds_capstone/assets/25013680/b6c75336-f339-4709-8f26-a580dd9b6498\"\u003e\n\n[^1]: [https://www.fire.ca.gov/incidents/2020/]\n[^2]: [https://report.ipcc.ch/ar6syr/pdf/IPCC_AR6_SYR_SPM.pdf]\n[^3]: [https://earthexplorer.usgs.gov]\n[^4]: [https://data-nifc.opendata.arcgis.com/]\n[^5]: [https://bigearth.net/]\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsralter%2Fclassifire","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsralter%2Fclassifire","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsralter%2Fclassifire/lists"}