Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/soroush-04/incrementalsvm-road-accident-prediction
Enhance SVM and incremental SVM machine learning models for road accident severity prediction
incremental-learning machine-learning python scikit-learn svm
- Host: GitHub
- URL: https://github.com/soroush-04/incrementalsvm-road-accident-prediction
- Owner: soroush-04
- License: mit
- Created: 2023-07-20T04:35:41.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-30T05:02:56.000Z (about 1 year ago)
- Last Synced: 2023-12-30T06:19:50.386Z (about 1 year ago)
- Topics: incremental-learning, machine-learning, python, scikit-learn, svm
- Language: Jupyter Notebook
- Homepage:
- Size: 1.84 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Road Accident Severity Prediction
---

Table of contents
=======

- [Problem Statement](#problem-statement-1)
- [Data Description](#datadescription)
- [Data Preprocessing](#preprocessing)
- [Improved SVM Model](#improved)
- [Visualization Folium Map](#visualization)
- [Incremental SVM Model](#incremental)

---
## Problem Statement
The project aims to address the problem of road accidents using machine learning models, in particular an enhanced Support Vector Machine (SVM) model. It also introduces an incremental SVM model. In traditional SVM pipelines, scaling is done during the preprocessing step; in our model, scaling is done after the dataset has been split into training and testing sets. During scaling, the training inputs are first fitted and transformed using a StandardScaler object, and the same object is then used to transform the testing inputs.

## Data Description

The original dataset contained 47 attributes and 1.5 million accident records. It included attributes such as Severity, which indicates how severe an accident was, where 1 is the least severe and 4 the most severe. The dataset also included spatiotemporal attributes. The final dataset for the SVM model, created after feature extraction, included the 12 most significant factors, shown in the table below.

| Attribute | Description |
| :------------ | :------------ |
| ID | The unique identifier of an accident |
| differ HHMM | This represents the difference between the start time and the end time of the accident. |
| S time | This shows the accident's start time in local time. |
| Month | This indicates the month in which the accident occurred. |
| Start Lat | The latitude in GPS coordinates of the starting position of an accident. |
| Start Lng | The longitude in GPS coordinates of the starting position of an accident. |
| Distance.mi. | The extent to which the accident has impacted the road. |
| Accident.Description | The accident is described in basic language. |
| Traffic.Speed | This shows the speed of the vehicle |
| Street | This depicts the street name. |
| City | This shows the city name |
| Weather Condition | This depicts the different weather conditions, such as rain, snow, thunderstorm, fog, etc. |

## Data Preprocessing

- Analysing the dataset
- Handling null values
- Handling categorical variables: We used the LabelEncoder() class from the sklearn.preprocessing library, which encodes each categorical value as an integer between 0 and n-1, where n is the number of distinct values.
- Feature selection: ExtraTreesClassifier from sklearn.ensemble was used; it randomises specific decisions and subsets of the data to avoid over-learning and overfitting.
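
The encoding and feature-selection steps can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy DataFrame, its column names and values, and the `n_estimators` and `random_state` settings are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy frame; column names and values are illustrative only.
df = pd.DataFrame({
    "City": ["Dayton", "Columbus", "Dayton", "Akron", "Columbus", "Akron"] * 10,
    "Weather_Condition": ["Rain", "Snow", "Fog", "Rain", "Clear", "Snow"] * 10,
    "Distance_mi": np.linspace(0.0, 3.0, 60),
    "Severity": [1, 2, 3, 4, 2, 3] * 10,
})

# Encode each categorical column as integers in the range 0..n-1.
encoders = {}
for col in ["City", "Weather_Condition"]:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

X = df.drop(columns="Severity")
y = df["Severity"]

# Rank features by importance using an ensemble of randomized trees.
selector = ExtraTreesClassifier(n_estimators=50, random_state=0)
selector.fit(X, y)
importances = pd.Series(selector.feature_importances_, index=X.columns)
top_features = importances.sort_values(ascending=False).index.tolist()
print(top_features)
```

Keeping one encoder per column makes it possible to apply the same mapping to new data later, or to map the integers back to the original labels with `inverse_transform`.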

## Improved SVM Model

We used scikit-learn to split the dataset 70/30 into training and testing sets. The X train and X test data are then scaled using StandardScaler() from sklearn.preprocessing. Standardizing is an important step, especially when dealing with spatiotemporal data, because some variables may otherwise dominate others. It rescales each feature so that the mean of the observed values is 0 and the standard deviation is 1. In this model, scaling is applied after the data has been split into training and testing sets: first the X train data is fitted and transformed using a StandardScaler() object, and then the same object is used to transform the X test data. This way the scaling statistics are computed from the training data only.
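
A minimal sketch of the split-then-scale procedure, using synthetic data in place of the accident dataset. The feature scales, the labels, the `random_state` values, and the RBF kernel are assumptions chosen for illustration; the README does not state which kernel or hyperparameters the project uses.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic features on very different scales (not the real accident data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * [1.0, 50.0, 0.1, 10.0]
y = (X[:, 0] + X[:, 1] / 50.0 > 0).astype(int)  # synthetic binary labels

# 70/30 split first, so the test set stays unseen during scaling.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit the scaler on the training inputs only, then reuse it for the test inputs.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Kernel choice is an assumption; "rbf" is scikit-learn's default for SVC.
clf = SVC(kernel="rbf")
clf.fit(X_train_scaled, y_train)
print(clf.score(X_test_scaled, y_test))
```

Note that only the training columns end up with mean exactly 0 and standard deviation exactly 1; the test columns are shifted and scaled by the training statistics, so their values are close to, but not exactly, standardized.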

## Incremental SVM Model

The incremental model builds on the improved SVM model described above to predict severity for new data. First, a new CSV file is read in, and a copy of it is created to retain the original values of latitude and longitude. Next, the data is preprocessed and scaled using the same StandardScaler object that was used in the improved SVM model. Finally, the traffic accident severity is predicted using the improved SVM classifier, and the predictions are stored in a new variable.
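
The flow just described might look like the sketch below. It is only an approximation under stated assumptions: the column names are hypothetical, the "earlier" scaler and classifier are rebuilt here on synthetic data so the example runs on its own, and the new batch is generated in memory rather than read from a CSV file.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

features = ["Start_Lat", "Start_Lng", "Distance_mi"]  # hypothetical column names

# Stand-in for the earlier training step (synthetic data, random labels).
rng = np.random.default_rng(1)
train = pd.DataFrame(rng.normal(size=(100, 3)), columns=features)
train_y = rng.integers(1, 5, size=100)  # severities 1..4
scaler = StandardScaler().fit(train[features])
clf = SVC().fit(scaler.transform(train[features]), train_y)

# New batch; in the project this would come from a new CSV file.
new_data = pd.DataFrame(rng.normal(size=(10, 3)), columns=features)
original = new_data.copy()  # keeps the unscaled latitude and longitude

# Reuse the fitted scaler: transform only, do not refit on the new batch.
new_scaled = scaler.transform(new_data[features])
predicted_severity = clf.predict(new_scaled)

# Attach predictions to the unscaled copy, e.g. for plotting on a map.
result = original.assign(Predicted_Severity=predicted_severity)
print(result.head())
```

Calling `transform` rather than `fit_transform` on the new batch keeps its features on the same scale the classifier was trained on.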
Based on the confusion matrices, the incremental SVM model reaches an accuracy of almost 70%, while the enhanced SVM model reaches a high accuracy of 94.8%.
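
For reference, accuracy can be read off a confusion matrix as the sum of the diagonal (correct predictions) divided by the total number of samples. The labels below are made up purely to show the calculation and are not the project's results.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up severities for illustration only.
y_true = [1, 2, 3, 4, 2, 3, 1, 4]
y_pred = [1, 2, 3, 3, 2, 3, 2, 4]

cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 3, 4])

# Diagonal entries count the samples whose predicted class equals the true class.
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy)  # 0.75 (6 of 8 correct)
```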