https://github.com/elymsyr/machine-failure-prediction
- Host: GitHub
- URL: https://github.com/elymsyr/machine-failure-prediction
- Owner: elymsyr
- License: MIT
- Created: 2024-07-30T15:02:42.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-07-30T19:54:36.000Z (10 months ago)
- Last Synced: 2025-01-01T23:10:04.369Z (5 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 618 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Machine Failure Prediction
This project involves creating a machine learning model to predict machine failure based on various features. The project is divided into several parts, including data visualization, model selection, model training, and creating a Tkinter-based application for real-time predictions.
## Project Overview
1. **Data Visualization and Feature Selection**: Analyzes the dataset to visualize relationships and select the most relevant features.
2. [**Model Selection with GridSearchCV**](https://www.kaggle.com/code/muhammadfaizan65/machine-failure-prediction-eda-modeling): Determines the best model and hyperparameters using GridSearchCV, adapted from this [Kaggle notebook](https://www.kaggle.com/code/muhammadfaizan65/machine-failure-prediction-eda-modeling); see the sketch after this list.
3. **Model Training**: Trains a Gradient Boosting model with the selected features and parameters.
4. **Tkinter Application**: Provides a graphical user interface (GUI) for real-time feature adjustments and predictions.
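A minimal sketch of the model comparison in step 2, assuming scikit-learn; the parameter grids, CSV path, and split seed are illustrative assumptions rather than the notebook's exact values:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Load the sensor data and split it (path and seed are assumptions).
df = pd.read_csv("Data/data.csv").drop_duplicates()
X, y = df.drop(columns="fail"), df["fail"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Candidate models with illustrative (not the notebook's exact) parameter grids.
candidates = {
    "RandomForest": (RandomForestClassifier(), {"n_estimators": [10, 50, 100], "max_depth": [None, 5]}),
    "GradientBoosting": (GradientBoostingClassifier(), {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]}),
    "LogisticRegression": (LogisticRegression(max_iter=1000), {"C": [0.01, 0.1, 1.0]}),
    "SVM": (SVC(), {"C": [0.05, 0.5], "kernel": ["linear", "rbf"]}),
}

best_name, best_acc = None, 0.0
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=5, verbose=1)
    search.fit(X_train, y_train)
    acc = search.score(X_test, y_test)  # accuracy on the held-out test set
    print(f"Best parameters for {name}: {search.best_params_}")
    print(f"Accuracy for {name}: {acc}")
    if acc > best_acc:
        best_name, best_acc = name, acc

print(f"Best model: {best_name} with accuracy: {best_acc}")
```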
## Requirements
- Python 3.10.14
- Libraries: [requirements.txt](requirements.txt)

Install the required libraries using pip:
```bash
pip install -r requirements.txt
```

## [Data](https://www.kaggle.com/datasets/umerrtx/machine-failure-prediction-using-sensor-data)
### Overview
This dataset contains sensor data collected from various machines, with the aim of predicting machine failures in advance. It includes a variety of sensor readings as well as recorded machine failures. The dataset has 944 rows (one of them a duplicate) and no NaN values; it comes from [Kaggle](https://www.kaggle.com/datasets/umerrtx/machine-failure-prediction-using-sensor-data).
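A quick pandas sketch to verify those claims (the CSV path matches the file linked under Results; the expected values in the comments are taken from this README):

```python
import pandas as pd

df = pd.read_csv("Data/data.csv")
print(len(df))                 # 944 rows, per the README
print(df.duplicated().sum())   # 1 duplicated row, per the README
print(df.isna().sum().sum())   # 0 -> no NaN values
df = df.drop_duplicates()      # keep unique rows for modeling
```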
### Features
- footfall: The number of people or objects passing by the machine.
- tempMode: The temperature mode or setting of the machine.
- AQ: Air quality index near the machine.
- USS: Ultrasonic sensor data, indicating proximity measurements.
- CS: Current sensor readings, indicating the electrical current usage of the machine.
- VOC: The level of volatile organic compounds detected near the machine.
- RP: The rotational position or speed of the machine parts, in RPM (revolutions per minute).
- IP: Input pressure to the machine.
- Temperature: The operating temperature of the machine.
- fail: Binary indicator of machine failure (1 for failure, 0 for no failure).

#### Target Distribution
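The distribution figure is not reproduced here; a short sketch to inspect the class balance of `fail` (assuming pandas and the same CSV path as above):

```python
import pandas as pd

df = pd.read_csv("Data/data.csv").drop_duplicates()
print(df["fail"].value_counts())                # absolute counts per class
print(df["fail"].value_counts(normalize=True))  # class proportions
```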
## GUI
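A minimal sketch of the kind of Tkinter interface step 4 describes, with a slider per feature and a predict button; the model path (`model.pkl`), slider ranges, and layout are assumptions, not the repository's actual code:

```python
import tkinter as tk
import joblib

FEATURES = ["footfall", "tempMode", "AQ", "USS", "CS", "VOC", "RP", "IP", "Temperature"]
model = joblib.load("model.pkl")  # hypothetical path; see the training sketch under Results

root = tk.Tk()
root.title("Machine Failure Prediction")

# One labeled slider per feature for real-time adjustment.
scales = {}
for row, name in enumerate(FEATURES):
    tk.Label(root, text=name).grid(row=row, column=0, sticky="w")
    scale = tk.Scale(root, from_=0, to=100, orient="horizontal")  # range is illustrative
    scale.grid(row=row, column=1)
    scales[name] = scale

def predict():
    values = [[scales[name].get() for name in FEATURES]]  # one row, features in training order
    label = model.predict(values)[0]
    result.config(text=f"Prediction: {'failure' if label == 1 else 'no failure'}")

tk.Button(root, text="Predict", command=predict).grid(row=len(FEATURES), column=0, columnspan=2)
result = tk.Label(root, text="Prediction: -")
result.grid(row=len(FEATURES) + 1, column=0, columnspan=2)

root.mainloop()
```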
## Results
### Results with [Data](Data/data.csv)
- Features: `footfall` `tempMode` `AQ` `USS` `CS` `VOC` `RP` `IP` `Temperature` `fail`
- Test Size: 0.2
- Data Size: 944
- Results:
```
Fitting 5 folds for each of 16 candidates, totalling 80 fits
Best parameters for RandomForest: {'max_depth': None, 'n_estimators': 10}
Accuracy for RandomForest: 0.8677248677248677
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best parameters for GradientBoosting: {'learning_rate': 0.1, 'n_estimators': 50}
Accuracy for GradientBoosting: 0.8888888888888888
Fitting 5 folds for each of 8 candidates, totalling 40 fits
Best parameters for LogisticRegression: {'C': 0.01}
Accuracy for LogisticRegression: 0.8783068783068783
Fitting 5 folds for each of 16 candidates, totalling 80 fits
Best parameters for SVM: {'C': 0.05, 'kernel': 'linear'}
Accuracy for SVM: 0.873015873015873
Best model: GradientBoosting with accuracy: 0.8888888888888888
```
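A minimal sketch of step 3, retraining Gradient Boosting with the best parameters reported above; the split seed is an assumption, so the exact accuracy may differ, and the final `joblib.dump` is a hypothetical handoff to the Tkinter app:

```python
import joblib
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("Data/data.csv").drop_duplicates()
X, y = df.drop(columns="fail"), df["fail"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # seed assumed

# Best parameters reported above for the full-feature data.
model = GradientBoostingClassifier(learning_rate=0.1, n_estimators=50)
model.fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))

joblib.dump(model, "model.pkl")  # hypothetical: persist the model for the GUI
```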
#### Feature Distribution
#### Correlation Matrix
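The correlation-matrix figure is not reproduced here; a sketch (assuming pandas) of the kind of check that motivates the reduced feature set below:

```python
import pandas as pd

df = pd.read_csv("Data/data.csv").drop_duplicates()
# Correlation of every column with the target; weakly correlated features
# are candidates for removal in the feature-reduced data.
print(df.corr()["fail"].sort_values(ascending=False))
```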
### Results with [Feature-Reduced Data](Data/data_cleaned.csv)
- Features: `AQ` `USS` `VOC` `RP` `IP` `Temperature` `fail`
- Test Size: 0.2
- Data Size: 944
- Results:
```
Fitting 5 folds for each of 16 candidates, totalling 80 fits
Best parameters for RandomForest: {'max_depth': None, 'n_estimators': 50}
Accuracy for RandomForest: 0.873015873015873
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best parameters for GradientBoosting: {'learning_rate': 0.05, 'n_estimators': 100}
Accuracy for GradientBoosting: 0.8888888888888888
Fitting 5 folds for each of 8 candidates, totalling 40 fits
Best parameters for LogisticRegression: {'C': 0.01}
Accuracy for LogisticRegression: 0.8783068783068783
Fitting 5 folds for each of 16 candidates, totalling 80 fits
Best parameters for SVM: {'C': 0.05, 'kernel': 'linear'}
Accuracy for SVM: 0.873015873015873
Best model: GradientBoosting with accuracy: 0.8888888888888888
```
#### Feature Distribution
#### Correlation Matrix