https://github.com/rooshmica/concrete-compressive-strength-prediction
Concrete Dataset for Regression Model training
linear-regression machine-learning polynomial-regression python rmse-score scikit-learn
- Host: GitHub
- URL: https://github.com/rooshmica/concrete-compressive-strength-prediction
- Owner: rooshmica
- Created: 2024-12-19T00:48:23.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-12-19T01:35:45.000Z (6 months ago)
- Last Synced: 2025-02-11T21:14:08.252Z (5 months ago)
- Topics: linear-regression, machine-learning, polynomial-regression, python, rmse-score, scikit-learn
- Language: Jupyter Notebook
- Homepage:
- Size: 3.07 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
#### Concrete-Compressive-Strength-Prediction
Concrete Dataset Path -> https://archive.ics.uci.edu/dataset/165/concrete+compressive+strength
### Project Overview
This project aims to predict the compressive strength of concrete using machine learning models. The dataset contains 1,030 observations with 9 attributes: 8 quantitative input variables (cement, water, fine aggregate, etc.) and 1 target variable (concrete compressive strength). The goal is to identify key factors that influence concrete strength and build accurate prediction models.
### Key Features
- Attributes: cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, fine aggregate, and age.
- Target: Concrete compressive strength.
- Machine Learning Models:
  - Multiple Linear Regression
  - Polynomial Regression
### Objective
The objective of the project is to:
- Analyze the relationships between features and concrete compressive strength.
- Build multiple machine learning models to predict the compressive strength.
- Compare model performance and select the best model.
### Approach
- Data Understanding: The dataset is first explored using correlation matrices and visualizations (e.g., scatter matrix).
- Data Preparation: The dataset is split into training and testing sets for model building.
- Modeling: Different machine learning models are applied and compared for their performance.
- Evaluation: Model performance is evaluated using metrics such as R² and RMSE.
### Data Information
The dataset contains 1030 samples.

| idx | column | non-null count | dtype |
|-----|-------------------------------|----------------|---------|
| 0 | Cement | 1030 non-null | float64 |
| 1 | Blast Furnace Slag | 1030 non-null | float64 |
| 2 | Fly Ash | 1030 non-null | float64 |
| 3 | Water | 1030 non-null | float64 |
| 4 | Superplasticizer | 1030 non-null | float64 |
| 5 | Coarse Aggregate | 1030 non-null | float64 |
| 6 | Fine Aggregate | 1030 non-null | float64 |
| 7 | Age | 1030 non-null | int64 |
| 8 | Concrete Compressive Strength | 1030 non-null | float64 |
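
The table above resembles the output of pandas `DataFrame.info()`. The following is a minimal sketch of how such a per-column summary could be produced. It assumes the data sits in a DataFrame with the column names from the table; the small random frame is only a stand-in so the example runs without a download, and it is not the real dataset.

```python
import numpy as np
import pandas as pd

COLUMNS = [
    "Cement", "Blast Furnace Slag", "Fly Ash", "Water", "Superplasticizer",
    "Coarse Aggregate", "Fine Aggregate", "Age", "Concrete Compressive Strength",
]

# Stand-in data (random numbers, NOT the real dataset), used only so the
# sketch is runnable on its own.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(0, 100, size=(20, len(COLUMNS))), columns=COLUMNS)
df["Age"] = rng.integers(1, 366, size=len(df))  # Age is an integer column

# Per-column non-null counts and dtypes, mirroring the table above.
summary = pd.DataFrame({
    "non_null_count": df.notna().sum(),
    "dtype": df.dtypes.astype(str),
})
print(summary)
```

With the real data, `df` would instead come from reading the file downloaded from the UCI page, and `df.info()` prints the same information directly.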
### Checking for missing values

For some components, such as *Blast Furnace Slag*, *Fly Ash*, and *Superplasticizer*, the minimum value in the column is 0.
A value of 0 is often treated as a *missing value*; in this project, however, **a value of 0 is assumed to mean that the component was not used in the mixing process**.
### Data Preprocessing Steps
- Loaded the dataset and checked it for inconsistencies (no missing values were found).
- Performed exploratory data analysis (EDA) to understand feature relationships.
- Split the data into training and testing sets.
## Correlation Matrix
The following correlation matrix illustrates the relationships between different features in the dataset:
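
As a rough illustration of how a correlation matrix like this can be computed, here is a sketch using pandas `DataFrame.corr()` (Pearson correlation by default). The three columns and their values are made-up stand-in data, not the real dataset, and the coefficients used to generate them are arbitrary assumptions.

```python
import numpy as np
import pandas as pd

TARGET = "Concrete Compressive Strength"

# Stand-in data (NOT the real dataset): the target is built so that it rises
# with cement and falls with water, plus some noise.
rng = np.random.default_rng(0)
cement = rng.uniform(100, 500, size=200)
water = rng.uniform(120, 250, size=200)
strength = 0.1 * cement - 0.2 * water + rng.normal(0, 5, size=200)
df = pd.DataFrame({"Cement": cement, "Water": water, TARGET: strength})

# Pairwise Pearson correlations between all columns.
corr = df.corr()
print(corr.round(2))

# Features ranked by the absolute strength of their correlation with the target.
ranked = corr[TARGET].drop(TARGET).abs().sort_values(ascending=False)
print(ranked.round(2))
```

The resulting `corr` frame can be passed to a plotting function such as `seaborn.heatmap` to render it as a figure.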
## Scatter Matrix
Here is the scatter matrix showing the pairwise relationships between the features:
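
A scatter matrix of this kind can be drawn with `pandas.plotting.scatter_matrix`. The sketch below is one possible way to do it, not necessarily how the notebook does; the data is a random stand-in, and the `Agg` backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Stand-in data (random numbers, NOT the real dataset).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Cement": rng.uniform(100, 500, size=100),
    "Water": rng.uniform(120, 250, size=100),
    "Age": rng.integers(1, 366, size=100),
})

# One scatter plot per pair of columns, with histograms on the diagonal.
axes = scatter_matrix(df, figsize=(6, 6), diagonal="hist")
print(axes.shape)  # grid of Axes, one row and one column per feature
```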
### Models Used
- Linear Regression: A basic model for capturing linear relationships.
- Polynomial Regression: Enhances the model's ability to handle non-linear relationships.
### The models are evaluated based on:
- R² (Coefficient of Determination): Measures how well the model explains the variance in the data.
- RMSE (Root Mean Squared Error): Measures the average error in the model's predictions.
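
The two models and both metrics can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the notebook's actual code: the polynomial degree (2), the 80/20 split, and the random seed are assumptions, and the data is a synthetic stand-in with a deliberately non-linear target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (NOT the concrete dataset) with a non-linear target.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 2))
y = X[:, 0] ** 2 + X[:, 0] * X[:, 1] + rng.normal(0, 0.5, size=300)

# Assumed 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "linear": LinearRegression(),
    # Polynomial regression = polynomial feature expansion + linear regression.
    "polynomial": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "r2": r2_score(y_test, pred),
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
    }
    print(f"{name}: R2={scores[name]['r2']:.3f}  RMSE={scores[name]['rmse']:.3f}")
```

On this synthetic target the polynomial pipeline scores much better than the plain linear model, which is the same kind of comparison the project makes on the concrete data.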
Key Findings:
Polynomial regression significantly improved model performance compared to linear regression (from 54% to 77% accuracy).

The evaluation metric is the **root mean squared error (RMSE)**, computed by calling the `mean_squared_error` function from sklearn and taking the square root of the result with `numpy.sqrt()`.
RMSE is obtained by squaring each error (y_true - y_prediction), averaging the squared errors over the number of samples, and then taking the square root.
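
To make the definition concrete, the sketch below computes RMSE twice on a tiny made-up example: once step by step, and once with `mean_squared_error` followed by `numpy.sqrt()`. The four values are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Tiny made-up example values.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

# Step by step: errors -> squared errors -> mean -> square root.
errors = y_true - y_pred            # [ 1.,  0., -2., -1.]
mse_manual = np.mean(errors ** 2)   # (1 + 0 + 4 + 1) / 4 = 1.5
rmse_manual = np.sqrt(mse_manual)

# Same result via scikit-learn's MSE plus numpy's square root.
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))

print(rmse_manual, rmse_sklearn)  # both are sqrt(1.5), about 1.2247
```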
### Dependencies
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
### Conclusion
In this project, we focused on predicting the compressive strength of concrete using machine learning techniques, specifically **Linear Regression** and **Polynomial Regression**. After thoroughly exploring the dataset and preprocessing the data, we built both models to assess their predictive capabilities.
**Key Findings:**
- **Linear Regression**: The linear regression model served as a baseline model. It provided a reasonable initial prediction but had limitations in capturing non-linear relationships within the data, resulting in a lower accuracy.
- **Polynomial Regression**: To improve the model’s ability to capture non-linear trends, we applied polynomial regression. This enhanced the model's performance significantly, increasing the accuracy from **54%** with linear regression to **77%**. The polynomial model was able to better capture the complexity of the relationship between the features and the target variable, thus reducing the model error and improving predictions.
### Overall Impact:
- Polynomial regression yielded a much better predictive performance than linear regression, demonstrating that some degree of non-linearity is present in the relationship between the input features and concrete compressive strength.
- The findings from this study can help in making more accurate predictions for concrete strength, which is crucial for civil engineering applications like construction and infrastructure development.
### Future Work:
- Further improvements could be made by experimenting with other machine learning models like Decision Trees or Random Forest, which might perform even better for this type of dataset.
- A deeper exploration into feature engineering and hyperparameter tuning could also be beneficial for optimizing model performance.

This study successfully demonstrated that a simple polynomial regression model can significantly improve predictive accuracy when dealing with complex relationships in concrete strength prediction.