https://github.com/hariprasath-v/intel_oneapi_hackerearth_predict-the-quality-of-freshwater

Build a machine learning model to predict whether freshwater is safe to drink, based on measures such as pH and TDS.

catboost classification exploratory-data-analysis f1score lightgbm modin onedal pandas python3 shapash xgboost


# Intel_oneAPI_Hackerearth_Predict-the-quality-of-freshwater

### Competition hosted on Hackerearth

### Problem

Freshwater is one of our most vital and scarce natural resources, making up just 3% of the earth’s total water volume. It touches nearly every aspect of our daily lives, from drinking, swimming, and bathing to generating food, electricity, and the products we use every day. Access to a safe and sanitary water supply is essential not only to human life, but also to the survival of surrounding ecosystems that are experiencing the effects of droughts, pollution, and rising temperatures.

### Expected Solution:

In this track of the hackathon, you will have the opportunity to apply your oneAPI skills to help global water security and environmental sustainability efforts by predicting whether freshwater is safe to drink and use for the ecosystems that rely on it.

### Dataset

You can download the dataset here

### Mandate

Use of the Intel® oneAPI AI Analytics Toolkit is mandatory to participate.

### Solution:

### Exploratory Data Analysis
#### The basic exploratory data analysis covers:
* Missing value analysis
* Numerical column distribution analysis
* Interaction between categorical and numerical columns
#### The above analysis was done using:
* Modin (Intel) pandas
* numpy
* seaborn
* matplotlib
* missingno
* klib
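
The missing value analysis listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper name `missing_value_summary` and the toy values are assumptions, and plain pandas is used here (Modin is designed as a drop-in replacement via `import modin.pandas as pd`).

```python
import numpy as np
import pandas as pd  # with Modin installed: `import modin.pandas as pd`


def missing_value_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column.

    Hypothetical helper for illustration; not taken from the repository.
    """
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(df) * 100).round(2),
    })
    # Put the columns with the most gaps first.
    return summary.sort_values("missing_count", ascending=False)


# Tiny made-up frame: the column names mirror the README, but the values
# do not come from the competition dataset.
toy = pd.DataFrame({
    "pH": [7.1, np.nan, 6.8, 7.4],
    "Iron": [0.02, 0.10, np.nan, np.nan],
    "Color": ["Colorless", "Faint Yellow", None, "Colorless"],
})

summary = missing_value_summary(toy)
print(summary)
```

A table like this makes it easier to decide, column by column, whether imputation is reasonable or whether the column should be dropped.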

### Data pre-processing
Missing values are imputed using two methods.
#### Mean value imputed for the following columns:
* Odor
* Total Dissolved Solids
#### Median value imputed for the following columns:
* pH
* Iron
* Nitrate
* Chloride
* Lead
* Zinc
* Turbidity
* Fluoride
* Copper
* Odor
* Sulfate
* Conductivity
* Water Temperature
* Air Temperature
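
The two imputation strategies above could look something like the sketch below. It is an assumption-laden illustration rather than the author's code: the function name `impute`, the shortened column lists, and the toy values are all hypothetical.

```python
import numpy as np
import pandas as pd  # with Modin installed: `import modin.pandas as pd`

# Shortened, illustrative column lists; the README gives the full ones.
MEAN_COLS = ["Total Dissolved Solids"]
MEDIAN_COLS = ["pH", "Iron"]


def impute(df: pd.DataFrame, mean_cols, median_cols) -> pd.DataFrame:
    """Fill NaNs with the column mean or median; the input is left untouched."""
    out = df.copy()
    for col in mean_cols:
        out[col] = out[col].fillna(out[col].mean())
    for col in median_cols:
        out[col] = out[col].fillna(out[col].median())
    return out


# Made-up values, chosen only so the fill values are easy to check by hand.
toy = pd.DataFrame({
    "Total Dissolved Solids": [100.0, np.nan, 300.0, 200.0],
    "pH": [6.0, 7.0, np.nan, 9.0],
    "Iron": [0.1, np.nan, 0.3, 1.0],
})

imputed = impute(toy, MEAN_COLS, MEDIAN_COLS)
print(imputed)
```

The median is less sensitive to extreme values than the mean, which is one common reason to prefer it for skewed measurements.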

#### The Color and Source features show no interaction with the numerical measures.
#### The Month, Day, and Time of Day features carry no relevant information for determining the quality of freshwater.

### Model
#### Stratified train and test sets are created from the entire dataset.
#### XGBoost, LightGBM, and CatBoost models are trained and evaluated with the F1 score.
#### For faster inference, the models are converted to oneDAL models.
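
To make the stratified split and the F1 metric concrete, here is a dependency-free sketch of both. It is only meant to show the mechanics: the notebook itself may well use library routines instead, and the names `stratified_split` and `f1_score` as written here, along with the toy labels, are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with similar class proportions in each."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up imbalanced labels: 8 samples of class 0, 4 of class 1.
labels = [0] * 8 + [1] * 4
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)

# Made-up predictions with 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1_score(y_true, y_pred)
print(train_idx, test_idx, score)
```

Stratifying keeps the ratio of safe to unsafe samples the same in both sets, and F1 is a common choice when the classes are imbalanced because it ignores true negatives.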
### For more information about the models.
#### F1 score comparison
![Alt text](https://github.com/hariprasath-v/Intel_oneAPI_Hackerearth_Predict-the-quality-of-freshwater/blob/main/Model%20Interpretation/F1%20Score%20Comparison.png)
![Alt text](https://github.com/hariprasath-v/Intel_oneAPI_Hackerearth_Predict-the-quality-of-freshwater/blob/main/Model%20Interpretation/F1%20score%20comparison%20dataframe.PNG)

#### Model explained with shapash library

#### Lightgbm model Feature Importances
![Alt text](https://github.com/hariprasath-v/Intel_oneAPI_Hackerearth_Predict-the-quality-of-freshwater/blob/main/Model%20Interpretation/lightgbm%20model%20shapash%20feature%20importances.png)

#### Lightgbm model Local explanation for class 1 (safe to drink)
![Alt text](https://github.com/hariprasath-v/Intel_oneAPI_Hackerearth_Predict-the-quality-of-freshwater/blob/main/Model%20Interpretation/lightgbm%20model%20Local%20explantion%20for%20class%201(safe%20to%20drink).png)

#### Catboost model Feature Importances
![Alt text](https://github.com/hariprasath-v/Intel_oneAPI_Hackerearth_Predict-the-quality-of-freshwater/blob/main/Model%20Interpretation/catboost%20model%20shapash%20feature%20importances.png)

#### Catboost model Local explanation for class 1 (safe to drink)
![Alt text](https://github.com/hariprasath-v/Intel_oneAPI_Hackerearth_Predict-the-quality-of-freshwater/blob/main/Model%20Interpretation/catboost%20model%20Local%20explantion%20for%20class%201(safe%20to%20drink).png)

### Model demo
#### The CatBoost model is used for the demo. The demo app is built with Streamlit.
#### Manual input fields for the columns
#### Local explanation using shapash smart predictor
Model Demo App

### File information

predict-the-quality-of-freshwater-model.ipynb[![Open in Kaggle](https://img.shields.io/static/v1?label=&message=Open%20in%20Kaggle&labelColor=grey&color=blue&logo=kaggle)](https://www.kaggle.com/code/hari141v/predict-the-quality-of-freshwater-model)

predict-the-quality-of-freshwater-eda.ipynb[![Open in Kaggle](https://img.shields.io/static/v1?label=&message=Open%20in%20Kaggle&labelColor=grey&color=blue&logo=kaggle)](https://www.kaggle.com/code/hari141v/predict-the-quality-of-freshwater-eda)