Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/hariprasath-v/intel_oneapi_hackerearth_predict-the-quality-of-freshwater
Build a machine learning model to predict whether freshwater is safe to drink, based on measures such as pH and TDS.
catboost classification exploratory-data-analysis f1score lightgbm modin onedal pandas python3 shapash xgboost
- Host: GitHub
- URL: https://github.com/hariprasath-v/intel_oneapi_hackerearth_predict-the-quality-of-freshwater
- Owner: hariprasath-v
- License: apache-2.0
- Created: 2023-03-04T06:32:36.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-28T10:57:35.000Z (about 1 year ago)
- Last Synced: 2024-11-13T15:54:30.056Z (2 months ago)
- Topics: catboost, classification, exploratory-data-analysis, f1score, lightgbm, modin, onedal, pandas, python3, shapash, xgboost
- Language: HTML
- Homepage:
- Size: 5.83 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Intel_oneAPI_Hackerearth_Predict-the-quality-of-freshwater
### Competition hosted on Hackerearth
### Problem
Freshwater is one of our most vital and scarce natural resources, making up just 3% of the earth’s total water volume. It touches nearly every aspect of our daily lives, from drinking, swimming, and bathing to generating food, electricity, and the products we use every day. Access to a safe and sanitary water supply is essential not only to human life, but also to the survival of surrounding ecosystems that are experiencing the effects of droughts, pollution, and rising temperatures.
### Expected Solution:
In this track of the hackathon, you will have the opportunity to apply the oneAPI skills to help global water security and environmental sustainability efforts by predicting whether freshwater is safe to drink and use for the ecosystems that rely on it.
### Dataset
You can download the dataset here
### Mandate
Use of the Intel® oneAPI AI Analytics Toolkit is mandatory to participate.
### Solution:
### Exploratory Data Analysis
#### The basic exploratory data analysis covers:
* Missing value analysis
* Numerical column distribution analysis
* Interaction between categorical and numerical columns
#### The analysis above was done using:
* modin (Intel) pandas
* numpy
* seaborn
* matplotlib
* missingno
* klib
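
The three analyses listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the tiny DataFrame, its values, and the `Target` column name are hypothetical stand-ins for the competition data, and plain pandas is used so the snippet runs anywhere (with modin installed, `import modin.pandas as pd` is the usual drop-in swap).

```python
import numpy as np
import pandas as pd  # with modin installed: import modin.pandas as pd

# Hypothetical miniature of the dataset; values are made up for illustration.
df = pd.DataFrame({
    "pH": [7.1, np.nan, 6.8, 7.4],
    "Iron": [0.02, 0.10, np.nan, np.nan],
    "Color": ["Colorless", "Faint Yellow", None, "Colorless"],
    "Target": [0, 1, 0, 1],
})

# Missing value analysis: count and percentage of missing entries per column.
missing = pd.DataFrame({
    "n_missing": df.isna().sum(),
    "pct_missing": df.isna().mean() * 100,
}).sort_values("n_missing", ascending=False)

# Numerical column distribution analysis: summary statistics per column.
numeric_summary = df.select_dtypes("number").describe().T

# Interaction between a categorical and a numerical column:
# mean pH within each Color group (rows with a missing Color are skipped).
ph_by_color = df.groupby("Color")["pH"].mean()

print(missing)
print(numeric_summary)
print(ph_by_color)
```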
### Data pre-processing
The missing values are imputed using two methods.
#### Mean value imputed for the following columns:
* Odor
* Total Dissolved Solids
#### Median value imputed for the following columns:
* pH
* Iron
* Nitrate
* Chloride
* Lead
* Zinc
* Turbidity
* Fluoride
* Copper
* Odor
* Sulfate
* Conductivity
* Water Temperature
* Air Temperature

#### Color and Source features don't have any interaction with other numerical measures.
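
A minimal sketch of the mean and median imputation listed above, assuming plain pandas and a hypothetical four-row frame. Only three of the listed columns are shown; Odor is left out because the lists above name it under both methods.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the real dataset has many more columns and rows.
df = pd.DataFrame({
    "Total Dissolved Solids": [100.0, np.nan, 300.0, 800.0],
    "pH": [6.0, 7.0, np.nan, 9.5],
    "Iron": [np.nan, 0.1, 0.5, 0.3],
})

mean_cols = ["Total Dissolved Solids"]  # imputed with the column mean
median_cols = ["pH", "Iron"]            # imputed with the column median

# fillna with a Series of per-column statistics fills each column
# with its own mean or median.
df[mean_cols] = df[mean_cols].fillna(df[mean_cols].mean())
df[median_cols] = df[median_cols].fillna(df[median_cols].median())

print(df)
```

The median is less sensitive to extreme readings than the mean, which is one common reason to prefer it for skewed columns; the source does not state why each column was assigned to its method.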
#### The Month, Day, and Time of Day features don't carry any relevant information for determining the quality of freshwater.

### Model
#### Stratified train and test datasets were created from the entire dataset.
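
A stratified split keeps the class ratio the same in the train and test parts. The README does not say which function was used, so the following is only a sketch using pandas group-wise sampling; the `Target` column name, the 80/20 ratio, and the data are assumptions. scikit-learn's `train_test_split` with its `stratify` argument is a common alternative.

```python
import pandas as pd

# Hypothetical imbalanced data: 15 rows of class 0 and 5 rows of class 1.
df = pd.DataFrame({
    "pH": [6.5 + 0.1 * i for i in range(20)],
    "Target": [0] * 15 + [1] * 5,
})

# Sample 20% of the rows within each class, so the test set
# mirrors the class proportions of the full dataset.
test = df.groupby("Target", group_keys=False).sample(frac=0.2, random_state=42)

# Everything not drawn for the test set becomes the training set.
train = df.drop(test.index)

print(train["Target"].value_counts(normalize=True))
print(test["Target"].value_counts(normalize=True))
```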
#### XGBoost, LightGBM, and CatBoost models are trained and evaluated with the F1 score.
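
For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from scratch for a binary target; the function name and the label lists are hypothetical, and the project itself most likely relied on a library implementation.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return the F1 score for the given positive class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = safe to drink, 0 = not safe.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

score = f1_score_binary(y_true, y_pred)
print(score)  # 3 TP, 1 FP, 1 FN -> precision 0.75, recall 0.75, F1 0.75
```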
#### For faster inference, the models are converted to oneDAL models.
### For more information about the models.
#### F1 score comparison
![Alt text](https://github.com/hariprasath-v/Intel_oneAPI_Hackerearth_Predict-the-quality-of-freshwater/blob/main/Model%20Interpretation/F1%20Score%20Comparison.png)
![Alt text](https://github.com/hariprasath-v/Intel_oneAPI_Hackerearth_Predict-the-quality-of-freshwater/blob/main/Model%20Interpretation/F1%20score%20comparison%20dataframe.PNG)

#### Model explained with shapash library
#### Lightgbm model Feature Importances
![Alt text](https://github.com/hariprasath-v/Intel_oneAPI_Hackerearth_Predict-the-quality-of-freshwater/blob/main/Model%20Interpretation/lightgbm%20model%20shapash%20feature%20importances.png)

#### Lightgbm model Local explanation for class 1 (safe to drink)
![Alt text](https://github.com/hariprasath-v/Intel_oneAPI_Hackerearth_Predict-the-quality-of-freshwater/blob/main/Model%20Interpretation/lightgbm%20model%20Local%20explantion%20for%20class%201(safe%20to%20drink).png)

#### Catboost model Feature Importances
![Alt text](https://github.com/hariprasath-v/Intel_oneAPI_Hackerearth_Predict-the-quality-of-freshwater/blob/main/Model%20Interpretation/catboost%20model%20shapash%20feature%20importances.png)

#### Catboost model Local explanation for class 1 (safe to drink)
![Alt text](https://github.com/hariprasath-v/Intel_oneAPI_Hackerearth_Predict-the-quality-of-freshwater/blob/main/Model%20Interpretation/catboost%20model%20Local%20explantion%20for%20class%201(safe%20to%20drink).png)

### Model demo
#### The CatBoost model is used for the model demo. The demo app was created using Streamlit.
#### Manual input fields for the columns
#### Local explanation using shapash smart predictor
Model Demo App
### File information
predict-the-quality-of-freshwater-model.ipynb [![Open in Kaggle](https://img.shields.io/static/v1?label=&message=Open%20in%20Kaggle&labelColor=grey&color=blue&logo=kaggle)](https://www.kaggle.com/code/hari141v/predict-the-quality-of-freshwater-model)

predict-the-quality-of-freshwater-eda.ipynb [![Open in Kaggle](https://img.shields.io/static/v1?label=&message=Open%20in%20Kaggle&labelColor=grey&color=blue&logo=kaggle)](https://www.kaggle.com/code/hari141v/predict-the-quality-of-freshwater-eda)