https://github.com/dlt3/odor-data-analysis
Complex odor analysis and interpretation
explainable-ai machine-learning partial-dependence-plot
- Host: GitHub
- URL: https://github.com/dlt3/odor-data-analysis
- Owner: dlt3
- Created: 2022-09-27T08:30:43.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-03-20T07:13:22.000Z (almost 3 years ago)
- Last Synced: 2025-03-25T02:44:02.504Z (11 months ago)
- Topics: explainable-ai, machine-learning, partial-dependence-plot
- Language: Jupyter Notebook
- Homepage:
- Size: 15.9 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Odor data analysis
This study focuses on developing an odor prediction model and interpreting the model's classification results with explainable AI (XAI) methods.
#### Reference
- https://doi.org/10.3390/app12062826
- https://doi.org/10.3390/app122412943
### Research purpose
- Prevention of odor in pig barns by managing chemical substances (odor substances) that affect odor generation
- Creation of an optimal prediction model for complex odors using 15 odorous substances
- Identification of the influence of odorous substances on complex odors and the interaction effect between odorous substances
- Creation of a complex odor classification prediction model using 15 odorous substances and measurement-related variables
### Data information
- Response variable: complex odor
- Explanatory variables: 15 odorous substances
  - Ammonia
  - Sulfur compounds: Hydrogen sulfide, Methyl mercaptan, Dimethyl sulfide, Dimethyl disulfide
  - Volatile organic compounds (VOCs): Acetic acid, Propionic acid, Butyric acid, Iso-Butyric acid, Valeric acid, Iso-Valeric acid, Phenol, para-Cresol, Indole, Skatole

### Analysis process
#### Research 1
- Compare different analysis processes to find the optimal predictive model
- Data problems and solutions
  - High missing rate: future data will be collected through sensors and may also have a high missing rate, so missing values are imputed rather than removed
  - Small amount of data: models are validated with Leave-One-Out Cross-Validation (LOOCV), which remains usable when there is little data
- Data pre-processing
  - Missing-value imputation: simple imputation (mean, median), multivariate imputation (Bayesian), multiple imputation (Bayesian ridge, Gaussian process regression, KNN)
  - Feature preprocessing: standardization, Partial Least Squares (PLS), Principal Component Analysis (PCA)
- Prediction models: Regression, SVM, RandomForest, ExtraTree, XGBoost, DNN
- Model verification: R-squared and MAPE, computed through LOOCV
- Additional analysis: correlation analysis, Principal Component Analysis (PCA), identification of predictor feature importance
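
The Research 1 workflow (impute, standardize, fit, validate with LOOCV) can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the notebook code from this repository: the data are synthetic, the sample count (40) and missing rate (20%) are assumptions, and the Bayesian ridge imputer plus RandomForest regressor stand in for just one of the combinations compared in the study.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import mean_absolute_percentage_error, r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the real measurements (assumed shape):
# 40 samples x 15 odorous substances, with a strictly positive target.
X = rng.lognormal(mean=0.0, sigma=1.0, size=(40, 15))
y = 5.0 + 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=40)

# Blank out about 20% of the entries to mimic a high missing rate.
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.2] = np.nan

# Imputation and scaling live inside the pipeline, so they are re-fit on
# the training part of every LOOCV split and never see the held-out sample.
model = make_pipeline(
    IterativeImputer(estimator=BayesianRidge(), max_iter=10, random_state=0),
    StandardScaler(),
    RandomForestRegressor(n_estimators=50, random_state=0),
)

# LOOCV: each sample is predicted by a model trained on the other n - 1.
y_pred = cross_val_predict(model, X_missing, y, cv=LeaveOneOut())

# R-squared is undefined for a single held-out sample, so both metrics
# are computed once over the pooled out-of-fold predictions.
loocv_r2 = r2_score(y, y_pred)
loocv_mape = mean_absolute_percentage_error(y, y_pred)  # a fraction, not a percent
print(f"LOOCV R^2: {loocv_r2:.3f}, MAPE: {loocv_mape:.3f}")
```

Swapping the imputer, the preprocessing step, or the final estimator in `make_pipeline` gives the other combinations to compare under the same LOOCV scheme.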

#### Research 2
- Features related to measurement: measurement time (year, month, day), measurement location (inside the pig barn, outside the pig barn, site boundary)
- Summary
  - Perform data preprocessing based on Research 1 and compare multiple machine learning models
  - Minimize overfitting by repeating the analysis 30 times, and select the optimal model using 8 evaluation metrics
  - Identify the influence and interaction effects of odorous substances through XAI methods
- Data pre-processing
  - Complex odor: continuous values are converted into a binary class (emission allowed / emission not allowed) in accordance with the domestic odor prevention law
  - Measurement-related variables: measurement time is converted into a season variable and then One-Hot encoded; measurement location is One-Hot encoded
  - Variable preprocessing: multivariate imputation (Bayesian ridge) and standardization
- Prediction models: k-Nearest Neighbor, SVC, RandomForest, LightGBM, ExtraTree, XGBoost
- Model validation: F1-score, Accuracy, Sensitivity, Specificity
- Identification of influence: XAI (Partial Dependence Plot, variable importance)
- Additional analysis: correlation analysis and VIF (continuous variables), ANOVA (categorical variables)
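
The Research 2 preprocessing and the partial-dependence step can be sketched as below. This is an illustrative example under several assumptions, not the repository's code: the data are synthetic, only two of the 15 substances are included, the column names, the month-to-season mapping (Northern Hemisphere meteorological seasons), and the `THRESHOLD` value are hypothetical (the legal limit is not stated here), and ExtraTrees is used as one representative classifier.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.inspection import partial_dependence
from sklearn.linear_model import BayesianRidge
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 120

# Synthetic stand-in data; column names are hypothetical.
df = pd.DataFrame({
    "ammonia": rng.lognormal(1.0, 0.5, n),
    "hydrogen_sulfide": rng.lognormal(0.0, 0.7, n),
    "month": rng.integers(1, 13, n),
    "location": rng.choice(["inside", "outside", "boundary"], n),
})
complex_odor = 100 * df["ammonia"] + 50 * df["hydrogen_sulfide"] + rng.normal(0, 20, n)
df.loc[rng.random(n) < 0.1, "ammonia"] = np.nan  # simulate missing sensor values

# Measurement time -> season (assumed mapping), later one-hot encoded.
season_of_month = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}
df["season"] = df["month"].map(season_of_month)

# Continuous complex odor -> binary class. THRESHOLD is a placeholder,
# not the value defined by the odor prevention law.
THRESHOLD = 300.0
y = (complex_odor > THRESHOLD).astype(int)

numeric = ["ammonia", "hydrogen_sulfide"]
categorical = ["season", "location"]
preprocess = ColumnTransformer([
    ("num", make_pipeline(IterativeImputer(estimator=BayesianRidge(), random_state=0),
                          StandardScaler()), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
clf = Pipeline([("prep", preprocess),
                ("model", ExtraTreesClassifier(n_estimators=200, random_state=0))])

X = df[numeric + categorical]
clf.fit(X, y)

# Partial dependence of the predicted probability of class 1 on one substance,
# averaged over the other features. A fully observed column is used for the grid.
pd_result = partial_dependence(clf, X, features=["hydrogen_sulfide"],
                               kind="average", grid_resolution=20)
pdp_curve = pd_result["average"][0]
print(pdp_curve.round(3))
```

Passing a pair of features, for example `features=[("ammonia", "hydrogen_sulfide")]`, yields a two-dimensional grid that can be used to inspect interaction effects between two substances.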
