Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/franckalbinet/lssm
Modelling pipeline to develop and monitor Large Soil Spectral Models (LSSM)
- Host: GitHub
- URL: https://github.com/franckalbinet/lssm
- Owner: franckalbinet
- License: apache-2.0
- Created: 2023-09-17T19:22:32.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-07-03T21:06:05.000Z (8 months ago)
- Last Synced: 2024-11-08T23:52:30.560Z (3 months ago)
- Language: Jupyter Notebook
- Size: 94.5 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Large Soil Spectral Models (LSSM)
This Python package reproduces the research carried out by
[Franck Albinet](https://www.linkedin.com/in/franckalbinet) as part of a
PhD at [KU Leuven](https://www.kuleuven.be/) titled
**“Multiscale Characterization of Exchangeable Potassium Content in Soil
to Remediate Agricultural Land Affected by Radioactive Contamination
using Machine Learning, Soil Spectroscopy and Remote Sensing”**.

**Our first paper** [Albinet, F., Peng, Y., Eguchi, T., Smolders, E.,
Dercon, G., 2022. Prediction of exchangeable potassium in soil through
mid-infrared spectroscopy and deep learning: From prediction to
explainability. Artificial Intelligence in Agriculture 6,
230–241.](https://www.sciencedirect.com/science/article/pii/S2589721722000186)
investigated whether exchangeable potassium in soil can be predicted
from large mid-infrared soil spectral libraries using Deep Learning. The
code is available [here](https://github.com/franckalbinet/mirzai).

We are now **exploring the potential to characterize and predict
exchangeable potassium using both near- and mid-infrared soil
spectroscopy, with a focus on leveraging advanced Deep Learning models
such as ResNet and Vision Transformers (ViT) through transfer learning**.

*Our Deep Learning pipeline is primarily based on the approach described
by [Jeremy Howard](https://github.com/fastai/course22p2)*.

## Install
``` sh
pip install lssm
```

## Getting started

We demonstrate a typical workflow below to showcase our method.
``` python
from pathlib import Path
from functools import partial

from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

from torch import optim, nn
import timm
from torcheval.metrics import R2Score
from torch.optim import lr_scheduler
from lssm.loading import load_ossl
from lssm.learner import Learner
from lssm.preprocessing import ToAbsorbance, ContinuumRemoval, Log1p
from lssm.dataloaders import SpectralDataset, get_dls
from lssm.callbacks import (MetricsCB, BatchSchedCB, BatchTransformCB,
                            DeviceCB, TrainCB, ProgressCB)
from lssm.transforms import GADFTfm, _resizeTfm, StatsTfm
```

### Loading training & validation data

1. Load a pre-trained model from the `timm` Python package, which provides
state-of-the-art (SOTA) Deep Learning models:

``` python
model_name = 'resnet18'
model = timm.create_model(model_name, pretrained=True, in_chans=1, num_classes=1)
```

2. Automatically download the large spectral libraries developed by our
colleagues at [WCRC](https://www.woodwellclimate.org). We focus on
exchangeable potassium in the example below:

``` python
analytes = 'k.ext_usda.a725_cmolc.kg'
data = load_ossl(analytes, spectra_type='visnir')
X, y, X_names, smp_idx, ds_name, ds_label = data
```

    Reading & selecting data ...

3. Preprocess the features and the target:
``` python
X = Pipeline([('to_abs', ToAbsorbance()),
              ('cr', ContinuumRemoval(X_names))]).fit_transform(X)

y = Log1p().fit_transform(y)
```

    100%|██████████| 44489/44489 [00:15<00:00, 2850.84it/s]

4. Perform a typical train/test split to obtain training and validation datasets:
``` python
n_smp = 5000 # For demo. purpose (in reality we have > 50K)
X_train, X_valid, y_train, y_valid = train_test_split(X[:n_smp, :], y[:n_smp],
                                                      test_size=0.1,
                                                      stratify=ds_name[:n_smp],
                                                      random_state=41)
```

5. Finally, creating a custom PyTorch `DataLoader`:
``` python
train_ds, valid_ds = [SpectralDataset(X, y)
                      for X, y in [(X_train, y_train), (X_valid, y_valid)]]

# Then PyTorch dataloaders
dls = get_dls(train_ds, valid_ds, bs=32)
```

### Training
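The training code in this section schedules the learning rate with PyTorch's `OneCycleLR`. As a rough, dependency-free sketch of what that schedule does, the function below assumes PyTorch's documented defaults (cosine annealing, `pct_start=0.3`, `div_factor=25`, `final_div_factor=1e4`). `one_cycle_lr` is a hypothetical helper written for illustration only; it is not part of `lssm` or PyTorch.

``` python
import math


def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Approximate learning rate at `step` under a one-cycle policy.

    Hypothetical helper: mirrors the two-phase cosine schedule with the
    default values assumed above. Requires pct_start * total_steps > 1.
    """
    initial_lr = max_lr / div_factor          # starting learning rate
    min_lr = initial_lr / final_div_factor    # final learning rate
    warmup_end = pct_start * total_steps - 1  # last step of the warm-up phase

    def cos_anneal(start, end, pct):
        # Cosine interpolation from `start` (pct=0) to `end` (pct=1)
        return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))

    if step <= warmup_end:
        # Phase 1: warm up from initial_lr to max_lr
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    # Phase 2: anneal from max_lr down to min_lr
    pct = (step - warmup_end) / (total_steps - 1 - warmup_end)
    return cos_anneal(max_lr, min_lr, pct)


# Learning rates over a hypothetical run of 100 batches
schedule = [one_cycle_lr(s, total_steps=100, max_lr=5e-3) for s in range(100)]
```

With `max_lr=5e-3` as used in this section, such a schedule starts at `2e-4`, peaks at `5e-3` about 30% of the way through training, and decays to roughly `2e-8` by the last step.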
``` python
epochs = 1
lr = 5e-3

# We use `r2` alone to assess performance
metrics = MetricsCB(r2=R2Score())

# We use the One Cycle learning rate scheduling approach
tmax = epochs * len(dls.train)
sched = partial(lr_scheduler.OneCycleLR, max_lr=lr, total_steps=tmax)

# A series of preprocessing steps performed on the GPU:
#    - put to GPU
#    - transform 1D spectra to 2D images using Gramian Angular Difference Field (GADF)
#    - resize the 2D version
#    - apply pre-trained model stats
xtra = [BatchSchedCB(sched)]
gadf = BatchTransformCB(GADFTfm())
resize = BatchTransformCB(_resizeTfm)
stats = BatchTransformCB(StatsTfm(model.default_cfg))

cbs = [DeviceCB(), gadf, resize, stats, TrainCB(),
       metrics, ProgressCB(plot=False)]

learn = Learner(model, dls, nn.MSELoss(), lr=lr,
                cbs=cbs+xtra, opt_func=optim.AdamW)

learn.fit(epochs)
```

    0.00% [0/1 00:00<?]
    4.39% [55/1252 00:23<08:42 0.084]
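
The training callbacks above convert each 1D spectrum into a 2D image with a Gramian Angular Difference Field (GADF) before it reaches the pre-trained network. The sketch below illustrates the standard GADF definition in plain Python, to stay dependency-free: values are rescaled to [-1, 1], mapped to angles with arccos, and entry (i, j) is the sine of the angle difference. The `gadf` function is a hypothetical illustration; the actual `GADFTfm` in `lssm` operates on batches on the GPU and may differ in its details.

``` python
import math


def gadf(spectrum):
    """Gramian Angular Difference Field of a 1D sequence of floats.

    Hypothetical sketch of the standard definition; assumes the sequence
    is not constant (max > min).
    """
    lo, hi = min(spectrum), max(spectrum)
    # Rescale to [-1, 1] so that arccos is defined for every value
    scaled = [2.0 * (x - lo) / (hi - lo) - 1.0 for x in spectrum]
    # Polar encoding: each value becomes an angle in [0, pi]
    # (clamped to guard against floating-point overshoot)
    phi = [math.acos(max(-1.0, min(1.0, s))) for s in scaled]
    # Entry (i, j) is sin(phi_i - phi_j)
    return [[math.sin(a - b) for b in phi] for a in phi]


# A toy 4-point "spectrum" becomes a 4 x 4 image
image = gadf([0.1, 0.4, 0.3, 0.9])
```

The resulting matrix is antisymmetric with a zero diagonal, and a spectrum of length `n` yields an `n x n` image, which is why the pipeline resizes it before applying the pre-trained model statistics.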