# Large Soil Spectral Models (LSSM)

This Python package allows you to reproduce the research work carried
out by [Franck Albinet](https://www.linkedin.com/in/franckalbinet) in
the context of a PhD at [KU Leuven](https://www.kuleuven.be/) titled
**“Multiscale Characterization of Exchangeable Potassium Content in Soil
to Remediate Agricultural Land Affected by Radioactive Contamination
using Machine Learning, Soil Spectroscopy and Remote Sensing”**.

**Our first paper** [Albinet, F., Peng, Y., Eguchi, T., Smolders, E.,
Dercon, G., 2022. Prediction of exchangeable potassium in soil through
mid-infrared spectroscopy and deep learning: From prediction to
explainability. Artificial Intelligence in Agriculture 6,
230–241.](https://www.sciencedirect.com/science/article/pii/S2589721722000186)
investigated the possibility of predicting exchangeable potassium in
soil using large mid-infrared soil spectral libraries and Deep Learning.
The code is available [here](https://github.com/franckalbinet/mirzai).

We are now **exploring the potential to characterize and predict
exchangeable potassium using both Near- and Mid-infrared soil
spectroscopy, with a focus on leveraging advanced Deep Learning models
such as ResNet and Vision Transformers (ViT) through transfer
learning**.
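
Swapping the ResNet backbone for a Vision Transformer only changes the
model-creation step; a minimal sketch with `timm` (the model name here
is just an illustrative choice):

``` python
import timm

# Illustrative ViT variant; in_chans=1 for single-channel spectra,
# num_classes=1 for regression on a single analyte
vit = timm.create_model('vit_base_patch16_224', pretrained=True,
                        in_chans=1, num_classes=1)
```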

*Our Deep Learning pipeline is primarily based on the approach described
by [Jeremy Howard](https://github.com/fastai/course22p2)*.

## Install

``` sh
pip install lssm
```

## Getting started

Below we walk through a typical workflow to showcase our method.

``` python
from pathlib import Path
from functools import partial

from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

from torch import optim, nn

import timm

from torcheval.metrics import R2Score
from torch.optim import lr_scheduler
from lssm.loading import load_ossl
from lssm.learner import Learner
from lssm.preprocessing import ToAbsorbance, ContinuumRemoval, Log1p
from lssm.dataloaders import SpectralDataset, get_dls
from lssm.callbacks import (MetricsCB, BatchSchedCB, BatchTransformCB,
                            DeviceCB, TrainCB, ProgressCB)
from lssm.transforms import GADFTfm, _resizeTfm, StatsTfm
```

### Loading training & validation data

1. Load a pre-trained, state-of-the-art (SOTA) Deep Learning model
   from the `timm` Python package:

``` python
model_name = 'resnet18'
model = timm.create_model(model_name, pretrained=True, in_chans=1, num_classes=1)
```
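
The pre-trained model ships with its normalization statistics and
expected input size in `default_cfg`, which is what `StatsTfm` consumes
later in the pipeline:

``` python
# Inspect the pre-trained model's normalization stats and input size
print(model.default_cfg['mean'], model.default_cfg['std'])
print(model.default_cfg['input_size'])
```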

2. Automatically download large spectral libraries developed by our
colleagues at [WCRC](https://www.woodwellclimate.org). We focus on
exchangeable potassium in the example below:

``` python
analytes = 'k.ext_usda.a725_cmolc.kg'
data = load_ossl(analytes, spectra_type='visnir')
X, y, X_names, smp_idx, ds_name, ds_label = data
```

Reading & selecting data ...
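
A quick sanity check on what `load_ossl` returns (assuming `X` is a
samples × bands matrix with one entry in `X_names` per band):

``` python
# Quick sanity check (shapes depend on the downloaded library)
print(X.shape, y.shape)   # spectra matrix and target vector
print(len(X_names))       # one name per spectral band
```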

3. A bit of preprocessing on the data features and the target:

``` python
X = Pipeline([('to_abs', ToAbsorbance()),
              ('cr', ContinuumRemoval(X_names))]).fit_transform(X)

y = Log1p().fit_transform(y)
```

100%|██████████| 44489/44489 [00:15<00:00, 2850.84it/s]
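
For context, converting reflectance spectra to absorbance is
conventionally `A = log10(1/R)`; a minimal NumPy sketch of that idea
(assuming lssm's `ToAbsorbance` follows this convention):

``` python
import numpy as np

# Assumed convention: absorbance A = log10(1 / R)
# (lssm's ToAbsorbance may add clipping or edge handling)
def to_absorbance(R, eps=1e-8):
    return np.log10(1.0 / np.clip(R, eps, None))
```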

4. A typical train/test split to get training and validation sets:

``` python
n_smp = 5000  # For demo purposes (in reality we have > 50K)
X_train, X_valid, y_train, y_valid = train_test_split(X[:n_smp, :], y[:n_smp],
                                                      test_size=0.1,
                                                      stratify=ds_name[:n_smp],
                                                      random_state=41)
```
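
If you want to verify the stratification, `train_test_split` can split
`ds_name` alongside the features and target (a variation on the call
above):

``` python
# Variation: also split ds_name so provenance proportions can be compared
X_train, X_valid, y_train, y_valid, ds_train, ds_valid = train_test_split(
    X[:n_smp, :], y[:n_smp], ds_name[:n_smp],
    test_size=0.1, stratify=ds_name[:n_smp], random_state=41)
```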

5. Finally, create a custom PyTorch `DataLoader` for each set:

``` python
train_ds, valid_ds = [SpectralDataset(X, y)
                      for X, y in [(X_train, y_train), (X_valid, y_valid)]]

# Then PyTorch dataloaders
dls = get_dls(train_ds, valid_ds, bs=32)
```
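
A quick way to check what the dataloaders yield before training (shapes
shown are illustrative):

``` python
# Peek at one training batch of 1D spectra and targets
xb, yb = next(iter(dls.train))
print(xb.shape, yb.shape)
```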

### Training

``` python
epochs = 1
lr = 5e-3

# We use `r2` alone to assess performance
metrics = MetricsCB(r2=R2Score())

# We use the One-Cycle Learning Rate scheduling approach
tmax = epochs * len(dls.train)
sched = partial(lr_scheduler.OneCycleLR, max_lr=lr, total_steps=tmax)

# A series of preprocessing steps performed on GPU:
# - move batches to the GPU
# - transform 1D spectra to 2D images using the Gramian Angular Difference Field (GADF)
# - resize the 2D version
# - apply the pre-trained model's normalization statistics
xtra = [BatchSchedCB(sched)]
gadf = BatchTransformCB(GADFTfm())
resize = BatchTransformCB(_resizeTfm)
stats = BatchTransformCB(StatsTfm(model.default_cfg))

cbs = [DeviceCB(), gadf, resize, stats, TrainCB(),
       metrics, ProgressCB(plot=False)]

learn = Learner(model, dls, nn.MSELoss(), lr=lr,
                cbs=cbs+xtra, opt_func=optim.AdamW)

learn.fit(epochs)
```

Training progress (abridged):

    4.39% [55/1252 00:23<08:42 0.084]
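
For reference, the GADF transform applied on the GPU above turns each
1D spectrum into a 2D image that image models such as ResNet can
consume. A minimal NumPy sketch of the Gramian Angular Difference
Field, independent of lssm's `GADFTfm` implementation:

``` python
import numpy as np

def gadf(x):
    # Rescale the spectrum to [-1, 1] and read values as cos(phi)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(np.clip(x, -1, 1))
    # GADF[i, j] = sin(phi_i - phi_j), an (n, n) image
    return np.sin(phi[:, None] - phi[None, :])
```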