# Large Soil Spectral Models (LSSM)

This Python package allows you to reproduce the research work carried
out by [Franck Albinet](https://www.linkedin.com/in/franckalbinet) in
the context of a PhD at [KU Leuven](https://www.kuleuven.be/) titled
**“Multiscale Characterization of Exchangeable Potassium Content in Soil
to Remediate Agricultural Land Affected by Radioactive Contamination
using Machine Learning, Soil Spectroscopy and Remote Sensing”**.

**Our first paper** [Albinet, F., Peng, Y., Eguchi, T., Smolders, E.,
Dercon, G., 2022. Prediction of exchangeable potassium in soil through
mid-infrared spectroscopy and deep learning: From prediction to
explainability. Artificial Intelligence in Agriculture 6,
230–241.](https://www.sciencedirect.com/science/article/pii/S2589721722000186)
investigated the possibility of predicting exchangeable potassium in
soil using large mid-infrared soil spectral libraries and Deep Learning.
The code is available [here](https://github.com/franckalbinet/mirzai).

We are now **exploring the potential to characterize and predict
exchangeable potassium using both Near- and Mid-infrared soil
spectroscopy, with a focus on leveraging advanced Deep Learning models
such as ResNet and Vision Transformers (ViT) through transfer
learning**.
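
Swapping the ResNet backbone for a Vision Transformer only changes the
model-creation step; a minimal sketch with `timm` (the model name here
is just an illustrative choice):

``` python
import timm

# Illustrative ViT variant; in_chans=1 for single-channel spectra,
# num_classes=1 for regression on a single analyte
vit = timm.create_model('vit_base_patch16_224', pretrained=True,
                        in_chans=1, num_classes=1)
```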

*Our Deep Learning pipeline is primarily based on the approach described
by [Jeremy Howard](https://github.com/fastai/course22p2)*.

## Install

``` sh
pip install lssm
```

## Getting started

Below we walk through a typical workflow to showcase our method.

``` python
from pathlib import Path
from functools import partial

from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

from torch import optim, nn

import timm

from torcheval.metrics import R2Score
from torch.optim import lr_scheduler
from lssm.loading import load_ossl
from lssm.learner import Learner
from lssm.preprocessing import ToAbsorbance, ContinuumRemoval, Log1p
from lssm.dataloaders import SpectralDataset, get_dls
from lssm.callbacks import (MetricsCB, BatchSchedCB, BatchTransformCB,
                            DeviceCB, TrainCB, ProgressCB)
from lssm.transforms import GADFTfm, _resizeTfm, StatsTfm
```

### Loading training & validation data

1. Load a pre-trained, state-of-the-art (SOTA) Deep Learning model
   from the `timm` Python package:

``` python
model_name = 'resnet18'
model = timm.create_model(model_name, pretrained=True, in_chans=1, num_classes=1)
```
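
The pre-trained model ships with its normalization statistics and
expected input size in `default_cfg`, which is what `StatsTfm` consumes
later in the pipeline:

``` python
# Inspect the pre-trained model's normalization stats and input size
print(model.default_cfg['mean'], model.default_cfg['std'])
print(model.default_cfg['input_size'])
```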

2. Automatically download large spectral libraries developed by our
colleagues at [WCRC](https://www.woodwellclimate.org). We focus on
exchangeable potassium in the example below:

``` python
analytes = 'k.ext_usda.a725_cmolc.kg'
data = load_ossl(analytes, spectra_type='visnir')
X, y, X_names, smp_idx, ds_name, ds_label = data
```

Reading & selecting data ...
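
A quick sanity check on what `load_ossl` returns (assuming `X` is a
samples × bands matrix with one entry in `X_names` per band):

``` python
# Quick sanity check (shapes depend on the downloaded library)
print(X.shape, y.shape)   # spectra matrix and target vector
print(len(X_names))       # one name per spectral band
```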

3. A bit of preprocessing on the data features and the target:

``` python
X = Pipeline([('to_abs', ToAbsorbance()),
              ('cr', ContinuumRemoval(X_names))]).fit_transform(X)

y = Log1p().fit_transform(y)
```

100%|██████████| 44489/44489 [00:15<00:00, 2850.84it/s]
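
For context, converting reflectance spectra to absorbance is
conventionally `A = log10(1/R)`; a minimal NumPy sketch of that idea
(assuming lssm's `ToAbsorbance` follows this convention):

``` python
import numpy as np

# Assumed convention: absorbance A = log10(1 / R)
# (lssm's ToAbsorbance may add clipping or edge handling)
def to_absorbance(R, eps=1e-8):
    return np.log10(1.0 / np.clip(R, eps, None))
```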

4. A typical train/test split to get training and validation sets:

``` python
n_smp = 5000  # For demo purposes (in reality we have > 50K)
X_train, X_valid, y_train, y_valid = train_test_split(X[:n_smp, :], y[:n_smp],
                                                      test_size=0.1,
                                                      stratify=ds_name[:n_smp],
                                                      random_state=41)
```
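
If you want to verify the stratification, `train_test_split` can split
`ds_name` alongside the features and target (a variation on the call
above):

``` python
# Variation: also split ds_name so provenance proportions can be compared
X_train, X_valid, y_train, y_valid, ds_train, ds_valid = train_test_split(
    X[:n_smp, :], y[:n_smp], ds_name[:n_smp],
    test_size=0.1, stratify=ds_name[:n_smp], random_state=41)
```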

5. Finally, create a custom PyTorch `DataLoader` for each set:

``` python
train_ds, valid_ds = [SpectralDataset(X, y)
                      for X, y in [(X_train, y_train), (X_valid, y_valid)]]

# Then PyTorch dataloaders
dls = get_dls(train_ds, valid_ds, bs=32)
```
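
A quick way to check what the dataloaders yield before training (shapes
shown are illustrative):

``` python
# Peek at one training batch of 1D spectra and targets
xb, yb = next(iter(dls.train))
print(xb.shape, yb.shape)
```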

### Training

``` python
epochs = 1
lr = 5e-3

# We use `r2` alone to assess performance
metrics = MetricsCB(r2=R2Score())

# We use the One-Cycle Learning Rate scheduling approach
tmax = epochs * len(dls.train)
sched = partial(lr_scheduler.OneCycleLR, max_lr=lr, total_steps=tmax)

# A series of preprocessing steps performed on GPU:
# - move batches to the GPU
# - transform 1D spectra to 2D images using the Gramian Angular Difference Field (GADF)
# - resize the 2D version
# - apply the pre-trained model's normalization statistics
xtra = [BatchSchedCB(sched)]
gadf = BatchTransformCB(GADFTfm())
resize = BatchTransformCB(_resizeTfm)
stats = BatchTransformCB(StatsTfm(model.default_cfg))

cbs = [DeviceCB(), gadf, resize, stats, TrainCB(),
       metrics, ProgressCB(plot=False)]

learn = Learner(model, dls, nn.MSELoss(), lr=lr,
                cbs=cbs+xtra, opt_func=optim.AdamW)

learn.fit(epochs)
```

Training progress (abridged):

    4.39% [55/1252 00:23<08:42 0.084]
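
For reference, the GADF transform applied on the GPU above turns each
1D spectrum into a 2D image that image models such as ResNet can
consume. A minimal NumPy sketch of the Gramian Angular Difference
Field, independent of lssm's `GADFTfm` implementation:

``` python
import numpy as np

def gadf(x):
    # Rescale the spectrum to [-1, 1] and read values as cos(phi)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(np.clip(x, -1, 1))
    # GADF[i, j] = sin(phi_i - phi_j), an (n, n) image
    return np.sin(phi[:, None] - phi[None, :])
```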