# SelfPAB
Implementations of the SelfPAB, MonoSelfPAB, and LTA2V methods presented in our papers: "[SelfPAB: large-scale pre-training on accelerometer data for human activity recognition](https://doi.org/10.1007/s10489-024-05322-3)", "[Self-supervised Learning with Randomized Cross-sensor Masked Reconstruction for Human Activity Recognition](https://doi.org/10.1016/j.engappai.2023.107478)", and "[Long-term self-supervised learning for accelerometer-based sleep–wake recognition](https://doi.org/10.1016/j.engappai.2024.109758)", respectively.

## Preface: Access to HUNT4 pre-training data
The HUNT4 subset used for pre-training can be requested by contacting [kontakt\@hunt.ntnu.no](mailto:kontakt@hunt.ntnu.no?subject=HUNT4%20accelerometer%20snippets). Ask for the data used in the paper "Self-supervised Learning with Randomized Cross-sensor Masked Reconstruction for Human Activity Recognition".

## Requirements
[![git-lfs 3.4.1](https://img.shields.io/badge/Git_LFS-3.4.1-green)](https://git-lfs.com)
[![Python 3.8 | 3.9 | 3.10](https://img.shields.io/badge/Python_Versions-3.8_%7C_3.9_%7C_3.10-blue)](https://www.python.org/downloads/release/python-3810/)
```bash
pip install -r requirements.txt
```
__Note__: By default, the scripts try to use a GPU for training. If no supported GPU can be allocated, a MisconfigurationException is raised. All training scripts can run on the CPU instead by setting the "NUM_GPUS" parameter in the corresponding config.yml files to an empty list:
```yaml
NUM_GPUS: []
```
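To check in advance whether a supported GPU is visible, a minimal sketch, assuming the project runs on PyTorch (suggested by the MisconfigurationException, which comes from PyTorch Lightning):
```python
import torch

# Minimal check: does PyTorch see a usable GPU on this machine?
if torch.cuda.is_available():
    print(f"GPU available: {torch.cuda.get_device_name(0)}")
else:
    print("No supported GPU found; set NUM_GPUS: [] in config.yml to train on CPU.")
```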
## Download Datasets
Download the required datasets. Currently, the downstream datasets [HARTH](https://archive.ics.uci.edu/dataset/779/harth), [HAR70+](https://archive.ics.uci.edu/dataset/780/har70), [PAMAP2](https://archive.ics.uci.edu/dataset/231/pamap2+physical+activity+monitoring), [USC-HAD](https://sipi.usc.edu/had/), and [DualSleep](https://doi.org/10.18710/UGNIFE) are supported. The HUNT4 pre-training data can be requested by contacting [kontakt\@hunt.ntnu.no](mailto:kontakt@hunt.ntnu.no?subject=HUNT4%20accelerometer%20snippets); see the Preface above.
```bash
python download_dataset.py <dataset_name>
# Example (HARTH): python download_dataset.py harth
# Example (HAR70+): python download_dataset.py har70plus
# Example (PAMAP2): python download_dataset.py pamap2
# Example (USC-HAD): python download_dataset.py uschad
# Example (DualSleep): python download_dataset.py dualsleep
```
This command will download the given dataset into the [data/](https://github.com/ntnu-ai-lab/SelfPAB/tree/main/data) folder.
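To fetch all supported downstream datasets in one go, a small convenience loop (a sketch; it simply shells out to download_dataset.py with each dataset name listed above):
```python
import subprocess

# Download every supported downstream dataset in sequence
for ds in ["harth", "har70plus", "pamap2", "uschad", "dualsleep"]:
    subprocess.run(["python", "download_dataset.py", ds], check=True)
```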

## Upstream Pre-training
After unzipping the pre-training HUNT4 data and storing it in `data/hunt4`, the upstream training can be started with:
```bash
python upstream.py -p <config.yml> -d <dataset_dir>
# Example (SelfPAB): python upstream.py -p params/selfPAB_upstream/config.yml -d data/hunt4/
# Example (MonoSelfPAB): python upstream.py -p params/MonoSelfPAB_upstream/config.yml -d data/hunt4/
```
The [params/selfPAB_upstream/config.yml](https://github.com/ntnu-ai-lab/SelfPAB/blob/main/params/selfPAB_upstream/config.yml), [params/MonoSelfPAB_upstream/config.yml](https://github.com/ntnu-ai-lab/SelfPAB/blob/main/params/MonoSelfPAB_upstream/config.yml), and [params/LTA2V_upstream/config.yml](https://github.com/ntnu-ai-lab/SelfPAB/blob/main/params/LTA2V_upstream/config.yml) files contain all upstream model hyperparameters presented in our papers for SelfPAB, MonoSelfPAB, and LTA2V, respectively.

After upstream pre-training, the model with the best validation loss is stored as a .ckpt file. This saved model can then be used as a feature extractor for downstream training.
- The [params/selfPAB_upstream/upstream_model.ckpt](https://github.com/ntnu-ai-lab/SelfPAB/blob/main/params/selfPAB_upstream/upstream_model.ckpt) is the transformer encoder we pre-trained on 100,000 hours of unlabeled HUNT4 data using the "masked reconstruction" auxiliary task, as presented in our paper: SelfPAB: large-scale pre-training on accelerometer data for human activity recognition.
- The [params/MonoSelfPAB_upstream/upstream_model.ckpt](https://github.com/ntnu-ai-lab/SelfPAB/blob/main/params/MonoSelfPAB_upstream/upstream_model.ckpt) is the transformer encoder we pre-trained on 100,000 hours of unlabeled HUNT4 data using the "randomized cross-sensor masked reconstruction (RCSMR)" auxiliary task, as presented in our paper: Self-supervised Learning with Randomized Cross-sensor Masked Reconstruction for Human Activity Recognition.
- The [params/LTA2V_upstream/upstream_model.ckpt](https://github.com/ntnu-ai-lab/SelfPAB/blob/main/params/LTA2V_upstream/upstream_model.ckpt) is the transformer encoder we pre-trained on 821,700 hours of unlabeled HUNT4 data using the "masked reconstruction" auxiliary task with long-term (30-hour) spectrograms and global positional encoding, as presented in our paper: Long-term self-supervised learning for accelerometer-based sleep–wake recognition.
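
The checkpoints can be inspected directly; a minimal sketch, assuming a PyTorch Lightning-style .ckpt layout (an assumption based on the tooling referenced in this README):
```python
import torch

# Sketch: peek inside a pre-trained upstream checkpoint. Assumes a pickled
# dict containing a 'state_dict' entry, as Lightning checkpoints typically do.
ckpt = torch.load("params/selfPAB_upstream/upstream_model.ckpt", map_location="cpu")
print(list(ckpt.keys()))
for name, tensor in list(ckpt.get("state_dict", {}).items())[:5]:
    print(name, tuple(tensor.shape))
```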

## Downstream Training
Depending on the dataset, a downstream leave-one-subject-out (LOSO) or 5-fold cross-validation can be started with:
```bash
python loo_cross_validation.py -p <config.yml> -d <dataset_dir>
# Example (SelfPAB, HARTH): python loo_cross_validation.py -p params/selfPAB_downstream_experiments/selfPAB_downstream_harth/config.yml -d data/harth/
# Example (MonoSelfPAB, HAR70+[Back]): python loo_cross_validation.py -p params/MonoSelfPAB_downstream_experiments/MonoSelfPAB_downstream_har70_B/config.yml -d data/har70plus/
# Example (MonoSelfPAB, PAMAP2): python loo_cross_validation.py -p params/MonoSelfPAB_downstream_experiments/MonoSelfPAB_downstream_pamap2/config.yml -d data/pamap2/
# Example (LTA2V, DualSleep): python loo_cross_validation.py -p params/LTA2V_downstream_experiments/LTA2V_downstream_DualSleep/config.yml -d data/dualsleep/
```
This command performs LOSO cross-validation for HARTH, HAR70+, and DualSleep, and 5-fold cross-validation for PAMAP2 and USC-HAD.
The downstream config has an argument called "upstream_model_path", which defines the upstream model to use. In the [first example above](https://github.com/ntnu-ai-lab/SelfPAB/blob/main/params/selfPAB_downstream_experiments/selfPAB_downstream_harth/config.yml), the upstream model is our pre-trained SelfPAB: [params/selfPAB_upstream/upstream_model.ckpt](https://github.com/ntnu-ai-lab/SelfPAB/blob/main/params/selfPAB_upstream/upstream_model.ckpt). In the [second example](https://github.com/ntnu-ai-lab/SelfPAB/blob/main/params/MonoSelfPAB_downstream_experiments/MonoSelfPAB_downstream_har70_B/config.yml), it is our pre-trained MonoSelfPAB: [params/MonoSelfPAB_upstream/upstream_model.ckpt](https://github.com/ntnu-ai-lab/SelfPAB/blob/main/params/MonoSelfPAB_upstream/upstream_model.ckpt). In the [fourth example](https://github.com/ntnu-ai-lab/SelfPAB/blob/main/params/LTA2V_downstream_experiments/LTA2V_downstream_DualSleep/config.yml), it is our pre-trained LTA2V: [params/LTA2V_upstream/upstream_model.ckpt](https://github.com/ntnu-ai-lab/SelfPAB/blob/main/params/LTA2V_upstream/upstream_model.ckpt).
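
For reference, a hypothetical excerpt of such a downstream config.yml, showing only keys mentioned in this README (the authoritative schema is in the linked config files):
```yaml
# Hypothetical excerpt; see the linked config.yml files for the full schema
upstream_model_path: params/selfPAB_upstream/upstream_model.ckpt
NUM_GPUS: [0]   # set to [] to train on CPU
```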

## Results Visualization
Training results and progress can be logged to [Weights & Biases](https://wandb.ai/) by setting the WANDB argument in the config.yml files to True. For downstream training, the WANDB_KEY has to be provided in the config.yml file as well.
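The relevant keys might look like this (key names taken from this README; the API-key value is a placeholder):
```yaml
WANDB: True
WANDB_KEY: <your_wandb_api_key>   # required for downstream training runs
```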

The LOSO results are stored as .pkl files in a params subfolder called loso_cmats. These can be visualized using the [loso_viz.ipynb](https://github.com/ntnu-ai-lab/SelfPAB/blob/main/loso_viz.ipynb) notebook.
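
If you prefer to inspect the raw confusion matrices outside the notebook, a minimal sketch (the loso_cmats path below is illustrative; it sits under the params folder of the respective experiment):
```python
import pickle
from pathlib import Path

# Illustrative path; loso_cmats is created inside the experiment's params folder
cmat_dir = Path("params/selfPAB_downstream_experiments/selfPAB_downstream_harth/loso_cmats")
for pkl_file in sorted(cmat_dir.glob("*.pkl")):
    with open(pkl_file, "rb") as f:
        cmat = pickle.load(f)
    print(pkl_file.name, getattr(cmat, "shape", type(cmat)))
```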

## Embedding Generation
If you want to investigate the latent representations generated by the respective upstream model, you can use the feature_extraction.py script:
```bash
python feature_extraction.py -p <config.yml> -d <dataset_dir>
# Example (SelfPAB, HARTH): python feature_extraction.py -p params/selfPAB_downstream_experiments/selfPAB_downstream_harth/config.yml -d data/harth/
```
It computes the feature vectors of the given dataset batch-wise and stores them in a features.h5 file.
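
The resulting file can be inspected with h5py; a minimal sketch (the internal dataset layout is not documented here, so the code just walks whatever groups and datasets exist):
```python
import h5py

# Walk the HDF5 hierarchy and print each dataset's name and shape
with h5py.File("features.h5", "r") as f:
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))
```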

## Citation
If you use the [pre-trained SelfPAB upstream model](https://github.com/ntnu-ai-lab/SelfPAB/blob/main/params/selfPAB_upstream/upstream_model.ckpt) for your research, please cite the following paper:
```bibtex
@article{logacjovSelfPABLargescalePretraining2024,
title = {{{SelfPAB}}: Large-Scale Pre-Training on Accelerometer Data for Human Activity Recognition},
shorttitle = {{{SelfPAB}}},
author = {Logacjov, Aleksej and Herland, Sverre and Ustad, Astrid and Bach, Kerstin},
year = {2024},
month = mar,
journal = {Applied Intelligence},
issn = {1573-7497},
doi = {10.1007/s10489-024-05322-3},
urldate = {2024-03-29},
keywords = {Accelerometer,Human activity recognition,Machine learning,Physical activity behavior,Self-supervised learning,Transformer},
}
```

If you use the [pre-trained MonoSelfPAB upstream model](https://github.com/ntnu-ai-lab/SelfPAB/blob/main/params/MonoSelfPAB_upstream/upstream_model.ckpt) for your research, please cite the following paper:
```bibtex
@article{logacjovSelfsupervisedLearningRandomized2024,
title = {Self-Supervised Learning with Randomized Cross-Sensor Masked Reconstruction for Human Activity Recognition},
author = {Logacjov, Aleksej and Bach, Kerstin},
year = {2024},
month = feb,
journal = {Engineering Applications of Artificial Intelligence},
volume = {128},
pages = {107478},
issn = {0952-1976},
doi = {10.1016/j.engappai.2023.107478},
urldate = {2023-11-15},
keywords = {Accelerometer,Human activity recognition,Machine learning,Representation learning,Self-supervised learning,Transformer}
}
```

If you use the [pre-trained LTA2V upstream model](https://github.com/ntnu-ai-lab/SelfPAB/blob/main/params/LTA2V_upstream/upstream_model.ckpt) for your research, please cite the following paper:
```bibtex
@article{logacjovLongtermSelfsupervisedLearning2025,
title = {Long-Term Self-Supervised Learning for Accelerometer-Based Sleep--Wake Recognition},
author = {Logacjov, Aleksej and Bach, Kerstin and Mork, Paul Jarle},
year = {2025},
month = feb,
journal = {Engineering Applications of Artificial Intelligence},
volume = {141},
pages = {109758},
issn = {0952-1976},
doi = {10.1016/j.engappai.2024.109758},
urldate = {2024-12-09},
keywords = {Accelerometer,Machine learning,Representation learning,Self-supervised learning,Sleep-wake recognition,Transformer}
}
```