Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/WenjieDu/SAITS
The official PyTorch implementation of the paper "SAITS: Self-Attention-based Imputation for Time Series". A fast and state-of-the-art (SOTA) deep-learning neural network model for efficient time-series imputation (impute multivariate incomplete time series containing NaN missing data/values with machine learning). https://arxiv.org/abs/2202.08516
attention attention-mechanism deep-learning imputation imputation-model impute incomplete-data incomplete-time-series interpolation irregular-sampling machine-learning missing-values partially-observed partially-observed-data partially-observed-time-series pytorch self-attention time-series time-series-imputation transformer
Last synced: 1 day ago
JSON representation
- Host: GitHub
- URL: https://github.com/WenjieDu/SAITS
- Owner: WenjieDu
- License: mit
- Created: 2021-12-07T14:57:37.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2024-04-25T10:10:15.000Z (6 months ago)
- Last Synced: 2024-05-21T11:48:32.908Z (6 months ago)
- Topics: attention, attention-mechanism, deep-learning, imputation, imputation-model, impute, incomplete-data, incomplete-time-series, interpolation, irregular-sampling, machine-learning, missing-values, partially-observed, partially-observed-data, partially-observed-time-series, pytorch, self-attention, time-series, time-series-imputation, transformer
- Language: Python
- Homepage: https://doi.org/10.1016/j.eswa.2023.119619
- Size: 587 KB
- Stars: 271
- Watchers: 5
- Forks: 47
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
Awesome Lists containing this project
- awesome-time-series - [code
README
> [!TIP]
> **[Updates in Jun 2024]** The 1st comprehensive time-series imputation benchmark paper
> [TSI-Bench: Benchmarking Time Series Imputation](https://arxiv.org/abs/2406.12747) is now publicly available.
> The code is open source in the repo [Awesome_Imputation](https://github.com/WenjieDu/Awesome_Imputation).
> With nearly 35,000 experiments, we provide a comprehensive benchmarking study of 28 imputation methods, 3 missing patterns (points, sequences, blocks),
> various missing rates, and 8 real-world datasets.
>
> **[Updates in May 2024]** We applied SAITS embedding and training strategies to **iTransformer, FiLM, FreTS, Crossformer, PatchTST, DLinear, ETSformer, FEDformer,
> Informer, Autoformer, Non-stationary Transformer, Pyraformer, Reformer, SCINet, RevIN, Koopa, MICN, TiDE, and StemGNN** in PyPOTS
> to make them applicable to the time-series imputation task.
>
> **[Updates in Feb 2024]** Our survey paper [Deep Learning for Multivariate Time Series Imputation: A Survey](https://arxiv.org/abs/2402.04059) has been released on arXiv.
> We comprehensively review the literature on state-of-the-art deep-learning imputation methods for time series,
> provide a taxonomy for them, and discuss the challenges and future directions in this field.

**Kind reminder: This document can help you solve many common questions; please read it before you run the code.**
The official code repository is for the paper [SAITS: Self-Attention-based Imputation for Time Series](https://doi.org/10.1016/j.eswa.2023.119619)
(preprint on arXiv is [here](https://arxiv.org/abs/2202.08516)), which has been accepted by the journal
*[Expert Systems with Applications (ESWA)](https://www.sciencedirect.com/journal/expert-systems-with-applications)*
[2022 IF 8.665, CiteScore 12.2, JCR-Q1, CAS-Q1, CCF-C]. You may never have heard of ESWA,
but it was ranked 1st in Google Scholar under the top publications of Artificial Intelligence in 2016
([info source](https://www.sciencedirect.com/journal/expert-systems-with-applications/about/news#expert-systems-with-applications-is-currently-ranked-no1-in)), and it is still the top AI journal according to Google Scholar metrics
([here is the current ranking list](https://scholar.google.com/citations?view_op=top_venues&hl=en&vq=eng_artificialintelligence) FYI).

SAITS is the first work applying pure self-attention, without any recursive design, to general time-series imputation.
Basically, you can take it as a validated framework for time-series imputation; for instance, we have integrated 20 forecasting models into PyPOTS by adapting the SAITS framework.
More generally, you can use it for sequence imputation. Besides, the code here is open source under the MIT license,
so you're welcome to modify the SAITS code for your own research purposes and domain applications.
Of course, it probably needs a bit of modification in the model structure or loss functions for specific scenarios or data inputs.
Here is [an incomplete list](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&as_ylo=2022&q=%E2%80%9CSAITS%E2%80%9D+%22time+series%22) of scientific research referencing SAITS in their papers.
Please [cite SAITS](https://github.com/WenjieDu/SAITS#-citing-saits) in your publications if it helps with your work.
Please star this repo to help others notice SAITS if you think it is useful.
It really means a lot to our open-source research. Thank you!
BTW, you may also like
[PyPOTS](https://github.com/WenjieDu/PyPOTS)
for easily modeling your partially-observed time-series datasets.

> [!IMPORTANT]
>
> **Attention please:**
>
> SAITS is now available in [PyPOTS](https://github.com/WenjieDu/PyPOTS), a Python toolbox for data mining on POTS (Partially-Observed Time Series).
> An example of training SAITS to impute the PhysioNet-2012 dataset is shown below. With [PyPOTS](https://github.com/WenjieDu/PyPOTS), easy peasy!
```python
# Data preprocessing. Tedious, but PyPOTS can help.
import numpy as np
from sklearn.preprocessing import StandardScaler
from pygrinder import mcar
from pypots.data import load_specific_dataset
data = load_specific_dataset('physionet_2012') # PyPOTS will automatically download and extract it.
X = data['X']
num_samples = len(X['RecordID'].unique())
X = X.drop(['RecordID', 'Time'], axis=1)  # remove non-feature columns
X = StandardScaler().fit_transform(X.to_numpy())
X = X.reshape(num_samples, 48, -1)
X_ori = X # keep X_ori for validation
X = mcar(X, 0.1) # randomly hold out 10% observed values as ground truth
dataset = {"X": X} # X for model input
print(X.shape)  # (11988, 48, 37), 11988 samples, each with 48 time steps and 37 features

# Model training. This is PyPOTS showtime.
from pypots.imputation import SAITS
from pypots.utils.metrics import calc_mae
saits = SAITS(n_steps=48, n_features=37, n_layers=2, d_model=256, d_ffn=128, n_heads=4, d_k=64, d_v=64, dropout=0.1, epochs=10)
# Here we use the whole dataset as the training set because the ground truth is not visible to the model; you can also split it into train/val/test sets
saits.fit(dataset) # train the model on the dataset
imputation = saits.impute(dataset) # impute the originally-missing values and artificially-missing values
indicating_mask = np.isnan(X) ^ np.isnan(X_ori) # indicating mask for imputation error calculation
mae = calc_mae(imputation, np.nan_to_num(X_ori), indicating_mask) # calculate mean absolute error on the ground truth (artificially-missing values)
saits.save("save_it_here/saits_physionet2012.pypots") # save the model for future use
saits.load("save_it_here/saits_physionet2012.pypots") # reload the serialized model file for following imputation or training
```
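As a follow-up to the comment above about splitting the data, here is a minimal sketch of a train/val variant. It reuses `num_samples`, `X`, `X_ori`, and `saits` from the example; the 80/20 split ratio and the `val_set` keys (in particular `X_ori`) are assumptions about recent PyPOTS versions, not guaranteed API, so please check the PyPOTS documentation for your installed version.

```python
# A train/val split variant of the example above (a sketch; the val_set keys
# and the split ratio are assumptions -- verify against the PyPOTS docs).
n_train = int(num_samples * 0.8)
train_set = {"X": X[:n_train]}
val_set = {
    "X": X[n_train:],          # inputs with artificially-missing values
    "X_ori": X_ori[n_train:],  # originals used to score validation imputation
}
saits.fit(train_set, val_set)                  # get early feedback on imputation quality
imputation = saits.impute({"X": X[n_train:]})  # impute only the held-out samples
```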
Welcome to the universe of PyPOTS. Enjoy it and have fun!

## Motivation and Performance
⦿ **`Motivation`**: SAITS is developed primarily to help overcome the drawbacks (slow speed, memory constraints, and compounding error)
of RNN-based imputation models and to obtain state-of-the-art (SOTA) imputation accuracy on partially-observed time series.

⦿ **`Performance`**: SAITS outperforms [BRITS](https://papers.nips.cc/paper/2018/hash/734e6bfcd358e25ac1db0a4241b95651-Abstract.html)
by **12% ~ 38%** in MAE (mean absolute error) and achieves **2.0 ~ 2.6** times faster training speed.
Furthermore, SAITS outperforms Transformer (trained by our joint-optimization approach) by **2% ~ 19%** in MAE with a
more efficient model structure (to obtain comparable performance, SAITS needs only **15% ~ 30%** of Transformer's parameters).
Compared to another SOTA self-attention imputation model, [NRTSI](https://github.com/lupalab/NRTSI), SAITS achieves
**7% ~ 39%** smaller mean squared error (above 20% in nine out of sixteen cases), and meanwhile needs much
fewer parameters and less imputation time in practice.
Please refer to our [full paper](https://arxiv.org/pdf/2202.08516.pdf) for more details about SAITS' performance.

## Brief Graphical Illustration of Our Methodology
Here we only show the two main components of our method: the joint-optimization training approach and SAITS structure.
For the detailed description and explanation, please read our full paper `Paper_SAITS.pdf` in this repo
or [on arXiv](https://arxiv.org/pdf/2202.08516.pdf).
Fig. 1: Training approach
Fig. 2: SAITS structure
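To make the joint-optimization idea concrete, below is a minimal, self-contained PyTorch sketch of the two loss terms described in the paper: the Observed Reconstruction Task (ORT) scores the model on observed values, while the Masked Imputation Task (MIT) scores it on artificially-masked values. The toy tensors, the stand-in linear model, and the unweighted sum of the two terms are illustrative assumptions, not the repo's actual code; see the `modeling` dir for the real implementation.

```python
import torch

def masked_mae(pred, target, mask):
    # MAE computed only over positions where mask == 1
    return torch.sum(torch.abs(pred - target) * mask) / (torch.sum(mask) + 1e-9)

# Toy batch standing in for real data: 8 samples, 48 steps, 37 features.
X_complete = torch.randn(8, 48, 37)               # fully observed toy series
ind_mask = (torch.rand(8, 48, 37) < 0.2).float()  # artificially-masked positions (for MIT)
obs_mask = 1.0 - ind_mask                         # positions left observed (for ORT)
X_input = X_complete * obs_mask                   # model input with masked values zeroed

# Stand-in for SAITS: any module producing an estimate for every position
# is enough to illustrate how the two tasks are combined.
model = torch.nn.Linear(37, 37)
estimates = model(X_input)

ort_loss = masked_mae(estimates, X_complete, obs_mask)  # Observed Reconstruction Task
mit_loss = masked_mae(estimates, X_complete, ind_mask)  # Masked Imputation Task
loss = ort_loss + mit_loss                              # jointly optimized
loss.backward()
```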
## Citing SAITS
If you find SAITS helpful to your work, please cite our paper as below,
star this repository, and recommend it to others who you think may need it. Thank you!

```bibtex
@article{du2023saits,
title = {{SAITS: Self-Attention-based Imputation for Time Series}},
journal = {Expert Systems with Applications},
volume = {219},
pages = {119619},
year = {2023},
issn = {0957-4174},
doi = {10.1016/j.eswa.2023.119619},
url = {https://arxiv.org/abs/2202.08516},
author = {Wenjie Du and David Cote and Yan Liu},
}
```
or
> Wenjie Du, David Cote, and Yan Liu.
> SAITS: Self-Attention-based Imputation for Time Series.
> Expert Systems with Applications, 219:119619, 2023.

### Our latest survey and benchmarking research on time-series imputation may also be useful to your work:
```bibtex
@article{du2024tsibench,
title={TSI-Bench: Benchmarking Time Series Imputation},
author={Wenjie Du and Jun Wang and Linglong Qian and Yiyuan Yang and Fanxing Liu and Zepu Wang and Zina Ibrahim and Haoxin Liu and Zhiyuan Zhao and Yingjie Zhou and Wenjia Wang and Kaize Ding and Yuxuan Liang and B. Aditya Prakash and Qingsong Wen},
journal={arXiv preprint arXiv:2406.12747},
year={2024}
}
```

```bibtex
@article{wang2024deep,
title={Deep Learning for Multivariate Time Series Imputation: A Survey},
author={Jun Wang and Wenjie Du and Wei Cao and Keli Zhang and Wenjia Wang and Yuxuan Liang and Qingsong Wen},
journal={arXiv preprint arXiv:2402.04059},
year={2024}
}
```

### In case you use PyPOTS in your research, please also cite the following paper:
```bibtex
@article{du2023pypots,
title={{PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series}},
author={Wenjie Du},
journal={arXiv preprint arXiv:2305.18811},
year={2023},
}
```
or
> Wenjie Du.
> PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series.
> arXiv, abs/2305.18811, 2023.

## Repository Structure
The implementation of SAITS is in dir [`modeling`](https://github.com/WenjieDu/SAITS/blob/main/modeling/SA_models.py).
We give configurations of our models in dir [`configs`](https://github.com/WenjieDu/SAITS/tree/main/configs), provide
the dataset links and preprocessing scripts in dir [`dataset_generating_scripts`](https://github.com/WenjieDu/SAITS/tree/main/dataset_generating_scripts).
Dir [`NNI_tuning`](https://github.com/WenjieDu/SAITS/tree/main/NNI_tuning) contains the hyper-parameter searching configurations.

## Development Environment
All dependencies of our development environment are listed in file [`conda_env_dependencies.yml`](https://github.com/WenjieDu/SAITS/blob/main/conda_env_dependencies.yml).
You can quickly create a usable Python environment with the conda command `conda env create -f conda_env_dependencies.yml`.

## Datasets
For datasets downloading and generating, please check out the scripts in
dir [`dataset_generating_scripts`](https://github.com/WenjieDu/SAITS/tree/main/dataset_generating_scripts).

## Quick Run
Generate the dataset you need first. To do so, please check out the generating scripts in
dir [`dataset_generating_scripts`](https://github.com/WenjieDu/SAITS/tree/main/dataset_generating_scripts).

After data generation, train and test your model, for example:
```shell
# create a dir to save logs and results
mkdir NIPS_results

# train a model
nohup python run_models.py \
--config_path configs/PhysioNet2012_SAITS_best.ini \
> NIPS_results/PhysioNet2012_SAITS_best.out &

# during training, you can run the below command to read the training log
less NIPS_results/PhysioNet2012_SAITS_best.out

# after training, pick the best model and modify the model path for testing in the config file, then run the below command to test the model
python run_models.py \
--config_path configs/PhysioNet2012_SAITS_best.ini \
--test_mode
```

Note that the paths of datasets and saving dirs may differ on personal computers; please check them in the configuration files.
## Acknowledgments
Thanks to Ciena, Mitacs, and NSERC (Natural Sciences and Engineering Research Council of Canada) for funding support.
Thanks to all our reviewers for helping improve the quality of this paper.
Thanks to Ciena for providing computing resources.
And thank you all for your attention to this work.

### Stars/forks/issues/PRs are all welcome!
Click to view stargazers and forkers:
[![Stargazers repo roster for @WenjieDu/SAITS](http://reporoster.com/stars/dark/WenjieDu/SAITS)](https://github.com/WenjieDu/SAITS/stargazers)
[![Forkers repo roster for @WenjieDu/SAITS](http://reporoster.com/forks/dark/WenjieDu/SAITS)](https://github.com/WenjieDu/SAITS/network/members)
## Last but Not Least
If you have any additional questions or have interests in collaboration,
please take a look at [my GitHub profile](https://github.com/WenjieDu) and feel free to contact me.