# TS2Vec
This repository contains the official implementation for the paper [TS2Vec: Towards Universal Representation of Time Series](https://arxiv.org/abs/2106.10466) (AAAI-22).
## Requirements
The recommended requirements for TS2Vec are specified as follows:
* Python 3.8
* torch==1.8.1
* scipy==1.6.1
* numpy==1.19.2
* pandas==1.0.1
* scikit_learn==0.24.2
* statsmodels==0.12.2
* Bottleneck==1.3.2

The dependencies can be installed by:
```bash
pip install -r requirements.txt
```

## Data
The datasets can be obtained and put into the `datasets/` folder in the following way:
* [128 UCR datasets](https://www.cs.ucr.edu/~eamonn/time_series_data_2018) should be put into `datasets/UCR/` so that each data file can be located by `datasets/UCR/<dataset_name>/<dataset_name>_*.csv`.
* [30 UEA datasets](http://www.timeseriesclassification.com) should be put into `datasets/UEA/` so that each data file can be located by `datasets/UEA/<dataset_name>/<dataset_name>_*.arff`.
* [3 ETT datasets](https://github.com/zhouhaoyi/ETDataset) should be placed at `datasets/ETTh1.csv`, `datasets/ETTh2.csv` and `datasets/ETTm1.csv`.
* [Electricity dataset](https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014) should be preprocessed using `datasets/preprocess_electricity.py` and placed at `datasets/electricity.csv`.
* [Yahoo dataset](https://webscope.sandbox.yahoo.com/catalog.php?datatype=s&did=70) should be preprocessed using `datasets/preprocess_yahoo.py` and placed at `datasets/yahoo.pkl`.
* [KPI dataset](http://test-10056879.file.myqcloud.com/10056879/test/20180524_78431960010324/KPI%E5%BC%82%E5%B8%B8%E6%A3%80%E6%B5%8B%E5%86%B3%E8%B5%9B%E6%95%B0%E6%8D%AE%E9%9B%86.zip) should be preprocessed using `datasets/preprocess_kpi.py` and placed at `datasets/kpi.pkl`.
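Once everything is in place, the `datasets/` folder should look roughly like the sketch below (this layout is inferred from the paths above; `<dataset_name>` stands for each individual UCR/UEA dataset):

```text
datasets/
├── UCR/
│   └── <dataset_name>/
│       └── <dataset_name>_*.csv
├── UEA/
│   └── <dataset_name>/
│       └── <dataset_name>_*.arff
├── ETTh1.csv
├── ETTh2.csv
├── ETTm1.csv
├── electricity.csv   # produced by preprocess_electricity.py
├── yahoo.pkl         # produced by preprocess_yahoo.py
└── kpi.pkl           # produced by preprocess_kpi.py
```

## Usage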
To train and evaluate TS2Vec on a dataset, run the following command:
```bash
python train.py <dataset_name> <run_name> --loader <loader> --batch-size <batch_size> --repr-dims <repr_dims> --gpu <gpu> --eval
```
Detailed descriptions of the arguments are as follows:
| Parameter name | Description of parameter |
| --- | --- |
| dataset_name | The dataset name |
| run_name | The folder name used to save model, output and evaluation metrics. This can be set to any word |
| loader | The data loader used to load the experimental data. This can be set to `UCR`, `UEA`, `forecast_csv`, `forecast_csv_univar`, `anomaly`, or `anomaly_coldstart` |
| batch_size | The batch size (defaults to 8) |
| repr_dims | The representation dimensions (defaults to 320) |
| gpu | The GPU number used for training and inference (defaults to 0) |
| eval | Whether to perform evaluation after training |

(For descriptions of more arguments, run `python train.py -h`.)
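For example, a run on the UCR dataset `ECG200` (also used in the code example below) might look like this; the run name `test_run` is an arbitrary placeholder:

```bash
python train.py ECG200 test_run --loader UCR --batch-size 8 --repr-dims 320 --gpu 0 --eval
```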
After training and evaluation, the trained encoder, output and evaluation metrics can be found in `training/DatasetName__RunName_Date_Time/`.
**Scripts:** The scripts for reproduction are provided in the `scripts/` folder.
## Code Example
```python
from ts2vec import TS2Vec
import datautils

# Load the ECG200 dataset from the UCR archive
train_data, train_labels, test_data, test_labels = datautils.load_UCR('ECG200')
# (Both train_data and test_data have a shape of n_instances x n_timestamps x n_features)

# Train a TS2Vec model
model = TS2Vec(
    input_dims=1,
    device=0,
    output_dims=320
)
loss_log = model.fit(
    train_data,
    verbose=True
)

# Compute timestamp-level representations for the test set
test_repr = model.encode(test_data)  # n_instances x n_timestamps x output_dims

# Compute instance-level representations for the test set
test_repr = model.encode(test_data, encoding_window='full_series')  # n_instances x output_dims

# Sliding inference for the test set
test_repr = model.encode(
    test_data,
    causal=True,
    sliding_length=1,
    sliding_padding=50
)  # n_instances x n_timestamps x output_dims
# (The timestamp t's representation vector is computed using the observations located in [t-50, t])
```
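The learned representations can then be fed to any off-the-shelf model for a downstream task. Below is a minimal sketch, assuming scikit-learn is installed, that fits an RBF-kernel SVM on the instance-level embeddings (similar in spirit to the classification protocol used in the paper); it reuses `model`, the data, and the labels from the example above:

```python
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Instance-level representations: one vector of size output_dims per series
train_repr = model.encode(train_data, encoding_window='full_series')
test_repr = model.encode(test_data, encoding_window='full_series')

# Fit a simple classifier on top of the frozen representations
clf = SVC(kernel='rbf', C=1.0)
clf.fit(train_repr, train_labels)

print('Test accuracy:', accuracy_score(test_labels, clf.predict(test_repr)))
```

The `TS2Vec` object also provides `save`/`load` helpers for persisting the encoder weights between sessions; see `ts2vec.py` for the exact signatures.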