{"id":18418949,"url":"https://github.com/smrfeld/tsmixer-pytorch","last_synced_at":"2025-04-07T13:31:25.425Z","repository":{"id":208142863,"uuid":"720542595","full_name":"smrfeld/tsmixer-pytorch","owner":"smrfeld","description":"TSMixer in PyTorch","archived":false,"fork":false,"pushed_at":"2023-11-24T08:37:00.000Z","size":688,"stargazers_count":19,"open_issues_count":0,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-22T19:03:19.390Z","etag":null,"topics":["plotly","python","pytorch","time-series","time-series-forecasting","tsmixer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/smrfeld.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-11-18T19:56:18.000Z","updated_at":"2025-03-15T22:55:35.000Z","dependencies_parsed_at":"2023-12-17T07:30:42.737Z","dependency_job_id":"32e9cd8e-4601-49cb-bd41-9f508275c905","html_url":"https://github.com/smrfeld/tsmixer-pytorch","commit_stats":{"total_commits":60,"total_committers":1,"mean_commits":60.0,"dds":0.0,"last_synced_commit":"342a6ebb323efff75f96909203c64c9e1e7d7aa5"},"previous_names":["smrfeld/tsmixer-pytorch"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smrfeld%2Ftsmixer-pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smrfeld%2Ftsmixer-pytorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smrfeld%2Ftsmixer-pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/smrfeld%2Ftsmixer-pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/smrfeld","download_url":"https://codeload.github.com/smrfeld/tsmixer-pytorch/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247661696,"owners_count":20975101,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["plotly","python","pytorch","time-series","time-series-forecasting","tsmixer"],"created_at":"2024-11-06T04:15:05.769Z","updated_at":"2025-04-07T13:31:25.024Z","avatar_url":"https://github.com/smrfeld.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TSMixer in PyTorch\n\nReimplementation of TSMixer in PyTorch.\n\n* Original paper: [https://arxiv.org/pdf/2303.06053.pdf](https://arxiv.org/abs/2303.06053)\n* Similar implementations: [https://github.com/marcopeix/time-series-analysis/blob/master/TSMixer.ipynb](https://github.com/marcopeix/time-series-analysis/blob/master/TSMixer.ipynb)\n\n## Sample results\n\n![Predictions on validation set](readme_figures/preds.png)\n*Predictions on validation set*\n\n![Training loss](readme_figures/loss.png)\n*Loss during training*\n\nParameters used for example:\n* `input_length`: 512\n* `prediction_length`: 96\n* `no_features`: 7\n* `no_mixer_layers`: 4\n* `dataset`: ETTh1.csv\n* `batch_size`: 32\n* `num_epochs`: 100 with early stopping after 5 epochs without improvement\n* `learning_rate`: 0.00001\n* `optimizer`: Adam\n* `validation_split_holdout`: 0.2 - last 20% of the time series data is used for 
validation\n* `dropout`: 0.3\n* `feat_mixing_hidden_channels`: 256 - number of hidden channels in the feature mixing layer\n\n## Data\n\nYou can find the raw ETDataset data [here](https://github.com/zhouhaoyi/ETDataset/tree/11ab373cf9c9f5be7698e219a5a170e1b1c8a930), specifically:\n\n* [ETTh1.csv](https://github.com/zhouhaoyi/ETDataset/raw/11ab373cf9c9f5be7698e219a5a170e1b1c8a930/ETT-small/ETTh1.csv)\n* [ETTh2.csv](https://github.com/zhouhaoyi/ETDataset/raw/11ab373cf9c9f5be7698e219a5a170e1b1c8a930/ETT-small/ETTh2.csv)\n* [ETTm1.csv](https://github.com/zhouhaoyi/ETDataset/raw/11ab373cf9c9f5be7698e219a5a170e1b1c8a930/ETT-small/ETTm1.csv)\n* [ETTm2.csv](https://github.com/zhouhaoyi/ETDataset/raw/11ab373cf9c9f5be7698e219a5a170e1b1c8a930/ETT-small/ETTm2.csv)\n\nYou can use the `download_etdataset.py` script to download the data:\n\n```bash\npython download_etdataset.py\n```\n\n## Running\n\nInstall the requirements:\n\n```bash\npip install -r requirements.txt\n```\n\nTrain the model:\n\n```bash\npython main.py --conf conf.etdataset.yml --command train\n```\n\nThe output will be in the `output_dir` directory specified in the config file. The config file is in YAML format. The format is defined by [utils/tsmixer_conf.py](utils/tsmixer_conf.py).\n\nPlot the loss curves:\n\n```bash\npython main.py --conf conf.etdataset.yml --command loss --show\n```\n\nPredict some of the validation data and plot it:\n\n```bash\npython main.py --conf conf.etdataset.yml --command predict --show\n```\n\nRun a grid search over the hyperparameters:\n\n```bash\npython main.py --conf conf.etdataset.gridsearch.yml --command grid-search\n```\n\nNote that the format of the config file is different for the grid search. 
The format is defined by [utils/tsmixer_grid_search_conf.py](utils/tsmixer_grid_search_conf.py).\n\n### Tests\n\nRun the tests with `pytest`:\n\n```bash\ncd tests\npytest\n```\n\n## Implementation notes from the paper\n\n### Training parameters\n\n\u003e For multivariate long-term forecasting datasets, we follow the settings in recent research (Liu et al., 2022b; Zhou et al., 2022a; Nie et al., 2023). We set the input length L = 512 as suggested in Nie et al. (2023) and evaluate the results for prediction lengths of T = {96, 192, 336, 720}. We use the Adam optimization algorithm (Kingma \u0026 Ba, 2015) to minimize the mean square error (MSE) training objective, and consider MSE and mean absolute error (MAE) as the evaluation metrics. We apply reversible instance normalization (Kim et al., 2022) to ensure a fair comparison with the state-of-the-art PatchTST (Nie et al., 2023).\n\n\u003e For the M5 dataset, we mostly follow the data processing from Alexandrov et al. (2020). We consider the prediction length of T = 28 (same as the competition), and set the input length to L = 35. We optimize log-likelihood of negative binomial distribution as suggested by Salinas et al. (2020). We follow the competition’s protocol (Makridakis et al., 2022) to aggregate the predictions at different levels and evaluate them using the weighted root mean squared scaled error (WRMSSE). More details about the experimental setup and hyperparameter tuning can be found in Appendices C and E.\n\n### Reversible Instance Normalization for Time Series Forecasting\n\nReversible instance normalization https://openreview.net/pdf?id=cGDAkQo1C0p\n\n\u003e First, we normalize the input data x(i) using its instance-specific mean and stan- dard deviation, which is widely accepted as instance normalization (Ulyanov et al., 2016). The mean and standard deviation are computed for every instance x(i) ∈ RTx of the input data (Fig. 
2(a-3)) as\n\n```\nMean[xi_kt] = mean_{j=1}^Tx ( xi_kj )\nVar[xi_kt] = var_{j=1}^Tx ( xi_kj )\n```\n\nwhere `i` = sample in the batch, `K` = number of variables (features), `Tx` = number of time steps in the input, `Ty` = number of time steps in the output (prediction).\n\n\u003e Then, we apply the normalization to the **input data** (sent to model) as\n\n```\nxhati_kt = gamma_k * (xi_kt - Mean[xi_kt]) / sqrt(Var[xi_kt] + epsilon) + beta_k\n```\n\nwhere gamma_k and beta_k are learnable parameters for each variable k (**recall: K = num features**).\n\nAfter the final layer of the model, we get the output `yhati_kt` and apply the reverse transformation to the **output data** (sent to the loss function) as\n\n```\nyi_kt = (yhati_kt - beta_k) * sqrt(Var[xi_kt] + epsilon) / gamma_k + Mean[xi_kt]\n```\n\nwhere `yhati_kt` is the output of the model for variable `k` at time `t` for sample `i`, and `yi_kt` is sent to the loss function.\n\n### Details on multivariate time series forecasting experiments\n\nInput = matrix X of size (L,C) where L = number of input time steps, C = number of features\n\nOutput = prediction of size (T,C) where T = number of predicted time steps\n\n\u003e B.3.2 Basic TSMixer for Multivariate Time Series Forecasting\n\u003e For long-term time series forecasting (LTSF) tasks, TSMixer only uses the historical target time series X as input. A series of mixer blocks are applied to project the input data to a latent representation of size C. The final output is then projected to the prediction length T:\n```\nO_1 = Mix[C-\u003eC] (X)\nO_k = Mix[C-\u003eC] (O_{k-1}), for k = 2,...,K\nY = TP[L-\u003eT] (O_K)\n```\n\u003e where O_k is the latent representation of the k-th mixer block and Ŷ is the prediction. We project the sequence to length T after the mixer blocks as T may be quite long in LTSF tasks.\n\ni.e. 
keep the number of features at C and the time length at the input length L inside the mixer blocks, then project the time dimension to the prediction length T for the output.\n\n### Hidden layers of feature mixing\n\n\u003e To increase the model capacity, we modify the hidden layers in Feature Mixing by using W2 ∈ (H×C),W3 ∈ (C×H),b2 ∈ H,b3 ∈ C in Eq. equation B.3.1, where H is a hyper-parameter indicating the hidden size.\n\ni.e. in the feature mixing block, which has two fully connected layers, the first projects the number of channels from C-\u003eH and the second from H-\u003eC, where H is an additional hyperparameter.\n\n\u003e Another modification is using pre-normalization (Xiong et al., 2020) instead of post-normalization in residual blocks to keep the input scale.\n\ni.e. apply normalization to the input of the feature mixing block, instead of the output.\n\n## Standardization of data\n\n\u003e Specifically, we standardize each covariate independently and do not re-scale the data when evaluating the performance. \n\n\u003e Global normalization: Global normalization standardizes all variates of time series independently as a data pre-processing. The standardized data is then used for training and evaluation. It is a common setup in long-term time series forecasting experiments to prevent from the affects of different variate scales. 
For M5, since there is only one target time series (sales), we do not apply the global normalization.\n\nStandardize each feature independently based on the training split, then use the same mean and standard deviation for the test set.\n\n\u003e We train each model with a maximum 100 epochs and do early stopping if the validation loss is not improved after 5 epochs.\n\nMax 100 epochs, early stopping after 5 epochs without improvement.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmrfeld%2Ftsmixer-pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsmrfeld%2Ftsmixer-pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmrfeld%2Ftsmixer-pytorch/lists"}