{"id":13570939,"url":"https://github.com/philipdarke/torchtime","last_synced_at":"2025-04-23T20:14:52.018Z","repository":{"id":41393218,"uuid":"475093888","full_name":"philipdarke/torchtime","owner":"philipdarke","description":"Benchmark time series data sets for PyTorch","archived":false,"fork":false,"pushed_at":"2024-02-14T14:13:38.000Z","size":3223,"stargazers_count":35,"open_issues_count":2,"forks_count":5,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-23T20:14:32.160Z","etag":null,"topics":["classification","datasets","physionet","pytorch","supervised-learning","time-series"],"latest_commit_sha":null,"homepage":"https://philipdarke.com/torchtime","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/philipdarke.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-03-28T16:47:59.000Z","updated_at":"2025-02-25T16:06:27.000Z","dependencies_parsed_at":"2024-06-07T22:55:13.733Z","dependency_job_id":"3ca52a1c-a5b5-47d6-9af2-c371f913dda7","html_url":"https://github.com/philipdarke/torchtime","commit_stats":{"total_commits":28,"total_committers":1,"mean_commits":28.0,"dds":0.0,"last_synced_commit":"7feb206557905282aba4b7bcbf5ab939794b7972"},"previous_names":[],"tags_count":11,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philipdarke%2Ftorchtime","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philipdarke%2Ftorchtime/tags","releases_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/philipdarke%2Ftorchtime/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/philipdarke%2Ftorchtime/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/philipdarke","download_url":"https://codeload.github.com/philipdarke/torchtime/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250506141,"owners_count":21441723,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","datasets","physionet","pytorch","supervised-learning","time-series"],"created_at":"2024-08-01T14:00:56.718Z","updated_at":"2025-04-23T20:14:51.994Z","avatar_url":"https://github.com/philipdarke.png","language":"Python","funding_links":[],"categories":["📦 Packages"],"sub_categories":["Python"],"readme":"# Benchmark time series data sets for PyTorch\n\n[![PyPi](https://img.shields.io/pypi/v/torchtime)](https://pypi.org/project/torchtime)\n[![Build status](https://img.shields.io/github/actions/workflow/status/philipdarke/torchtime/build.yml?branch=main)](https://github.com/philipdarke/torchtime/actions/workflows/build.yml)\n![Coverage](https://philipdarke.com/torchtime/assets/coverage-badge.svg?dummy=8484744)\n[![License](https://img.shields.io/github/license/philipdarke/torchtime.svg)](https://github.com/philipdarke/torchtime/blob/main/LICENSE)\n[![DOI](https://img.shields.io/badge/DOI-10.48550%2FarXiv.2207.12503-blue)](https://doi.org/10.48550/arXiv.2207.12503)\n\nPyTorch data sets for supervised time series classification and prediction problems, 
including:\n\n* All UEA/UCR classification repository data sets\n* PhysioNet Challenge 2012 (in-hospital mortality)\n* PhysioNet Challenge 2019 (sepsis prediction)\n* A binary prediction variant of the 2019 PhysioNet Challenge\n\n## Why use `torchtime`?\n\n1. Saves time. You don't have to write your own PyTorch data classes.\n2. Better research. Use common, reproducible implementations of data sets for a level playing field when evaluating models.\n\n## Installation\n\nInstall PyTorch followed by `torchtime`:\n\n```bash\n$ pip install torchtime\n```\n\nor\n\n```bash\n$ conda install torchtime -c conda-forge\n```\n\nThere is currently no Windows build for `conda`. Feedback is welcome from `conda` users in particular.\n\n## Getting started\n\nData classes have a common API. The `split` argument determines whether training (\"*train*\"), validation (\"*val*\") or test (\"*test*\") data are returned. The sizes of the splits are controlled with the `train_prop` and (optional) `val_prop` arguments.\n\n### PhysioNet data sets\n\nThree [PhysioNet](https://physionet.org/) data sets are currently supported:\n\n* [`torchtime.data.PhysioNet2012`](https://philipdarke.com/torchtime/api/data.html#torchtime.data.PhysioNet2012) returns the 2012 challenge (in-hospital mortality) [[link]](https://physionet.org/content/challenge-2012/1.0.0/).\n* [`torchtime.data.PhysioNet2019`](https://philipdarke.com/torchtime/api/data.html#torchtime.data.PhysioNet2019) returns the 2019 challenge (sepsis prediction) [[link]](https://physionet.org/content/challenge-2019/1.0.0/).\n* [`torchtime.data.PhysioNet2019Binary`](https://philipdarke.com/torchtime/api/data.html#torchtime.data.PhysioNet2019Binary) returns a binary prediction variant of the 2019 challenge.\n\nFor example, to load training data for the 2012 challenge with a 70/30% training/validation split and create a [DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader) for model training:\n\n```python\nfrom 
torch.utils.data import DataLoader\nfrom torchtime.data import PhysioNet2012\n\nphysionet2012 = PhysioNet2012(\n    split=\"train\",\n    train_prop=0.7,\n)\ndataloader = DataLoader(physionet2012, batch_size=32)\n```\n\n### UEA/UCR repository data sets\n\nThe [`torchtime.data.UEA`](https://philipdarke.com/torchtime/api/data.html#torchtime.data.UEA) class returns the [UEA/UCR repository](https://www.timeseriesclassification.com/) data set specified by the `dataset` argument, for example:\n\n```python\nfrom torch.utils.data import DataLoader\nfrom torchtime.data import UEA\n\narrowhead = UEA(\n    dataset=\"ArrowHead\",\n    split=\"train\",\n    train_prop=0.7,\n)\ndataloader = DataLoader(arrowhead, batch_size=32)\n```\n\n### Using the DataLoader\n\nBatches are dictionaries of tensors `X`, `y` and `length`:\n\n* `X` are the time series data. The package follows the *batch first* convention, so `X` has shape (*n*, *s*, *c*) where *n* is batch size, *s* is (longest) trajectory length and *c* is the number of channels. By default, the first channel is a time stamp.\n* `y` are one-hot encoded labels of shape (*n*, *l*) where *l* is the number of classes.\n* `length` are the lengths of the trajectories (before padding if sequences are of irregular length), i.e. a tensor of shape (*n*).\n\nFor example, ArrowHead is a univariate time series, so `X` has two channels: the time stamp followed by the time series (*c* = 2). Each series has 251 observations (*s* = 251) and there are three classes (*l* = 3). 
For a batch size of 32:\n\n```python\nnext_batch = next(iter(dataloader))\nnext_batch[\"X\"].shape       # torch.Size([32, 251, 2])\nnext_batch[\"y\"].shape       # torch.Size([32, 3])\nnext_batch[\"length\"].shape  # torch.Size([32])\n```\n\nSee [Using DataLoaders](https://philipdarke.com/torchtime/tutorials/getting_started.html#using-dataloaders) for more information.\n\n## Advanced options\n\n* Missing data can be imputed by setting `impute` to *mean* (replace with training data channel means) or *forward* (replace with previous observation). Alternatively, a custom imputation function can be passed to the `impute` argument.\n* A time stamp (added by default), missing data mask and the time since previous observation can be appended with the boolean arguments `time`, `mask` and `delta` respectively.\n* Time series data can be standardised by setting the boolean `standardise` argument.\n* The location of cached data can be changed with the `path` argument, for example to share a single cache location across projects.\n* For reproducibility, an optional random `seed` can be specified.\n* Missing data can be simulated using the `missing` argument to drop data at random from UEA/UCR data sets.\n\nSee the [tutorials](https://philipdarke.com/torchtime/tutorials/) and [API](https://philipdarke.com/torchtime/api/) for more information.\n\n## Other resources\n\nIf you're looking for the TensorFlow equivalent for PhysioNet data sets, try [medical_ts_datasets](https://github.com/ExpectationMax/medical_ts_datasets).\n\n## Acknowledgements\n\n`torchtime` uses some of the data processing ideas in Kidger et al, 2020 [[1]](https://arxiv.org/abs/2005.08926) and Che et al, 2018 [[2]](https://doi.org/10.1038/s41598-018-24271-9).\n\nThis work is supported by the Engineering and Physical Sciences Research Council, Centre for Doctoral Training in Cloud Computing for Big Data, Newcastle University (grant number EP/L015358/1).\n\n## Citing `torchtime`\n\nIf you use this software, please 
cite the [paper](https://doi.org/10.48550/arXiv.2207.12503):\n\n```\n@software{darke_torchtime_2022,\n    author = {Darke, Philip and Missier, Paolo and Bacardit, Jaume},\n    title = \"Benchmark time series data sets for {PyTorch} - the torchtime package\",\n    month = jul,\n    year = 2022,\n    publisher = {arXiv},\n    doi = {10.48550/arXiv.2207.12503},\n    url = {https://doi.org/10.48550/arXiv.2207.12503},\n}\n```\n\nDOIs are also available for each version of the package [here](https://doi.org/10.5281/zenodo.6402406).\n\n## References\n\n1. Kidger, P, Morrill, J, Foster, J, *et al*. Neural Controlled Differential Equations for Irregular Time Series. *arXiv* 2005.08926 (2020). [[arXiv]](https://arxiv.org/abs/2005.08926)\n\n1. Che, Z, Purushotham, S, Cho, K, *et al*. Recurrent Neural Networks for Multivariate Time Series with Missing Values. *Sci Rep* 8, 6085 (2018). [[doi]](https://doi.org/10.1038/s41598-018-24271-9)\n\n1. Silva, I, Moody, G, Scott, DJ, *et al*. Predicting In-Hospital Mortality of ICU Patients: The PhysioNet/Computing in Cardiology Challenge 2012. *Comput Cardiol* 39:245-248 (2012). [[hdl]](http://hdl.handle.net/1721.1/93166)\n\n1. Reyna, M, Josef, C, Jeter, R, *et al*. Early Prediction of Sepsis From Clinical Data: The PhysioNet/Computing in Cardiology Challenge. *Critical Care Medicine* 48 2: 210-217 (2019). [[doi]](https://doi.org/10.1097/CCM.0000000000004145)\n\n1. Reyna, M, Josef, C, Jeter, R, *et al*. Early Prediction of Sepsis from Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019 (version 1.0.0). *PhysioNet* (2019). [[doi]](https://doi.org/10.13026/v64v-d857)\n\n1. Goldberger, A, Amaral, L, Glass, L, *et al*. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. *Circulation* 101 (23), pp. e215–e220 (2000). [[doi]](https://doi.org/10.1161/01.cir.101.23.e215)\n\n1. Löning, M, Bagnall, A, Ganesh, S, *et al*. 
sktime: A Unified Interface for Machine Learning with Time Series. *Workshop on Systems for ML at NeurIPS 2019* (2019). [[doi]](https://doi.org/10.5281/zenodo.3970852)\n\n1. Löning, M, Bagnall, A, Middlehurst, M, *et al*. alan-turing-institute/sktime: v0.10.1 (v0.10.1). *Zenodo* (2022). [[doi]](https://doi.org/10.5281/zenodo.6191159)\n\n## License\n\nReleased under the MIT license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilipdarke%2Ftorchtime","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphilipdarke%2Ftorchtime","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphilipdarke%2Ftorchtime/lists"}