{"id":16858893,"url":"https://github.com/titu1994/dtw-numba","last_synced_at":"2025-04-11T07:50:18.815Z","repository":{"id":150946707,"uuid":"161567567","full_name":"titu1994/dtw-numba","owner":"titu1994","description":"Implementation of Dynamic Time Warping algorithm with speed improvements based on Numba.","archived":false,"fork":false,"pushed_at":"2019-01-14T01:45:19.000Z","size":738,"stargazers_count":16,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-25T05:34:08.965Z","etag":null,"topics":["dtw","numba","timeseries","warping"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/titu1994.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-12-13T01:35:48.000Z","updated_at":"2024-09-16T23:01:36.000Z","dependencies_parsed_at":null,"dependency_job_id":"131e5633-f991-4c5a-9c43-eabad01c1267","html_url":"https://github.com/titu1994/dtw-numba","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/titu1994%2Fdtw-numba","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/titu1994%2Fdtw-numba/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/titu1994%2Fdtw-numba/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/titu1994%2Fdtw-numba/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/titu1994
","download_url":"https://codeload.github.com/titu1994/dtw-numba/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248359817,"owners_count":21090594,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dtw","numba","timeseries","warping"],"created_at":"2024-10-13T14:15:30.155Z","updated_at":"2025-04-11T07:50:18.779Z","avatar_url":"https://github.com/titu1994.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Dynamic Time Warping in Python using Numba\n\nImplementation of Dynamic Time Warping algorithm with speed improvements based on Numba.\n\nSupports for K nearest neighbours classifier using Dynamic Time Warping, based on the [work presented by Mark Regan](https://github.com/markdregan/K-Nearest-Neighbors-with-Dynamic-Time-Warping). The classes called `KnnDTW` are obtained from there, as a simplified interface akin to Scikit-Learn.\n\nThanks to [Sam Harford](https://github.com/sharford5) for providing the core of the DTW computation.\n\nThanks to [Jonas Klemming](https://github.com/klon) for the C implementation of `ucrdtw`.\n\n\n## Dynamic Time Warping Variants\n-----\nThe three variants available are in `dtw.py`, `odtw.py` and `ucrdtw.py`.\n\n- `dtw.py`: Single threaded variant, support for visualizing the progress bar.\n- `odtw.py`: Multi threaded variant, no support for visualization. In practice, much more effiecient.\n- `ucrdtw.py`: **Experimental (Do not use)**. Multi threaded variant, no support for visualization. 
It is based on the optimized C implementation available at https://github.com/klon/ucrdtw.\n\n`odtw.py` is further optimized to run on entire datasets in parallel, and is therefore preferred for any task involving classification.\n\n`ucrdtw.py` is a highly efficient alternative to `odtw.py` that provides the ability to select a warping window and online z-normalization of the dataset. Currently, it is not as performant as the optimized C version, and the original codebase should be used instead.\n\n - **NOTE**: Due to an inefficient implementation, `ucrdtw` is much slower than `odtw` for certain datasets. However, for a warping window smaller than 100%, it often surpasses `odtw`. To keep evaluation scores comparable, all reported results use an infinite warping window (the entire length of the query series).\n\n## Speed optimizations\n-----\nWhile [Numba](http://numba.pydata.org/) can compile pure Python code, it benefits from C-level micro-optimizations. 
Considering the runtime complexity of DTW, the `dtw_distance` method in `odtw.py` is a more efficient Numba implementation of the DTW computation, which sacrifices idiomatic Python syntax for C-level optimizations.\n\nSome optimizations shared by both include : \n\n- Empty allocation of `E` : avoids filling with 0s.\n- Inlined `max` operation to avoid depending on the Python `max` function.\n\nOptimizations available to `odtw.py` : \n\n- Remove calls to `np.square()` and compute the difference and square manually to avoid additional function calls.\n- Parallelize computation of the distance matrix over two datasets.\n\nOptimizations available to `ucrdtw.py` :\n\n- Computation of lower bounds to exit DTW computation early\n- Compiled data structures and operations on Deque for fast computation of lower bounds\n- Compiled reverse sorting via Quicksort, falling back to reverse insertion sort for small subsets.\n- Compiled ops for minimum, maximum, absolute value and square distance\n- Cached allocations compared to the C version\n- Choice for online z-normalization\n- Caching of sorted query-index pair for faster evaluation of Index results in parallel\n\n# Evaluations against UCR Archive\nTo ensure that the accuracy of the DTW implementations exactly matches the DTW scores available in the [UCR Archive](https://www.cs.ucr.edu/~eamonn/time_series_data_2018/), the `Adiac` dataset is provided; it is loaded, z-normalized, then used for evaluation. All three DTW implementations obtain the same scores at a 100% warping window.\n\n```\nTest Accuracy : 0.6035805626598465\nTest Error : 0.3964194373401535\n```\n\nThese scores match those in the above archive for DTW (w=100).\n\n# Speed Comparison\nComparisons were run on an Alienware R2 (2015) laptop with an Intel i7-6700HQ CPU @ 2.60 GHz (8 Logical CPUs, 4 Physical CPUs) and 16 GB of RAM. 
Tests were performed on the Adiac dataset, which contains 390 train samples, 391 test samples and each sample is a univariate time series of length 176 timesteps.\n\n## Sample level test\nHere, we compare the time taken to compute the DTW distance between the first train and test samples of the Adiac dataset. \n\nOutput : \n```\nNon Numba Optimized time :  0.12019050598144532\nSample optimized time :  8.00013542175293e-05\nDataset optimized time :  0.0003000330924987793\nUCR optimized time :  0.0005000114440917968\n\nNon Optimized dist :  1.1218082709896633\nSample Optimized dist :  1.1218082709896633\nDataset Optimized dist :  1.1218082709896633\nUCR Optimized dist :  1.1218082709896633\n\nMSE (non optimized - sample optimized):  0.0\nMSE (non optimized - dataset optimized):  0.0\nMSE (non optimized - ucr optimized):  0.0\n```\n\nKey observations are : \n\n- Non-Numba optimized code is several orders of magnitude slower than the sample or dataset optimized variants.\n- Dataset optimized method is slightly slower than the sample variant. This is because the cost incurred with initializing and running subprocesses for a single sample is greater than the parallelization benefit of the underlying optimizations.\n- MSE between the non optimized variant and sample or dataset optimized variant is 0. Therefore this speed does not come at the cost of accuracy.\n\n## Dataset level test\nHere, we compute the time taken to compute the DTW distance matrix between the entire train set (390, 176) against the entire test set (391, 176). 
This yields a distance matrix of shape [390, 391].\n\nOutput : \n```\nNon Numba Optimized time :  8386.9442625578\nUCR optimized time :  8.214709758758545\nSample optimized time :  13.303221225738525\nDataset optimized time :  3.0960452556610107\n\nNon Optimized dist mean :  0.9556927603445304\nSample Optimized mean dist :  0.9556927603445304\nDataset Optimized mean dist :  0.9556927603445304\nUCR Optimized mean dist :  0.9556927652664071\n\nMSE (non optimized - sample optimized):  0.0\nMSE (non optimized - dataset optimized):  0.0\nMSE (dataset optimized - ucr optimized):  1.2700471157257482e-15\n\n```\n\n## Summary\n\n| Time in seconds | Non Optimized | Sample Optimized | UCR Optimized | Dataset Optimized |\n|-----------------|:-------------:|:----------------:|:-------------:|:-----------------:|\n| Single Sample   |    0.12019    |     8.01e-05     |    5.00e-4    |      3.00e-4      |\n| Full Dataset    |   \u003e 30 mins   |      13.3015     |     8.2147    |       3.096       |\n\nKey observations are : \n\n- Non-Numba optimized code is several orders of magnitude slower than both of the optimized variants, so much so that it is not feasible.\n- Dataset optimized method is several times faster than the sample optimized variant. 
Scaling is sub-linear, since an optimally scaled version would take 1/8th the time of the sample variant; however, it is still beneficial for longer time series (or larger datasets).\n- MSE between the non optimized variant and the sample or dataset optimized variants is 0 once again.\n\n# Requirements\n\n- Numba (use `pip install numba` or `conda install numba`)\n- NumPy\n- SciPy\n- Scikit-learn\n- Pandas (to load UCR datasets)\n- joblib (to extract UCR datasets)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftitu1994%2Fdtw-numba","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftitu1994%2Fdtw-numba","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftitu1994%2Fdtw-numba/lists"}