{"id":17206523,"url":"https://github.com/sylvaincom/astride","last_synced_at":"2025-04-13T22:09:48.238Z","repository":{"id":66213264,"uuid":"601377264","full_name":"sylvaincom/astride","owner":"sylvaincom","description":"[EUSIPCO 2024] Python implementation of \"ASTRIDE: Adaptive Symbolization for Time Series Databases\"","archived":false,"fork":false,"pushed_at":"2024-11-23T15:09:13.000Z","size":6035,"stargazers_count":17,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-13T22:09:42.623Z","etag":null,"topics":["distance","signal-processing","signal-reconstruction","time-series","time-series-classification"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sylvaincom.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-02-13T23:48:56.000Z","updated_at":"2025-03-18T11:08:39.000Z","dependencies_parsed_at":"2024-02-27T12:40:27.287Z","dependency_job_id":null,"html_url":"https://github.com/sylvaincom/astride","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sylvaincom%2Fastride","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sylvaincom%2Fastride/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sylvaincom%2Fastride/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sylvaincom%2Fastride/manifests","o
wner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sylvaincom","download_url":"https://codeload.github.com/sylvaincom/astride/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248788933,"owners_count":21161727,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distance","signal-processing","signal-reconstruction","time-series","time-series-classification"],"created_at":"2024-10-15T02:28:53.775Z","updated_at":"2025-04-13T22:09:48.212Z","avatar_url":"https://github.com/sylvaincom.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ASTRIDE: Adaptive Symbolization for Time Series Databases\n\nThis repository contains the code to reproduce all experiments in our preprint paper\n\u003eASTRIDE: Adaptive Symbolization for Time Series Databases\navailable on [arXiv](https://arxiv.org/abs/2302.04097).\n\nAll the code is written in Python (scripts and notebooks).\n\nNote that the ASTRIDE method has been accepted at [EUSIPCO 2024](https://ieeexplore.ieee.org/document/10715214):\n\u003e S. W. Combettes, C. Truong, and L. Oudre. \"Symbolic representation for time series.\" In _Proceedings of the European Signal Processing Conference (EUSIPCO)_, Lyon, France, 2024.\n\n\u003cdetails\u003e\u003csummary\u003e\u003ci\u003eToggle for the paper's abstract!\u003c/i\u003e\u003c/summary\u003eWe introduce ASTRIDE (Adaptive Symbolization for Time seRIes DatabasEs), a novel symbolic representation of time series, along with its accelerated variant FASTRIDE (Fast ASTRIDE). 
Unlike most symbolization procedures, ASTRIDE is adaptive during both the segmentation step by performing change-point detection and the quantization step by using quantiles. Instead of proceeding signal by signal, ASTRIDE builds a dictionary of symbols that is common to all signals in a data set. We also introduce D-GED (Dynamic General Edit Distance), a novel similarity measure on symbolic representations based on the general edit distance. We demonstrate the performance of the ASTRIDE and FASTRIDE representations compared to SAX (Symbolic Aggregate approXimation), 1d-SAX, SFA (Symbolic Fourier Approximation), and ABBA (Adaptive Brownian Bridge-based Aggregation) on reconstruction and, when applicable, on classification tasks. These algorithms are evaluated on 86 univariate equal-size data sets from the UCR Time Series Classification Archive. An open source GitHub repository called astride is made available to reproduce all the experiments in Python.\u003c/details\u003e\u003c/br\u003e\n\nPlease let us know of any issue you might encounter when using this code, or if you just need some clarification or help, either by opening an issue on this repository or by sending an email to `sylvain.combettes8 [at] gmail.com`.\n\n## How is a symbolic representation implemented?\n\nFor ASTRIDE and FASTRIDE, a symbolic representation (with an associated distance) is a scikit-learn pipeline based on the following classes in the `src` folder:\n1. `Segmentation` (in `segmentation.py`)\n1. `SegmentFeature` (in `segment_features.py`)\n1. `Symbolization` (in `symbolization.py`)\n1. 
`SymbolicSignalDistance` (in `symbolic_signal_distance.py`)\n\nAn example usage is given in `07_run_reconstruction.py`.\n\nDownload the `ABBA/ABBA.py` file from https://github.com/nla-group/ABBA, as it is used in the signal reconstruction benchmark.\n\n## Structure of the code\n\n`date_exp` is a string (for example `\"2023_02_08\"`) used to version the experiments.\n\nThe code outputs the following kinds of `csv` files:\n1. `data` folder: the UCR archive metadata filtered on the list of data sets under consideration\n1. `results/{date_exp}/acc` folder: the results from the 1-NN classification task\n1. `results/{date_exp}/acc_clean` folder: the cleaned results from the 1-NN classification task\n1. `results/{date_exp}/reconstruction` folder: the results from the signal reconstruction task\n\nThe code outputs all the figures in the paper in the `results/{date_exp}/img` folder.\n\nFor the Python notebooks, the configuration parameters (at the top) do the following:\n- `DATE_EXP` sets the date of launch of the experiment (for versioning)\n- if `IS_EXPORT_DF=False`, then no CSV files (pandas DataFrame) are exported\n- if `IS_SAVE_FIG=False`, then no figures are exported\n- if `IS_COMPUTE=True`, then long computations are made (no more than a few minutes on a standard laptop)\n\n## How to use this repository to reproduce the ASTRIDE paper\n\n_Note that more details are given at the top of each notebook and that the code is commented._\n\n1. 
Explore the UCR archive.\u003cbr\u003e\n    Run the `01_explore_ucr_metadata.ipynb` notebook.\u003cbr\u003e\n    It generates:\n    - the `data/DataSummary_prep_equalsize.csv` file which contains the 117 univariate and equal-size data sets from the UCR archive.\n    - the `data/DataSummary_prep_equalsize_min100samples.csv` file which contains the 94 univariate and equal-size data sets with at least 100 samples from the UCR archive.\n    - Table 4.\n    - Data for Section 2.3.3 and Section 3.4 about the total memory usage of a symbolization method.\n1. Look into the normality assumption of the means per segment.\u003cbr\u003e\n    Run the `02_explore_gaussian_assumption.ipynb` notebook.\u003cbr\u003e\n    - It performs the normality test for Section 2.3.1.\n    - It generates Figure 2.\n1. Interpret the symbolization methods and how they transform a signal.\u003cbr\u003e\n    Run the `03_interpret_symbolization.ipynb` notebook.\n    - It generates Figures 1, 3 and 4.\n    - It generates Table 3.\n1. For the classification benchmark of SAX, 1d-SAX, ASTRIDE, and FASTRIDE of Section 4.1:\n    1. Compute the test accuracies of each method.\u003cbr\u003e\n        Run the `04_run_classification.py` script for the four symbolization methods:\n        ```\n        $ python3 04_run_classification.py --method_name \"saxtslearn\" --date_exp \"2023_02_08\"\n        $ python3 04_run_classification.py --method_name \"1dsax\" --date_exp \"2023_02_08\"\n        $ python3 04_run_classification.py --method_name \"fastride\" --date_exp \"2023_02_08\"\n        $ python3 04_run_classification.py --method_name \"astride\" --date_exp \"2023_02_08\"\n        ```\n        Note that the `date_exp` variable is for versioning the experiments.\u003cbr\u003e\n        It generates the `results/{date_exp}/acc/df_acc_{method}_{dataset}.csv` files, which are the test accuracies per method per data set and for all combinations of hyper-parameters.\n    1. 
Clean the classification results for each method.\u003cbr\u003e\n        Run the `05_clean_classification_results.ipynb` notebook.\u003cbr\u003e\n        It generates the `results/{date_exp}/acc_clean/df_acc_{method}_alldatasets_clean.csv` files, which are the cleaned test accuracies per method for all data sets and combinations of hyper-parameters.\n    1. Explore the results for each method: plot the accuracy as a function of the word length.\u003cbr\u003e\n        Run the `06_explore_classification_results.ipynb` notebook.\u003cbr\u003e\n        It generates Figure 5.\n1. For the signal reconstruction benchmark of SAX, 1d-SAX, SFA, ABBA, ASTRIDE and FASTRIDE of Section 4.2:\n    1. Compute the reconstructed signals for each method and all signals of all data sets as well as the reconstruction error.\u003cbr\u003e\n        Run the `07_run_reconstruction.py` script for several target memory usage ratios:\n        ```\n        python3 07_run_reconstruction.py --denom 3 --date_exp \"2023_02_08\"\n        python3 07_run_reconstruction.py --denom 4 --date_exp \"2023_02_08\"\n        python3 07_run_reconstruction.py --denom 5 --date_exp \"2023_02_08\"\n        python3 07_run_reconstruction.py --denom 6 --date_exp \"2023_02_08\"\n        python3 07_run_reconstruction.py --denom 10 --date_exp \"2023_02_08\"\n        python3 07_run_reconstruction.py --denom 15 --date_exp \"2023_02_08\"\n        python3 07_run_reconstruction.py --denom 20 --date_exp \"2023_02_08\"\n        ```\n        Note that the `denom` variable is the inverse of the target memory usage ratio.\u003cbr\u003e\n        It generates:\n        - the `results/reconstruction/{denom}/reconstruction_errors_{dataset}.csv` files, which are the reconstruction errors per data set.\n        - the `results/reconstruction/{denom}/reconstructed_signals/reconstructed_{dataset}_{method}.csv` files, which are the reconstructed signals for each symbolization method.\n    1. 
Compute the reconstruction errors.\u003cbr\u003e\n        Run the `08_explore_reconstruction_results.ipynb` notebook.\u003cbr\u003e\n        It generates Figures 6 and 7.\n    1. Interpret the signal reconstruction for each symbolization method.\u003cbr\u003e\n        Run the `09_interpret_reconstruction.ipynb` notebook.\u003cbr\u003e\n        It generates Figure 8.\n1. Compare the symbolization and classification durations of SAX, 1d-SAX, ASTRIDE, and FASTRIDE.\u003cbr\u003e\n    Run the `10_processing_time.ipynb` notebook.\u003cbr\u003e\n    It generates Table 6.\n\n## Requirements\n\n- loadmydata==0.0.9\n- matplotlib==3.4.1\n- numpy==1.20.2\n- pandas==1.2.4\n- plotly==5.5.0\n- ruptures==1.1.6.dev3+g1def4ff\n- scikit-learn==1.0.1\n- scipy==1.6.2\n- seaborn==0.11.1\n- statsmodels==0.13.1\n- tslearn==0.5.2\n- weighted-levenshtein==0.2.1\n\n## Citing\n\nIf you use this code or publication, please cite (arXiv: https://arxiv.org/abs/2302.04097):\n```bibtex\n@article{2023_combettes_astride,\n    doi = {10.48550/ARXIV.2302.04097},\n    url = {https://arxiv.org/abs/2302.04097},\n    author = {Combettes, Sylvain W. and Truong, Charles and Oudre, Laurent},\n    title = {ASTRIDE: Adaptive Symbolization for Time Series Databases},\n    journal = {arXiv preprint arXiv:2302.04097},\n    year = {2023},\n}\n```\n\n```bibtex\n@INPROCEEDINGS{10715214,\n  author={Combettes, Sylvain W. and Truong, Charles and Oudre, Laurent},\n  booktitle={2024 32nd European Signal Processing Conference (EUSIPCO)}, \n  title={Symbolic Representation for Time Series}, \n  year={2024},\n  pages={1962-1966},\n  doi={10.23919/EUSIPCO63174.2024.10715214}}\n```\n\n## Licence\n\nThis project is licensed under the MIT License, see the `LICENSE` file for more information.\n\n## Contributors\n\n* [Sylvain W. 
Combettes](https://sylvaincom.github.io/) (Centre Borelli, ENS Paris-Saclay)\n* [Charles Truong](https://charles.doffy.net/) (Centre Borelli, ENS Paris-Saclay)\n* [Laurent Oudre](http://www.laurentoudre.fr/) (Centre Borelli, ENS Paris-Saclay)\n\n## Acknowledgments\n\nSylvain W. Combettes is supported by the IDAML chair (ENS Paris-Saclay) and UDOPIA (ANR-20-THIA-0013-01).\nCharles Truong is funded by the PhLAMES chair (ENS Paris-Saclay).\nPart of the computations were executed on an Atos Edge computer, funded by the IDAML chair (ENS Paris-Saclay).\n\n\u003cp align=\"center\"\u003e\n\u003cimg width=\"700\" src=\"https://github.com/boniolp/dsymb-playground/blob/main/figures/cebo_logos.png\"/\u003e\n\u003c/p\u003e","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsylvaincom%2Fastride","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsylvaincom%2Fastride","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsylvaincom%2Fastride/lists"}