{"id":13472569,"url":"https://github.com/boniolp/kGraph","last_synced_at":"2025-03-26T17:30:54.124Z","repository":{"id":225170833,"uuid":"733416494","full_name":"boniolp/kGraph","owner":"boniolp","description":"Graph Embedding for Interpretable Time Series Clustering","archived":false,"fork":false,"pushed_at":"2025-03-10T10:08:28.000Z","size":51834,"stargazers_count":23,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-10T11:24:32.791Z","etag":null,"topics":["clustering","graph","graph-embedding","graph-representation","interpretability","networkx","python","python3","time-series","time-series-analysis","time-series-clustering"],"latest_commit_sha":null,"homepage":"https://graphit.streamlit.app/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/boniolp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-19T09:29:00.000Z","updated_at":"2025-03-10T10:08:31.000Z","dependencies_parsed_at":"2025-02-20T09:41:53.472Z","dependency_job_id":null,"html_url":"https://github.com/boniolp/kGraph","commit_stats":null,"previous_names":["boniolp/kgraph"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boniolp%2FkGraph","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boniolp%2FkGraph/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boniolp%2FkGraph/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/boniolp%2FkGraph/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/boniolp","download_url":"https://codeload.github.com/boniolp/kGraph/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245702147,"owners_count":20658553,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering","graph","graph-embedding","graph-representation","interpretability","networkx","python","python3","time-series","time-series-analysis","time-series-clustering"],"created_at":"2024-07-31T16:00:55.824Z","updated_at":"2025-03-26T17:30:54.117Z","avatar_url":"https://github.com/boniolp.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n\u003cimg width=\"160\" src=\"./ressources/kGraph_logo.png\"/\u003e\n\u003c/p\u003e\n\n\n\u003ch1 align=\"center\"\u003e$k$-Graph\u003c/h1\u003e\n\u003ch2 align=\"center\"\u003e A Graph Embedding for Interpretable Time Series Clustering\u003c/h2\u003e\n\n\u003cdiv align=\"center\"\u003e\n\u003cp\u003e\n\u003cimg alt=\"PyPI - Downloads\" src=\"https://pepy.tech/badge/kgraph-ts\"\u003e \u003cimg alt=\"PyPI\" src=\"https://img.shields.io/pypi/v/kgraph-ts\"\u003e \u003cimg alt=\"GitHub\" src=\"https://img.shields.io/github/license/boniolp/kgraph\"\u003e \u003cimg alt=\"GitHub issues\" src=\"https://img.shields.io/github/issues/boniolp/kgraph\"\u003e \u003cimg alt=\"PyPI - Python Version\" src=\"https://img.shields.io/pypi/pyversions/kgraph-ts\"\u003e\n\u003c/p\u003e\n\u003c/div\u003e\n\n## $k$-Graph in short\n\n$k$-Graph is an explainable and interpretable Graph-based 
time series clustering method. $k$-Graph is divided into three steps: (i) Graph embedding, (ii) Graph clustering, and (iii) Consensus clustering. In practice, it first projects the time series into a graph and repeats this operation for multiple pattern lengths. For each pattern length, we use the corresponding graph to cluster the time series (based on the frequency of nodes and edges in each time series). We then compute a consensus across all pattern lengths and use it as the final clustering labels. Because all time series are represented in a single graph, $k$-Graph can also be used for variable-length time series. Moreover, we provide a way to select the most interpretable graph for the resulting clustering partition, and allow the user to visualize the subsequences contained in the most representative and exclusive nodes.\nAn interactive tool to play with $k$-Graph can be found [here](https://github.com/boniolp/graphint).\n\n\u003cp align=\"center\"\u003e\n\u003cimg width=\"860\" src=\"./ressources/pipeline.png\"/\u003e\n\u003c/p\u003e\n\n## References\n\n$k$-Graph has been accepted for publication in IEEE Transactions on Knowledge and Data Engineering (TKDE). You can find the preprint version [here](https://arxiv.org/abs/2502.13049). \nIf you use $k$-Graph in your project or research, please cite the following paper:\n\n\u003e P. Boniol, D. Tiano, A. Bonifati and T. 
Palpanas, \"$k$-Graph: A Graph Embedding for Interpretable Time Series Clustering,\" in IEEE Transactions on Knowledge and Data Engineering, doi: 10.1109/TKDE.2025.3543946.\n\n```bibtex\n@ARTICLE{10896823,\n  author={Boniol, Paul and Tiano, Donato and Bonifati, Angela and Palpanas, Themis},\n  journal={IEEE Transactions on Knowledge and Data Engineering}, \n  title={$k$-Graph: A Graph Embedding for Interpretable Time Series Clustering}, \n  year={2025},\n  volume={},\n  number={},\n  pages={1-14},\n  keywords={Time series analysis;Feature extraction;Clustering algorithms;Accuracy;Heuristic algorithms;Clustering methods;Training;Shape;Partitioning algorithms;Directed graphs;Time Series;Clustering;Interpretability},\n  doi={10.1109/TKDE.2025.3543946}}\n```\n\n## Contributors\n\n- [Paul Boniol](https://boniolp.github.io/), Inria, ENS, PSL University, CNRS\n- [Donato Tiano](https://liris.cnrs.fr/en/member-page/donato-tiano), Università degli Studi di Modena e Reggio Emilia\n- [Angela Bonifati](https://perso.liris.cnrs.fr/angela.bonifati/), Lyon 1 University, IUF, Liris CNRS\n- [Themis Palpanas](https://helios2.mi.parisdescartes.fr/~themisp/), Université Paris Cité, IUF\n\n\n## Getting started\n\nThe easiest way to install $k$-Graph is to run the following command:\n\n```bash\npip install kgraph-ts\n```\n\nGraphviz and PyGraphviz can be used to obtain better visualisations of the graphs computed by $k$-Graph. These two packages are optional and are not required to run $k$-Graph. 
If they are not installed, a random layout is used to plot the graphs.\nTo benefit from a better visualisation of the graphs, please install Graphviz and PyGraphviz as follows:\n\n#### For Mac:\n\n```bash\nbrew install graphviz\n```\n\n#### For Linux (Ubuntu):\n\n```bash\nsudo apt install graphviz graphviz-dev\n```\n\n#### For Windows:\n\nStable Windows install packages are listed [here](https://graphviz.org/download/).\n\nOnce Graphviz is installed, you can install PyGraphviz as follows:\n\n```bash\npip install pygraphviz\n```\n\n\n\n### Manual installation\n\nYou can also install $k$-Graph manually by following the instructions below.\nAll required Python packages are listed in the [requirements.txt](https://github.com/boniolp/kGraph/blob/main/requirements.txt) file. You can create a conda environment and install them with pip as follows:\n\n```bash\nconda env create --file environment.yml\nconda activate kgraph\npip install -r requirements.txt\n```\n\nYou can then install $k$-Graph locally with the following command:\n\n```bash\npip install .\n```\n\n\n## Usage\n\nTo experiment with $k$-Graph, you can use the datasets of the [UCR archive](https://www.cs.ucr.edu/%7Eeamonn/time_series_data_2018/). 
The code snippet below demonstrates how to use $k$-Graph on the Trace dataset.\n\n```python\nimport sys\nimport pandas as pd\nimport numpy as np\nimport networkx as nx\nimport matplotlib.pyplot as plt\nfrom sklearn.metrics import adjusted_rand_score\n\n# Helper for loading UCR datasets, imported from a local utils/ folder\n# (the path is relative to the current working directory)\nsys.path.insert(1, './utils/')\nfrom utils import fetch_ucr_dataset\n\nfrom kgraph import kGraph\n\n\n# Load the Trace dataset and merge its train and test splits\npath = \"/Path/to/UCRArchive_2018/\"\ndata = fetch_ucr_dataset('Trace',path)\nX = np.concatenate([data['data_train'],data['data_test']],axis=0)\ny = np.concatenate([data['target_train'],data['target_test']],axis=0)\n\n\n# Executing kGraph (one cluster per class label)\nclf = kGraph(n_clusters=len(set(y)),n_lengths=10,n_jobs=4)\nclf.fit(X)\n\n# Compare the predicted labels with the ground truth\nprint(\"ARI score: \",adjusted_rand_score(clf.labels_,y))\n```\n\n```\nRunning kGraph for the following length: [36, 72, 10, 45, 81, 18, 54, 90, 27, 63] \nGraphs computation done! (36.71151804924011 s) \nConsensus done! (0.03878021240234375 s) \nEnsemble clustering done! (0.0060100555419921875 s) \nARI score:  0.986598879940902\n```\n\nFor datasets containing variable-length time series, $k$-Graph has to be initialized with `variable_length=True` as follows:\n\n```python\nclf = kGraph(n_clusters=len(set(y)),variable_length=True,n_lengths=10,n_jobs=4)\n```\n\n### Visualization tools\n\nWe provide visualization methods to plot the graph and the identified clusters (i.e., graphoids). 
After running $k$-Graph, you can run the following code to plot the graph partitioned into the different clusters (nodes in grey are not associated with any specific cluster).\n\n```python\nclf.show_graphoids(group=True,save_fig=True,namefile='Trace_kgraph')\n```\n\u003cp align=\"center\"\u003e\n\u003cimg width=\"800\" src=\"./ressources/Trace_kgraph.jpg\"/\u003e\n\u003c/p\u003e\n\nInstead of visualizing the whole graph, you can directly retrieve the most representative nodes of each cluster and plot their subsequences with the following code:\n\n```python\nnb_patterns = 1\n\n# Get the most representative nodes of each cluster\nnodes = clf.interprete(nb_patterns=nb_patterns)\n\nplt.figure(figsize=(10,4*nb_patterns))\ncount = 0\nfor j in range(nb_patterns):\n\tfor i,node in enumerate(nodes.keys()):\n\n\t\t# Get the time series for the corresponding node\n\t\tmean,sup,inf = clf.get_node_ts(X=X,node=nodes[node][j][0])\n\t\t\n\t\tcount += 1\n\t\tplt.subplot(nb_patterns,len(nodes.keys()),count)\n\t\tplt.fill_between(x=list(range(int(clf.optimal_length))),y1=inf,y2=sup,alpha=0.2) \n\t\tplt.plot(mean,color='black')\n\t\tplt.plot(inf,color='black',alpha=0.6,linestyle='--')\n\t\tplt.plot(sup,color='black',alpha=0.6,linestyle='--')\n\t\tplt.title('node {} for cluster {}: \\n (representativity: {:.3f} \\n exclusivity : {:.3f})'.format(nodes[node][j][0],node,nodes[node][j][3],nodes[node][j][2]))\nplt.tight_layout()\n\nplt.savefig('Trace_cluster_interpretation.jpg')\nplt.close()\n```\n\u003cp align=\"center\"\u003e\n\u003cimg width=\"800\" src=\"./ressources/Trace_cluster_interpretation.jpg\"/\u003e\n\u003c/p\u003e\n\nYou can find a script containing all the code above 
[here](https://github.com/boniolp/kGraph/blob/main/examples/scripts/Trace_example.py).\n\n\n\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fboniolp%2FkGraph","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fboniolp%2FkGraph","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fboniolp%2FkGraph/lists"}