{"id":17606521,"url":"https://github.com/mhahsler/remm","last_synced_at":"2025-04-30T12:18:56.126Z","repository":{"id":40594185,"uuid":"421508771","full_name":"mhahsler/rEMM","owner":"mhahsler","description":null,"archived":false,"fork":false,"pushed_at":"2024-08-27T22:24:16.000Z","size":431,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-09T22:51:07.708Z","etag":null,"topics":["clustering","data-stream","sequence-analysis"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mhahsler.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-26T16:49:29.000Z","updated_at":"2025-02-04T18:03:16.000Z","dependencies_parsed_at":"2024-08-27T23:41:47.447Z","dependency_job_id":"b48b454c-f951-46b5-a00d-522b2e485e5a","html_url":"https://github.com/mhahsler/rEMM","commit_stats":{"total_commits":22,"total_committers":2,"mean_commits":11.0,"dds":"0.045454545454545414","last_synced_commit":"df9a639080a2b91bfdbe5a7a531d14c55f2c4f41"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhahsler%2FrEMM","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhahsler%2FrEMM/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhahsler%2FrEMM/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhahsler%2FrEMM/manifests","owner_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/owners/mhahsler","download_url":"https://codeload.github.com/mhahsler/rEMM/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251699861,"owners_count":21629826,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering","data-stream","sequence-analysis"],"created_at":"2024-10-22T15:44:44.593Z","updated_at":"2025-04-30T12:18:56.046Z","avatar_url":"https://github.com/mhahsler.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n```{r echo=FALSE, results = 'asis'}\npkg \u003c- 'rEMM'\n\nsource(\"https://raw.githubusercontent.com/mhahsler/pkg_helpers/main/pkg_helpers.R\")\npkg_title(pkg)\n```\n\n\nImplements TRACDS (Temporal Relationships \n    between Clusters for Data Streams), a generalization of the \n    Extensible Markov Model (EMM), \n    to model transition probabilities in sequence data. TRACDS adds a temporal or order model\n    to data stream clustering by superimposing a dynamically adapting\n    Markov Chain. Also provides an implementation of EMM (TRACDS on top of tNN \n    data stream clustering). \n\nInterface classes DSC_tNN and DSC_EMM for the [stream package](https://github.com/mhahsler/stream) are provided.  \n\n\n```{r echo=FALSE, results = 'asis'}\npkg_citation(pkg, 2L)\npkg_install(pkg)\n```\n\n## Usage\n\nWe use an artificial dataset with a mixture of four cluster components. Points are generated using a fixed sequence \n\u003c1,2,1,3,4\u003e through the four clusters. 
The lines below indicate the sequence.\n\n```{r example_data}\nlibrary(rEMM)\n\ndata(\"EMMsim\")\n\nplot(EMMsim_train, pch = NA)\nlines(EMMsim_train, col = \"gray\")\npoints(EMMsim_train, pch = EMMsim_sequence_train)\n```\n\nEMM recovers the components and the sequence information. We use EMM and then recluster the found structure, assuming\nthat we know that there are 4 components. The graph below represents a Markov model of the found sequence.\n\n```{r example_model}\nemm \u003c- EMM(threshold = 0.1, measure = \"euclidean\")\nbuild(emm, EMMsim_train)\nemmc \u003c- recluster_hclust(emm, k = 4, method = \"average\")\nplot(emmc)\n```\n\nWe can now score new sequences (we use a test sequence created in the same way as the training data) by calculating the product of the transition probabilities in the model. A high score indicates that the new sequence is consistent with the model.\n\n```{r}\nscore(emmc, EMMsim_test)\n```\n\n# References\n* Michael Hahsler and Margaret H. Dunham. \n  [rEMM: Extensible Markov model for data stream clustering in R.](http://dx.doi.org/10.18637/jss.v035.i05) \n  _Journal of Statistical Software,_ 35(5):1-31, 2010.\n* Michael Hahsler and Margaret H. Dunham. \n  [Temporal structure learning for clustering massive data \n    streams in real-time](https://doi.org/10.1137/1.9781611972818.57). \n  In _SIAM Conference on Data Mining (SDM11),_ pages 664--675. SIAM, April 2011.\n\n# Acknowledgements\n    \nDevelopment of this package was supported in part by NSF IIS-0948893 and R21HG005912 from\nthe National Human Genome Research Institute.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmhahsler%2Fremm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmhahsler%2Fremm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmhahsler%2Fremm/lists"}