{"id":17606537,"url":"https://github.com/mhahsler/stream","last_synced_at":"2026-03-07T06:02:30.934Z","repository":{"id":46215317,"uuid":"45126788","full_name":"mhahsler/stream","owner":"mhahsler","description":"A framework for data stream modeling and associated data mining tasks such as clustering and classification. - R Package","archived":false,"fork":false,"pushed_at":"2025-03-12T18:09:28.000Z","size":4283,"stargazers_count":40,"open_issues_count":1,"forks_count":8,"subscribers_count":4,"default_branch":"master","last_synced_at":"2026-01-26T11:51:17.584Z","etag":null,"topics":["data-stream-clustering","datastream","stream-mining"],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mhahsler.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2015-10-28T16:37:22.000Z","updated_at":"2025-12-06T15:41:20.000Z","dependencies_parsed_at":"2025-05-26T02:09:14.702Z","dependency_job_id":"3bdc7fa9-0de9-45ea-a387-165b019a61b9","html_url":"https://github.com/mhahsler/stream","commit_stats":{"total_commits":168,"total_committers":6,"mean_commits":28.0,"dds":0.06547619047619047,"last_synced_commit":"0c83157635b48adb9a3401de601f81680d844968"},"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"purl":"pkg:github/mhahsler/stream","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhahsler%2Fstream","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhahsler%2Fstream/tags","releases_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/mhahsler%2Fstream/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhahsler%2Fstream/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mhahsler","download_url":"https://codeload.github.com/mhahsler/stream/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mhahsler%2Fstream/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30208801,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-07T05:23:27.321Z","status":"ssl_error","status_checked_at":"2026-03-07T05:00:17.256Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-stream-clustering","datastream","stream-mining"],"created_at":"2024-10-22T15:44:46.707Z","updated_at":"2026-03-07T06:02:30.892Z","avatar_url":"https://github.com/mhahsler.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n```{r echo=FALSE, results = 'asis'}\npkg \u003c- 'stream'\n\nsource(\"https://raw.githubusercontent.com/mhahsler/pkg_helpers/main/pkg_helpers.R\")\npkg_title(pkg)\n```\n\n## Introduction\n\nThe package provides support for modeling and simulating data streams as well as an extensible framework for implementing, interfacing and\nexperimenting with algorithms for various data stream 
mining tasks. The main advantage of stream is that it seamlessly integrates with the large existing infrastructure provided by R. The package provides:\n\n* **Stream Sources:** streaming from files, databases, in-memory data, URLs, pipes, \n  socket connections and several data stream generators including\n  dynamic streams with concept drift.\n* **Stream Processing** with filters (convolution, scaling, exponential moving average, ...)\n* **Stream Aggregation:** sampling, windowing. \n* **Stream Clustering:** **BICO**, **BIRCH**, **D-Stream**, **DBSTREAM**, and **evoStream**.\n* **Stream Outlier Detection** based on **D-Stream**, **DBSTREAM**.\n* **Stream Classification** with **DecisionStumps**, **HoeffdingTree**, **NaiveBayes**\n    and **Ensembles** (streamMOA via RMOA).\n* **Stream Regression** with **Perceptron**, **FIMTDD**, **ORTO**, ... (streamMOA via RMOA).\n* **Stream Mining Evaluation** with prequential error estimation.\n\nAdditional packages in the stream family are:\n\n* [streamConnect](https://github.com/mhahsler/streamConnect): Connect stream mining \n    components using sockets and web services.\n* [streamMOA](https://github.com/mhahsler/streamMOA): Interface to clustering\n  algorithms implemented in the [MOA](https://moa.cms.waikato.ac.nz/) framework.\n  The package interfaces clustering algorithms like **DenStream**, **ClusTree**, \n  **CluStream** and **MCOD**.\n  The package also provides an interface to [RMOA](https://github.com/jwijffels/RMOA) for\n  MOA's stream classifiers and stream regression models.\n* [rEMM](https://github.com/mhahsler/rEMM): Provides implementations of \n  **threshold nearest neighbor clustering** (tNN) and\n  **Extensible Markov Model** (EMM)  for modelling temporal relationships between clusters.\n\n\n```{r echo=FALSE, results = 'asis'}\npkg_citation(pkg, 2)\npkg_install(pkg)\n```\n\n## Usage\n\n```{r echo=FALSE}\noptions(digits = 3)\n```\n\nLoad the package and a random data stream with 3 Gaussian clusters 
and 10\\% noise and scale the data to z-scores.\n\n```{r stream}\nlibrary(\"stream\")\nset.seed(2000)\n\nstream \u003c- DSD_Gaussians(k = 3, d = 2, noise= .1) %\u003e% DSF_Scale()\nget_points(stream, n = 5)\n\nplot(stream)\n```\n\nCluster a stream of 1000 points using D-Stream which estimates point density in grid cells.\n\n```{r Dstream}\ndsc \u003c- DSC_DStream(gridsize = .1)\nupdate(dsc, stream, 1000)\nplot(dsc, stream, grid = TRUE)\n```\n\n```{r Dstream_eval}\nevaluate_static(dsc, stream, n = 100)\n```\n\n\nOutlier detection using DBSTREAM which uses micro-clusters with a given radius.\n\n```{r DSOutlier_DBSTREAM}\ndso \u003c- DSOutlier_DBSTREAM(r = .1)\nupdate(dso, stream, 1000)\nplot(dso, stream)\n```\n\n```{r DSO_eval}\nevaluate_static(dso, stream, n = 100, measure = c(\"numPoints\", \"noiseActual\", \"noisePredicted\", \"noisePrecision\"))\n```\n\nPreparing complete stream process pipelines that can be run using a single `update()` call.\n```{r pipeline}\npipeline \u003c- DSD_Gaussians(k = 3, d = 2, noise= .1) %\u003e% \n  DSF_Scale() %\u003e% \n  DST_Runner(DSC_DStream(gridsize = .1))\npipeline\n\nupdate(pipeline, n = 500)\npipeline$dst\n```\n\n\n## Acknowledgments\n\nThe development of the stream package was supported in part by NSF IIS-0948893, NSF CMMI 1728612, and NIH R21HG005912.\n\n## References\n\n* Michael Hahsler, Matthew Bolaños, and John Forrest. 
[stream: An extensible framework for data stream clustering research with R.](https://dx.doi.org/10.18637/jss.v076.i14) _Journal of Statistical Software,_ 76(14), February 2017.\n* [stream package vignette](https://cran.r-project.org/package=stream/vignettes/stream.pdf) with complete examples.\n* [stream reference manual](https://cran.r-project.org/package=stream/stream.pdf)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmhahsler%2Fstream","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmhahsler%2Fstream","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmhahsler%2Fstream/lists"}