{"id":13400613,"url":"https://github.com/exaexa/scattermore","last_synced_at":"2025-04-09T09:06:22.741Z","repository":{"id":37630491,"uuid":"200534033","full_name":"exaexa/scattermore","owner":"exaexa","description":"very fast scatterplots for R","archived":false,"fork":false,"pushed_at":"2024-01-12T19:58:59.000Z","size":65236,"stargazers_count":233,"open_issues_count":5,"forks_count":6,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-07-31T19:25:33.560Z","etag":null,"topics":["performance","plot","r","scatterplot","visualization"],"latest_commit_sha":null,"homepage":"https://exaexa.github.io/scattermore/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/exaexa.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-08-04T19:37:14.000Z","updated_at":"2024-07-31T10:52:41.000Z","dependencies_parsed_at":"2024-01-17T17:57:28.274Z","dependency_job_id":null,"html_url":"https://github.com/exaexa/scattermore","commit_stats":null,"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/exaexa%2Fscattermore","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/exaexa%2Fscattermore/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/exaexa%2Fscattermore/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/exaexa%2Fscattermore/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/exaexa","download_url":"https://codeload.github.com/exaexa/scattermore/tar
.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248008629,"owners_count":21032556,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["performance","plot","r","scatterplot","visualization"],"created_at":"2024-07-30T19:00:53.915Z","updated_at":"2025-04-09T09:06:22.722Z","avatar_url":"https://github.com/exaexa.png","language":"R","funding_links":[],"categories":["Plot layers","R"],"sub_categories":[],"readme":"\n# scattermore\n\n[![CRAN status](https://www.r-pkg.org/badges/version/scattermore)](https://cran.r-project.org/package=scattermore)\n[![CRAN downloads](https://cranlogs.r-pkg.org/badges/scattermore)](https://cran.r-project.org/package=scattermore)\n![R CMD check status](https://github.com/exaexa/scattermore/workflows/R-CMD-check/badge.svg)\n[![codecov](https://codecov.io/gh/exaexa/scattermore/branch/master/graph/badge.svg?token=L1YIYBSLTW)](https://codecov.io/gh/exaexa/scattermore)\n\nScatterplots with more datapoints. If you want to plot bazillions of points\nwithout much waiting, use this.\n\nIf you want to report the usage of scattermore within a scientific project, you\nmay want to refer to it systematically -- scattermore has been peer-reviewed as\na part of a larger package for interactive cytometry data analysis\n([ShinySOM](https://gitlab.com/exaexa/ShinySOM)). 
Links here:\n[PubMed 32049322](https://pubmed.ncbi.nlm.nih.gov/32049322/),\n[OUP](https://academic.oup.com/bioinformatics/article/36/10/3288/5734646?login=true),\n[doi:10.1093/bioinformatics/btaa091](https://doi.org/10.1093/bioinformatics/btaa091)\n\n## Installation\n\n- from CRAN repositories (recommended): `install.packages('scattermore')`\n- from GitHub (development version): `devtools::install_github('exaexa/scattermore')`\n\n## Quick How-To\n\nFunction `scattermoreplot` is meant to behave roughly like the standard `plot`:\n```r\nlibrary(scattermore)\nscattermoreplot(rnorm(1e7),\n                rnorm(1e7),\n\t\tcol=heat.colors(1e7, alpha=.1),\n\t\tmain='Scattermore demo')\n```\n\nIf you use `ggplot2`, you can use `geom_scattermore` instead of `geom_point` to\nrasterize the graphics (e.g. to reduce PDF size):\n\n```r\nggplot(....) + geom_scattermore()\n```\n\n(Note that the processing of data in ggplot is usually too slow itself; use\n`geom_scattermost` to dodge that.)\n\n## Advanced usage\n\nFunction `scattermore` only creates the raster graphics for the plots; this can\nbe plotted out afterwards (or processed in any other weird ways). Let's try a\nmanual benchmark:\n\n```r\n# create 10 million 2D datapoints\ndata \u003c- cbind(rnorm(1e7),rnorm(1e7))\n\n# prepare empty plot\npar(mar=rep(0,4))\n\n# plot the datapoints and see how long it takes\nsystem.time(plot(scattermore(data, rgba=c(64,128,192,10), xlim=c(-3,3), ylim=c(-3,3))))\n\n   user  system elapsed\n  0.413   0.044   0.461\n```\n\nYou should immediately see _quite a heap_ of tiny points:\n\n![Resulting scatterplot](media/result.png \"Scatterplot\")\n\nNow, how fast would the standard `plot()` do?\n\n```r\n# compare with the usual plot function on x11/cairo\nsystem.time(plot(data, pch='.', xlim=c(-3,3), ylim=c(-3,3), col=rgb(0.25,0.5,0.75,0.04)))\n\n   user  system elapsed\n  9.752   0.023   9.794\n```\n\nThis way, 0.46 seconds of `scattermore` means a nice ~20x speedup over `plot`\non my laptop. 
Moreover, if you use different plotting setups (basically any\nnon-Cairo, say windows- or quartz-based `grDevices` backends), you will very\npossibly see much greater speedups. Cairo itself is sometimes more than 10x\nfaster than the other backends. That means scattermore may be over 200x faster\nin total.\n\n## How does it work?\n\n1. Points and colors get converted to vectors and passed to C\n2. C code rasterizes the whole thing to a prepared bitmap. This is already\n   quite fast, but some low-level optimization can probably speed it up several\n   more times. Volunteers/pull requests welcome. (Is there a way to push a raw\n   `uint8_t` array into C from R?)\n3. The resulting array gets converted to R raster using `as.raster`, which can\n   get plotted. (Fun fact: When plotting less than roughly 1 million points,\n   most computational time is spent only by this conversion!)\n\n## How fast is it?\n\nLet us measure the same example as above, with points limited to different\nsizes (i.e. in the first case, scattermore receives `data[1:1e4,]`):\n\n```\npoints  .  average time (s)\n--------+------------------\n1e4     .  0.037\n3e4     .  0.039\n1e5     .  0.042\n3e5     .  0.051\n1e6     .  0.076     -- ~50% of the time is R raster conversion overhead\n3e6     .  0.170     -- caches start to overflow here\n1e7     .  0.460\n```\n\n(Multicolor plotting is slightly slower (usually 2x), because the reading and\ntransporting of the relatively large color matrix eats quite a lot of cache.)\n\n## How nice is it?\n\nCustom rasterization gives a bit of extra features. These are the two most\nobvious:\n\n1. The gazillions of points are present as a raster, even in vector output.\n   That might be a problem sometimes (remember to use sufficient raster size to\n   get the desired DPI!), but makes vector output smaller and much more easily\n   processed by other tools. (Remember the huge PDFs with scatterplots that\n   take a minute to load?)\n2. 
The rasterization is not required to work in limited memory as in usual\n   plotting libraries, which we use to gain a bit of extra precision in color\n   mixing. This is most visible when plotting a ton of low-alpha points where\n   the usual blending methods produce ugly rounding artifacts.\n\n```r\nlibrary(ggplot2)\nlibrary(scattermore)\n\n# data\nd \u003c- cbind(rnorm(1e6),runif(1e6))\n\n# first plot (geom_point)\nggsave('point.png', units='in', width=3, height=3,\n  ggplot(data.frame(x=d[,1],y=d[,2])) +\n  geom_point(shape='.', alpha=.05, aes(x,y,color=y)) +\n  scale_color_viridis_c(guide='none') +\n  ggtitle(\"geom_point\"))\n\n# second plot (geom_scattermost)\nggsave('scattermore.png', units='in', width=3, height=3,\n  ggplot() +\n  geom_scattermost(\n    d,\n    col=viridisLite::viridis(100, alpha=0.05)[1+99*d[,2]],\n    pointsize=2,\n    pixels=c(700,700)) +\n  ggtitle(\"geom_scattermost\"))\n```\n\n\u003cimg alt=\"Plot with geom_point\" src=\"/media/point.png\" width=\"50%\"\u003e\u003cimg alt=\"Plot with geom_scattermore\" src=\"/media/scattermore.png\" width=\"50%\"\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexaexa%2Fscattermore","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fexaexa%2Fscattermore","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexaexa%2Fscattermore/lists"}