https://github.com/hadley/bigvis
Exploratory data analysis for large datasets (10-100 million observations)
- Host: GitHub
- URL: https://github.com/hadley/bigvis
- Owner: hadley
- Created: 2012-01-06T09:39:22.000Z (about 14 years ago)
- Default Branch: master
- Last Pushed: 2015-06-29T10:37:48.000Z (over 10 years ago)
- Last Synced: 2025-05-09T00:09:12.895Z (8 months ago)
- Language: C++
- Homepage:
- Size: 2.11 MB
- Stars: 290
- Watchers: 44
- Forks: 40
- Open Issues: 7
Metadata Files:
- Readme: README.md
# bigvis
The bigvis package provides tools for exploratory data analysis of __large datasets__ (10-100 million observations). The aim is for most operations to take less than 5 seconds on commodity hardware, even for 100,000,000 data points.
Since bigvis is not currently available on CRAN, the easiest way to try it out is to install it from GitHub:
```R
# install.packages("devtools")
devtools::install_github("hadley/bigvis")
```
## Workflow
The bigvis package is structured around the following workflow:
* `bin()` and `condense()` to get a compact summary of the data
* if the estimates are rough, you might want to `smooth()`. See `best_h()` and `rmse_cvs()` to figure out a good starting bandwidth
* if you're working with counts, you might want to `standardise()`
* visualise the results with `autoplot()` (you'll need to load `ggplot2` to use this)
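To make the idea behind the first two steps concrete, here is a minimal sketch in base R of fixed-width binning, condensing to per-bin counts, and a simple kernel smooth. It does not call bigvis and is not how the package is implemented; the bin width, the bandwidth `h`, and the column names are assumptions chosen for illustration.

```R
# Conceptual sketch in base R; not bigvis's implementation.
set.seed(1)
x <- rnorm(1e5)

# Bin: map each observation to an integer bin index (hypothetical width).
width  <- 0.1
origin <- floor(min(x) / width) * width
bin_id <- floor((x - origin) / width)

# Condense: one row per bin (bin centre and count), including empty bins.
counts <- tabulate(bin_id + 1L)
condensed <- data.frame(
  x      = origin + (seq_along(counts) - 0.5) * width,
  .count = counts
)

# Smooth: Gaussian kernel-weighted mean of the counts (hypothetical bandwidth).
h <- 0.3
smoothed <- vapply(condensed$x, function(x0) {
  k <- dnorm(condensed$x, mean = x0, sd = h)
  sum(k * condensed$.count) / sum(k)
}, numeric(1))
```

The point of the sketch is that every later step works on `condensed`, which has one row per bin rather than one per observation, so its cost depends on the number of bins and not on the size of the raw data.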
## Weighted statistics
Bigvis also provides a number of standard statistics efficiently implemented on weighted/binned data: `weighted.median`, `weighted.IQR`, `weighted.var`, `weighted.sd`, `weighted.ecdf` and `weighted.quantile`.
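As a rough illustration of what a weighted statistic on binned data means, the base R sketch below computes a weighted mean and a weighted quantile from bin centres and counts. The helper names `wtd_mean_sketch()` and `wtd_quantile_sketch()` are hypothetical and deliberately different from bigvis's functions; the quantile uses a plain inverse-ECDF definition, which may differ from the interpolation bigvis uses.

```R
# Hypothetical helpers for illustration; not part of bigvis.
wtd_mean_sketch <- function(x, w) sum(w * x) / sum(w)

wtd_quantile_sketch <- function(x, w, probs) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum <- cumsum(w) / sum(w)  # weighted ECDF evaluated at each x
  # Smallest x whose cumulative weight reaches each requested probability.
  vapply(probs, function(p) x[which(cum >= p)[1]], numeric(1))
}

centres <- c(1, 2, 3, 4)  # bin centres
counts  <- c(1, 2, 3, 4)  # observations per bin, used as weights

wtd_mean_sketch(centres, counts)
wtd_quantile_sketch(centres, counts, 0.5)
```

Both results match what you would get by expanding the bins with `rep(centres, counts)` and calling `mean()` and `median()`, but without ever materialising the expanded vector.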
## Acknowledgements
This package wouldn't be possible without:
* the fantastic [Rcpp](http://dirk.eddelbuettel.com/code/rcpp.html) package, which makes it amazingly easy to integrate R and C++
* JJ Allaire and Carlos Scheidegger who have indefatigably answered my many C++ questions
* Revolution Analytics, who generously supported the early development
* Yue Hu, who implemented a proof of concept showing that it might be possible to work with this much data in R