https://github.com/elastacloud/automatic-data-explorer
An R package to explore and quality check data
- Host: GitHub
- URL: https://github.com/elastacloud/automatic-data-explorer
- Owner: elastacloud
- License: mit
- Created: 2017-07-20T13:53:37.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-05-25T08:53:50.000Z (almost 7 years ago)
- Last Synced: 2024-08-13T07:11:35.989Z (8 months ago)
- Topics: correlations, covariance, pca, summary-statistics
- Language: R
- Size: 541 KB
- Stars: 3
- Watchers: 9
- Forks: 3
- Open Issues: 10
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Automatic Data Explorer
An R package to explore and quality check data. Contains a variety of useful functions which enable automatic checking of data quality, factors and numeric data as well as correlations.
- `targetCorrelations()`
- `ggdensity()`
- `gghistogram()`
- `SummaryStatsCat()`
- `SummaryStatsNum()`
- `autoMarkdown()`

## Using targetCorrelations
To get started, create a data frame and name the column for which you want target correlations:

    install.packages("purrr")
    library(purrr)

    data <- data.frame(A = rnorm(50, 0, 1),
                       B = runif(50, 10, 20),
                       C = seq(1, 50, 1),
                       D = rep(LETTERS[1:5], 10))

    targetCorrelations(data, "B")

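For intuition, the idea of a target correlation report can be sketched in base R. The helper `target_cor_sketch()` below is hypothetical and is not the package's implementation. It assumes Pearson correlation, numeric columns only, and ordering by absolute correlation, none of which this README confirms.

```r
# Hypothetical helper for illustration only; not the package's code.
# Assumptions: Pearson correlation, numeric columns only,
# results ordered by absolute correlation (largest first).
target_cor_sketch <- function(df, target) {
  # Keep only the numeric columns; factors and characters are skipped.
  numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  predictors <- setdiff(numeric_cols, target)

  # Correlate each remaining numeric column with the target column.
  cors <- vapply(
    predictors,
    function(col) cor(df[[col]], df[[target]]),
    numeric(1)
  )

  cors[order(abs(cors), decreasing = TRUE)]
}

set.seed(1)  # make the random example reproducible
example_data <- data.frame(A = rnorm(50, 0, 1),
                           B = runif(50, 10, 20),
                           C = seq(1, 50, 1),
                           D = rep(LETTERS[1:5], 10))

report <- target_cor_sketch(example_data, "B")
print(report)
```

The non-numeric column `D` is dropped before any correlation is computed, which is why only `A` and `C` can appear in the result.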
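The `targetCorrelations(data, "B")` call should produce a report similar to the following; the exact values will differ because `A` and `B` are randomly generated:

             C          A
    0.40549008 0.01356416

## Using autoMarkdown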
The `autoMarkdown()` function can be used to automatically generate R Markdown files directly from one or more
R scripts. The idea is to take the focus away from thinking about your Markdown styling when doing the
most important part of data science, the actual exploration and analysis.

The function requires that the R script has some formatting: the code that you wish to be incorporated into a
code chunk must be separated with a divider, e.g.

    #' # Summary
    #' This is the summary of the mtcars dataset
    #.#
    summary(mtcars)
    #.#
    #' ## Histogram of mpg
    #' This is a histogram of the mpg variable
    #.#
    autoHistogramPlot(mtcars, mpg, colour = "black", fill = "blue")
    #.#

There are two things to note in this example:
- `#.#` lines are the dividers; the code between a pair of them is treated as a code chunk
- `#'` lines are recognised by autoMarkdown as Roxygen comments and treated accordingly

Say that we have saved the above in an R script called `mtcars.R`. We can now write this as R Markdown to an existing
`mtcars.Rmd` file with

    autoMarkdown("mtcars.R", "mtcars.Rmd")

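To make the divider convention concrete, here is a minimal sketch of how a script in this format could be turned into R Markdown lines. The function `script_to_rmd_sketch()` is hypothetical and is not how `autoMarkdown()` is actually implemented. The chunk options `echo=FALSE, message=FALSE` are an assumption chosen to match the "quiet" behaviour described in this README.

```r
# Hypothetical converter for illustration only; not autoMarkdown()'s code.
# Assumption: dividers come in matched pairs and chunks are "quiet".
script_to_rmd_sketch <- function(lines) {
  fence <- strrep("`", 3)  # the three-backtick chunk delimiter
  out <- character(0)
  in_chunk <- FALSE

  for (line in lines) {
    if (grepl("^#\\.#[[:space:]]*$", line)) {
      # A divider opens a chunk if we are in prose, and closes it otherwise.
      header <- paste0(fence, "{r, echo=FALSE, message=FALSE}")
      out <- c(out, if (in_chunk) fence else header)
      in_chunk <- !in_chunk
    } else if (!in_chunk) {
      # Outside a chunk: strip the Roxygen marker so the text is plain Markdown.
      out <- c(out, sub("^#' ?", "", line))
    } else {
      # Inside a chunk: keep the code exactly as written.
      out <- c(out, line)
    }
  }
  out
}

script <- c(
  "#' # Summary",
  "#' This is the summary of the mtcars dataset",
  "#.#",
  "summary(mtcars)",
  "#.#"
)

rmd <- script_to_rmd_sketch(script)
cat(rmd, sep = "\n")
```

The sketch does not validate its input: an unmatched `#.#` would leave a chunk open, so a real implementation would need to check for that.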
Most projects will have multiple separate scripts, perhaps detailing different stages of the data science life-cycle.
This makes our workflow much easier to follow and keeps code neat and tidy. However, when it comes to reporting, we
most likely want just one report. If we have multiple scripts, these can all be written to the same .Rmd
file with

    autoMarkdown(c("DataExploration.R", "DataCleaning.R", "Modelling.R"), "ProjectReport.Rmd", overwrite = TRUE)

Note the `overwrite = TRUE` argument. It means that any existing markdown in the .Rmd file will automatically be overwritten. This is useful in most circumstances but could be dangerous if you specify the
wrong .Rmd file, so use with caution.

The default setting is to create code chunks that are "quiet"; that is, they will only display the results of the code,
not the code itself or any messages generated by it. Further development may include an option to specify a code chunk
that also displays the code itself.
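
In knitr terms, a "quiet" chunk like the one described above is usually expressed through the `echo` and `message` chunk options. The sketch below builds a chunk header for either case. The helper `chunk_header_sketch()` and its arguments are hypothetical and not part of this package, and the exact options that `autoMarkdown()` writes are an assumption.

```r
# Hypothetical helper for illustration only; not part of the package.
# It builds the opening line of a knitr code chunk.
chunk_header_sketch <- function(show_code = FALSE, show_messages = FALSE) {
  opts <- c(
    paste0("echo=", show_code),        # echo controls whether the code is shown
    paste0("message=", show_messages)  # message controls whether messages are shown
  )
  paste0(strrep("`", 3), "{r, ", paste(opts, collapse = ", "), "}")
}

quiet_header <- chunk_header_sketch()                    # results only
verbose_header <- chunk_header_sketch(show_code = TRUE)  # code and results

print(quiet_header)
print(verbose_header)
```

Until such an option exists in the package, one workaround is to edit the chunk headers in the generated .Rmd file by hand, for example changing `echo=FALSE` to `echo=TRUE` where the code should be visible.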