Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/superchordate/storyteller
AutoML R framework functions for quickly finding stories from data.
automl eda exploratory-data-analysis r
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/superchordate/storyteller
- Owner: superchordate
- License: gpl-3.0
- Created: 2022-02-09T02:15:43.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-21T20:59:07.000Z (9 months ago)
- Last Synced: 2024-06-04T23:02:29.837Z (5 months ago)
- Topics: automl, eda, exploratory-data-analysis, r
- Language: R
- Homepage:
- Size: 80.1 KB
- Stars: 4
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - superchordate/storyteller - AutoML R framework functions for quickly finding stories from data. (R)
README
R functions (package TBD) to quickly find stories in data using data mining models.
These steps are working (ish):
* clean: Clean the data. Right now this just means replacing Inf with NA. Will expand this over time.
* dropnoisecols: Remove columns that cannot easily be analyzed or do not contribute information.
* groupother: Group small groups.
* dropoutliers: Remove outliers.
* find_correlated_features: Compare features to identify those that are correlated.
* Method to easily plot correlated features.
* Pick a target and identify which features drive it and the strength and direction of effects.

# About Me
I'm an independent contractor helping companies build custom cloud apps and leverage data science, visual analytics, and AI. I offer low introductory rates, free consultation and estimates, and no minimums, so contact me today and let's chat about how I can help!
https://www.bryce-chamberlain.com/
# Example
```r
# packages required for storyteller.
require(glue)
require(magrittr)
require(progress)
require(reshape2)
require(ggplot2)

# easyr project setup.
require(easyr)
begin()

# read in some data. google or kaggle to find a dataset you are interested in.
dt = read.any('myfile.ext')

# run the steps (functions in fun/ folder).
dt %<>%
  clean(run_autotype = FALSE) %>% # read.any already runs autotype by default.
  dropnoisecols() %>%
  groupother() %>%
  dropoutliers() %>%
  correlatedfeatures_find()

# summarize patterns found in your data.
summary(dt)

# plot correlation between two variables.
plot_correlation(
  dt,
  c('incident_date_year', 'age')
)

# fit a model against a target and identify key drivers.
dt %>%
  correlatedfeatures_address(
    target = 'total_claim_amount'
  ) %>%
  fitmodel(
    ignorecols = c('vehicle_claim', 'property_claim', 'injury_claim')
  ) %>%
  summary()
```
You can also visit https://www.kaggle.com/code/brycechamberlain/data-explore-automl/ for a notebook example.
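
To give a rough feel for what a few of the listed steps do, here is a minimal base-R sketch. It is not storyteller's code: the function names (`clean_sketch`, `groupother_sketch`, `correlated_sketch`), the `min_n` and `cutoff` thresholds, the "Other" label, and the use of Pearson correlation are all assumptions made for illustration. The actual functions in the `fun/` folder may work differently.

```r
# Hypothetical helpers for illustration only; these are NOT storyteller's functions.

# Replace Inf/-Inf with NA in every numeric column
# (the README describes the clean step as replacing Inf with NA).
clean_sketch <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      df[[col]][is.infinite(df[[col]])] <- NA
    }
  }
  df
}

# Lump values that appear fewer than `min_n` times into one "Other" level.
# The threshold and the label are assumptions.
groupother_sketch <- function(x, min_n = 2) {
  x <- as.character(x)
  counts <- table(x)
  small <- names(counts)[counts < min_n]
  x[x %in% small] <- "Other"
  factor(x)
}

# List pairs of numeric columns whose absolute Pearson correlation
# is at least `cutoff` (assumed measure and threshold).
correlated_sketch <- function(df, cutoff = 0.8) {
  num <- df[vapply(df, is.numeric, logical(1))]
  cm <- cor(num, use = "pairwise.complete.obs")
  idx <- which(upper.tri(cm) & abs(cm) >= cutoff, arr.ind = TRUE)
  data.frame(
    x = rownames(cm)[idx[, "row"]],
    y = colnames(cm)[idx[, "col"]],
    r = cm[idx],
    stringsAsFactors = FALSE
  )
}

# Toy data: one Inf value, two rare groups, two perfectly correlated columns.
toy <- data.frame(
  a = c(1, 2, 3, Inf, 5),
  b = c(2, 4, 6, 8, 10),
  g = c("x", "x", "y", "x", "z")
)
toy <- clean_sketch(toy)
toy$g <- groupother_sketch(toy$g)
found <- correlated_sketch(toy)
```

On the toy data, the `Inf` in column `a` becomes `NA`, the single-occurrence groups `y` and `z` are merged into `Other`, and `found` reports the `a`/`b` pair as correlated.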