https://github.com/superchordate/storyteller

AutoML R framework functions for quickly finding stories from data.

automl eda exploratory-data-analysis r

Host: GitHub
URL: https://github.com/superchordate/storyteller
Owner: superchordate
License: gpl-3.0
Created: 2022-02-09T02:15:43.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2024-02-21T20:59:07.000Z (over 1 year ago)
Last Synced: 2024-08-13T07:11:12.973Z (10 months ago)
Topics: automl, eda, exploratory-data-analysis, r
Language: R
Homepage:
Size: 80.1 KB
Stars: 4
Watchers: 3
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

jimsghstars - superchordate/storyteller - AutoML R framework functions for quickly finding stories from data. (R)

README

R functions (package TBD) to quickly find stories in data using data mining models.

These steps are working (ish):

* clean: Clean the data. Right now this just means replacing Inf with NA. Will expand this over time.

* dropnoisecols: Remove columns that cannot easily be analyzed or do not contribute information. 

* groupother: Combine small groups into a single "other" group.

* dropoutliers: Remove outliers.

* correlatedfeatures_find: Compare features to identify those that are correlated.

* Method to easily plot correlated features.

* Pick a target and identify which features drive it and the strength and direction of effects.
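To make the first few steps concrete, here is a minimal base R sketch of what cleaning, grouping small groups, and dropping outliers could look like. The `sketch_*` function names, the `min_share` and `k` parameters, the IQR fence rule, and the toy data are all assumptions made for illustration; they are not storyteller's actual implementation, which may use different rules and defaults.

```r
# Hypothetical base-R stand-ins for three of the steps above.
# These are NOT storyteller's functions; names and thresholds are assumptions.

# clean: replace Inf and -Inf with NA in numeric columns.
sketch_clean <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      df[[col]][is.infinite(df[[col]])] <- NA
    }
  }
  df
}

# groupother: fold values rarer than min_share into a single "Other" level.
sketch_groupother <- function(x, min_share = 0.05) {
  x <- as.character(x)
  shares <- table(x) / sum(!is.na(x))
  rare <- names(shares)[shares < min_share]
  x[x %in% rare] <- "Other"
  factor(x)
}

# dropoutliers: drop rows whose value falls outside k * IQR fences.
# Rows with NA in the column are kept.
sketch_dropoutliers <- function(df, col, k = 1.5) {
  q <- unname(quantile(df[[col]], c(0.25, 0.75), na.rm = TRUE))
  fence <- k * (q[2] - q[1])
  in_range <- df[[col]] >= q[1] - fence & df[[col]] <= q[2] + fence
  keep <- is.na(df[[col]]) | in_range
  df[keep, , drop = FALSE]
}

# Toy data, invented for this sketch.
toy <- data.frame(
  amount = c(10, 12, 11, 13, Inf, 12, 11, 500, 10, 12),
  state  = c("OH", "OH", "OH", "IN", "OH", "IN", "IN", "OH", "KY", "IN"),
  stringsAsFactors = FALSE
)

toy <- sketch_clean(toy)
toy$state <- sketch_groupother(toy$state, min_share = 0.2)
toy <- sketch_dropoutliers(toy, "amount")
```

In this toy run the `Inf` becomes `NA`, the single `KY` row is folded into `Other`, and the row with `amount = 500` is dropped as an outlier.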

# About Me

I'm an independent contractor helping companies build custom cloud apps and leverage data science, visual analytics, and AI. I offer low introductory rates, free consultation and estimates, and no minimums, so contact me today and let's chat about how I can help!

https://www.bryce-chamberlain.com/

# Example

```r
# packages required for storyteller.
require(glue)
require(magrittr)
require(progress)
require(reshape2)
require(ggplot2)

# easyr project setup.
require(easyr)
begin()

# read in some data. google or kaggle to find a dataset you are interested in.
dt = read.any('myfile.ext')

# run the steps (functions in fun/ folder).
dt %<>%
  clean(run_autotype = FALSE) %>% # read.any already runs autotype by default.
  dropnoisecols() %>%
  groupother() %>%
  dropoutliers() %>%
  correlatedfeatures_find()

# summarize patterns found in your data.
summary(dt)

# plot correlation between two variables.
plot_correlation(
  dt,
  c('incident_date_year', 'age')
)

# fit a model against a target and identify key drivers.
dt %>%
  correlatedfeatures_address(
    target = 'total_claim_amount'
  ) %>%
  fitmodel(
    ignorecols = c('vehicle_claim', 'property_claim', 'injury_claim')
  ) %>%
  summary()
```
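For intuition about the correlated-features step in the example above, the sketch below lists pairs of numeric columns whose absolute Pearson correlation meets a threshold, using only base R. The helper name `sketch_correlated_pairs`, the `threshold` parameter, and the toy data are assumptions for illustration; storyteller's own function may use a different measure and may also handle non-numeric features.

```r
# Hypothetical helper, not part of storyteller:
# list numeric column pairs with |Pearson r| >= threshold.
sketch_correlated_pairs <- function(df, threshold = 0.8) {
  # keep numeric columns only.
  num <- df[vapply(df, is.numeric, logical(1))]
  cors <- cor(num, use = "pairwise.complete.obs")
  # the upper triangle avoids self-pairs and mirrored duplicates.
  idx <- which(upper.tri(cors) & abs(cors) >= threshold, arr.ind = TRUE)
  data.frame(
    feature1 = rownames(cors)[idx[, "row"]],
    feature2 = colnames(cors)[idx[, "col"]],
    correlation = cors[idx],
    stringsAsFactors = FALSE
  )
}

# Toy data, invented for this sketch: y is a linear function of x, z is not.
toy <- data.frame(
  x = 1:10,
  y = 2 * (1:10) + 1,
  z = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3),
  label = rep(c("a", "b"), 5),
  stringsAsFactors = FALSE
)

found <- sketch_correlated_pairs(toy, threshold = 0.8)
print(found)
```

Only the `x` and `y` pair passes the threshold here; `z` is weakly correlated with both, and the character column `label` is skipped.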

You can also visit https://www.kaggle.com/code/brycechamberlain/data-explore-automl/ for a notebook example.
