https://github.com/davejacobs/stats

An experiment with stats, the Ruby way

# Stats #

## Description ##

This is a prototype of a statistical library for Ruby. Starting out, the purpose of the library is to be readable (for people studying statistics), to be well-tested (against R and Python statistical functions), and to be useful for Small Data. Big Data can come later, if I have enough fun. With `stats`, I aim to create an API that makes statistics intuitive and harder to mess up. For example, I'd like to take a stab at an assumption framework that can tag specific functions with assumptions that will throw warnings if they're not met.
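
As a rough illustration of the assumption-tagging idea, here is a minimal sketch. The `Assumptions` and `SketchStats` modules and every method name in them are hypothetical, chosen for this example; they are not the library's actual API. The sketch registers a check against a function name and emits a warning when the check fails. Calling `SketchStats.variance([5])` would print a warning to standard error because the "at least two observations" assumption is not met.

```ruby
# Hypothetical sketch: module and method names are illustrative only,
# not the library's real API. Uses Array#sum, so it needs Ruby 2.4+.
module Assumptions
  # Register a named check for the given method.
  def assumes(method_name, description, &check)
    assumption_registry[method_name] << [description, check]
  end

  # Checks are stored per method name as [description, proc] pairs.
  def assumption_registry
    @assumption_registry ||= Hash.new { |hash, key| hash[key] = [] }
  end

  # Run every check registered for method_name, warn about each failure,
  # and return the descriptions of the violated assumptions.
  def check_assumptions(method_name, *args)
    failed = assumption_registry[method_name].reject { |_, check| check.call(*args) }
    descriptions = failed.map(&:first)
    descriptions.each do |description|
      warn "#{method_name}: assumption not met (#{description})"
    end
    descriptions
  end
end

module SketchStats
  extend Assumptions

  assumes(:variance, "at least two observations") do |values|
    values.size >= 2
  end

  # Sample variance; warns (but still computes) if an assumption fails.
  def self.variance(values)
    check_assumptions(:variance, values)
    mean = values.sum.to_f / values.size
    values.sum { |v| (v - mean)**2 } / (values.size - 1)
  end
end
```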

---

## Try it out ##

Once this is stable and fully tested (as it is so far for every function checked off below), I'll consider publishing it as a gem. Until then, you can play around with `master`:

    brew install gsl
    git clone https://github.com/davejacobs/stats.git
    cd stats
    bundle

## Running tests ##

I've started integrating R into my tests to make testing as easy and repeatable as possible. I'm also planning to incorporate something like [Rantly](https://github.com/hayeah/rantly) to widen the range of values that I test.

To run tests:

    brew install r
    rspec
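
One way such an R comparison could be wired up is sketched below. The `RReference` module is a hypothetical helper, not code from this repository, and it assumes the `Rscript` executable is on the `PATH`. It evaluates a numeric R expression in a subprocess and returns the result as a Ruby `Float`.

```ruby
require "open3"

# Hypothetical helper for comparing Ruby results against R.
# Assumes the `Rscript` executable is available on the PATH.
module RReference
  # Build the argv that evaluates a numeric R expression and prints it
  # with enough digits for a floating-point comparison.
  def self.command(expression)
    ["Rscript", "-e", "cat(format(#{expression}, digits = 15))"]
  end

  # Evaluate the expression in R and return the result as a Float.
  def self.evaluate(expression)
    output, status = Open3.capture2(*command(expression))
    raise "Rscript failed for: #{expression}" unless status.success?
    Float(output.strip)
  end

  # Render a Ruby array as an R vector literal, e.g. [1, 2] -> "c(1, 2)".
  def self.vector(values)
    "c(#{values.join(', ')})"
  end
end
```

In a spec, a Ruby result could then be compared against `RReference.evaluate("mean(#{RReference.vector(data)})")` using RSpec's `be_within` matcher, so that small floating-point differences do not fail the test.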

## Progress ##

### For developers ###

- [x] Get Ruby GSL bindings (`gem install gsl`) to work on Ruby 2.0/OS X
- [ ] Implement gemspec so this is installable via git URL

### Distribution functions ###

I've added a wrapper around GSL distribution functions, for more intuitive access and testing.

- [x] Normal distribution - PDF & CDF
- [x] Chi square distribution - PDF & CDF
- [x] T distribution - PDF & CDF
- [x] F distribution - PDF & CDF
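
For readers studying the formulas, the normal PDF and CDF can also be written in plain Ruby. This is a reference sketch only: the library itself wraps GSL, and the `NormalSketch` module and its method names are assumptions made for this example.

```ruby
# Pure-Ruby reference formulas for the normal distribution.
# Names are illustrative; the library itself wraps GSL instead.
module NormalSketch
  # Probability density of a normal distribution at x.
  def self.pdf(x, mean = 0.0, sd = 1.0)
    z = (x - mean) / sd.to_f
    Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math::PI))
  end

  # Cumulative probability P(X <= x), expressed via the error function.
  def self.cdf(x, mean = 0.0, sd = 1.0)
    z = (x - mean) / (sd * Math.sqrt(2))
    0.5 * (1 + Math.erf(z))
  end
end
```

A hand-written version like this is mostly useful as a cross-check in tests, since the GSL-backed functions should agree with it to within floating-point tolerance.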

### Basic functions ###

- [x] Mean, arithmetic
- [x] Mean, geometric
- [x] Median
- [x] Mode
- [x] Variance
- [x] Standard deviation
- [x] Standard error of the mean (for samples only)
- [x] Relative standard error of the mean (for samples only)
- [x] Coefficient of variation
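
The definitions behind these functions are short enough to sketch in plain Ruby. The `BasicSketch` module below is hypothetical and is not the library's API; it assumes Ruby 2.4+ for `Array#sum`, and it treats the data as a sample (dividing by `n - 1`) unless `population: true` is passed.

```ruby
# Hypothetical reference implementations; not the library's actual API.
module BasicSketch
  def self.mean(values)
    values.sum.to_f / values.size
  end

  # Geometric mean via logs; only defined for positive values.
  def self.geometric_mean(values)
    Math.exp(values.sum { |v| Math.log(v) } / values.size)
  end

  def self.median(values)
    sorted = values.sort
    mid = sorted.size / 2
    sorted.size.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
  end

  # Returns every most-frequent value, since data can be multimodal.
  def self.mode(values)
    counts = values.each_with_object(Hash.new(0)) { |v, h| h[v] += 1 }
    highest = counts.values.max
    counts.select { |_, count| count == highest }.keys
  end

  # Sample variance divides by n - 1; population variance divides by n.
  def self.variance(values, population: false)
    m = mean(values)
    sum_of_squares = values.sum { |v| (v - m)**2 }
    sum_of_squares / (population ? values.size : values.size - 1)
  end

  def self.standard_deviation(values, population: false)
    Math.sqrt(variance(values, population: population))
  end

  # Standard error of the mean (samples only).
  def self.standard_error(values)
    standard_deviation(values) / Math.sqrt(values.size)
  end

  # Standard error expressed as a fraction of the mean.
  def self.relative_standard_error(values)
    standard_error(values) / mean(values)
  end

  # Standard deviation expressed as a fraction of the mean.
  def self.coefficient_of_variation(values)
    standard_deviation(values) / mean(values)
  end
end
```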

### Significance tests ###

- [x] Chi square
- [x] T-test, single sample
- [x] T-test, two-sample
- [x] T-test, repeated measures
- [x] Wilcoxon rank sum test
- [ ] Wilcoxon signed rank test
- [ ] Median test
- [ ] Kruskal-Wallis H test
- [ ] Friedman test
- [x] ANOVA, one-way
- [ ] Factorial ANOVA, two-way
- [ ] Factorial ANOVA, three-way
- [ ] ANOVA, repeated measures
- [ ] MANOVA
- [ ] ANCOVA
- [ ] Welch's ANOVA
- [ ] Fisher's least significant difference
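
To make the t-tests concrete, here is a minimal sketch of the test statistic and degrees of freedom for the single-sample and repeated-measures cases. The `TTestSketch` module and its `Result` struct are hypothetical, not the library's API. P-values are left out because they require the t distribution's CDF, which the library obtains from GSL.

```ruby
# Hypothetical sketch of the t statistic; not the library's actual API.
module TTestSketch
  Result = Struct.new(:t, :df)

  # Single-sample t-test of H0: the population mean equals mu.
  def self.single_sample(values, mu)
    n = values.size
    mean = values.sum.to_f / n
    variance = values.sum { |v| (v - mean)**2 } / (n - 1)
    standard_error = Math.sqrt(variance / n)
    Result.new((mean - mu) / standard_error, n - 1)
  end

  # Repeated measures: a single-sample test on the pairwise differences.
  def self.repeated_measures(before, after)
    raise ArgumentError, "samples must be paired" unless before.size == after.size
    differences = before.zip(after).map { |b, a| a - b }
    single_sample(differences, 0)
  end
end
```

Expressing the repeated-measures test in terms of the single-sample test keeps the arithmetic in one place, which makes both easier to verify against R's `t.test`.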

### Regressions ###

- [ ] Linear regression
- [ ] Multiple linear regression
- [ ] Pearson's correlation
- [ ] Spearman correlation
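
None of these are implemented yet, but the two simplest ones could look roughly like the sketch below. The `RegressionSketch` module and its method names are hypothetical; this is one possible shape under textbook definitions, not a statement of how the library will do it.

```ruby
# Hypothetical sketch; these functions are not implemented in the library.
module RegressionSketch
  # Sums of squares and cross-products around the means.
  def self.moments(xs, ys)
    raise ArgumentError, "xs and ys must be the same size" unless xs.size == ys.size
    mean_x = xs.sum.to_f / xs.size
    mean_y = ys.sum.to_f / ys.size
    sxy = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
    sxx = xs.sum { |x| (x - mean_x)**2 }
    syy = ys.sum { |y| (y - mean_y)**2 }
    [mean_x, mean_y, sxy, sxx, syy]
  end

  # Pearson's correlation coefficient r.
  def self.pearson(xs, ys)
    _, _, sxy, sxx, syy = moments(xs, ys)
    sxy / Math.sqrt(sxx * syy)
  end

  # Ordinary least squares fit of y = intercept + slope * x.
  # Returns [intercept, slope].
  def self.linear_fit(xs, ys)
    mean_x, mean_y, sxy, sxx, = moments(xs, ys)
    slope = sxy / sxx
    [mean_y - slope * mean_x, slope]
  end
end
```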

### Support & other ###

- [x] Basic assumption framework
- [ ] Confidence intervals (general idea)
- [ ] Basic data structures
- [ ] Significance methods on data structures
- [ ] Test using R integration and something like [Rantly](https://github.com/hayeah/rantly)

## Resources ##

- [How to choose the right statistical test](http://www.graphpad.com/support/faqid/1790/)
- [Wilkinson's *Statistics Quiz* (RTF)](http://tspintl-test.com/products/tsp/benchmarks/wilk.rtf)
- Assessing the reliability of statistical software
  - [Part 1](http://www.questia.com/googleScholar.qst?docId=5001390400)
  - [Part 2](http://www.questia.com/googleScholar.qst?docId=5001888610)