A reasonably fast implementation of Cliff's delta effect size metric
https://github.com/gousiosg/cliffs.d

Host: GitHub
URL: https://github.com/gousiosg/cliffs.d
Owner: gousiosg
License: mit
Created: 2013-08-25T09:19:37.000Z (about 11 years ago)
Default Branch: master
Last Pushed: 2013-08-25T17:31:42.000Z (about 11 years ago)
Last Synced: 2024-10-11T08:46:04.423Z (27 days ago)
Topics: r, statistics
Language: C
Homepage:
Size: 129 KB
Stars: 2
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

Cliffs.d
========

A reasonably fast implementation of Cliff's delta effect size metric

Calculates the metric using a tight C loop instead of examining the
input's dominance matrix. The algorithm is still `O(n x m)` (where
`n` and `m` are the sizes of the input vectors), but it
is space efficient and executes in C code rather than R. As a result,
it can process input vectors of 50k elements in about 10 seconds
on a MacBook Air 1.8GHz.
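
To illustrate the idea, here is a minimal sketch of such a loop. It is not the package's actual source: the function name `cliffs_delta` and its signature are assumptions made for this example. It uses the usual definition of Cliff's delta, the number of pairs with `x[i] > y[j]` minus the number of pairs with `x[i] < y[j]`, divided by `n * m`, and keeps only a running counter instead of an `n x m` matrix.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper for illustration only (not taken from the package).
 * Returns Cliff's delta for x (length n) and y (length m).
 * Time is O(n * m); extra space is a single counter.
 */
double cliffs_delta(const double *x, size_t n, const double *y, size_t m)
{
    /* Both vectors must be non-empty, otherwise the ratio is undefined. */
    assert(n > 0 && m > 0);

    long long dominance = 0;

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < m; j++) {
            /* +1 if x[i] > y[j], -1 if x[i] < y[j], 0 on ties */
            dominance += (x[i] > y[j]) - (x[i] < y[j]);
        }
    }

    return (double)dominance / ((double)n * (double)m);
}
```

For instance, with `x = {1, 2, 3, 4}` and `y = {2, 2, 5}`, the twelve pairs contribute a net count of -2, so the sketch returns -2/12, or about -0.167.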

The equivalent R code is the following (found [here](https://stat.ethz.ch/pipermail/r-help/2007-April/129592.html)):

```s
cliffs.d.1 <- function(x, y) {
  mean(rowMeans(sign(outer(x, y, FUN="-"))))
}
```

#### Benchmarks

```s
> library(rbenchmark)
> library(cliffsd)
> cliffs.d.1 <- function(x, y) {
+   mean(rowMeans(sign(outer(x, y, FUN="-"))))
+ }
> a <- sample(1:10, 1000, replace = T)
> b <- sample(1:10, 1000, replace = T)
> benchmark(cliffs.d(a,b), cliffs.d.1(a,b))
              test replications elapsed relative user.self sys.self user.child
2 cliffs.d.1(a, b)          100   4.646   17.532     4.122    0.526          0
1   cliffs.d(a, b)          100   0.265    1.000     0.264    0.001          0
> a <- sample(1:10, 10000, replace = T)
> b <- sample(1:10, 10000, replace = T)
> benchmark(cliffs.d(a,b), cliffs.d.1(a,b), replications = 10)
              test replications elapsed relative user.self sys.self user.child
2 cliffs.d.1(a, b)           10  42.433   16.667    27.476   12.840          0
1   cliffs.d(a, b)           10   2.546    1.000     2.505    0.016          0
> a <- sample(1:10, 50000, replace = T)
> b <- sample(1:10, 50000, replace = T)
> benchmark(cliffs.d(a,b), replications = 1)
            test replications elapsed relative user.self sys.self user.child sys.child
1 cliffs.d(a, b)            1  10.422        1    10.421    0.007          0         0
```

#### Correctness test

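```s
library(cliffsd)

# Reference implementation in pure R, based on the full dominance matrix
cliffs.d.1 <- function(x, y) {
  mean(rowMeans(sign(outer(x, y, FUN="-"))))
}

correct <- 0
for (i in 1:100) {
    # Random vectors with a random value range and a random length
    a <- sample(1:runif(1, 1, 100), runif(1, 1, 1000), replace = TRUE)
    b <- sample(1:runif(1, 1, 100), runif(1, 1, 1000), replace = TRUE)
    t <- cliffs.d(a,b)
    v <- cliffs.d.1(a,b)
    # Compare the C and R results to 5 significant digits
    if (signif(v, digits = 5) != signif(t, digits = 5)) {
      # On a mismatch, print the inputs and both results
      print(a)
      print(b)
      print(sprintf("t: %f, v: %f", t, v))
    } else {
      correct <- correct + 1
    }
}
print(sprintf("%d/100 correct tests", correct))
```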