Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/cnelias/serialdependence.jl

Julia module containing standard methods from categorical time-series analysis.
https://github.com/cnelias/serialdependence.jl

categorical-data correlation julia time-series-analysis

Last synced: 2 days ago
JSON representation

Julia module containing standard methods from categorical time-series analysis.

Awesome Lists containing this project

README

        

# SerialDependence.jl
When dealing with **categorical data**, things like autocorrelation function are not defined. This is what this module is for : computing **categorical serial dependences**.
|**Travis**|**Appveyor**|
|:--------:|:----------:|
[![Build Status](https://travis-ci.com/johncwok/SerialDependence.jl.svg?branch=master)](https://travis-ci.com/johncwok/SerialDependence.jl)|[![Build status](https://ci.appveyor.com/api/projects/status/1edibrv74rys49ru?svg=true)](https://ci.appveyor.com/project/johncwok/serialdependence-jl)|

The module mostly implements the methods described in C. Weiss's book *"An Introduction to Discrete-Valued
Time Series"* (2018) [1], with some extras. It contains three main functions :

## Main functions
All the module's functions require a `'lag's` input : `'lags'` can be an `Int`, or an `Array{Int,1}` if you want to do a serial dependence plot. The function then returns a `Float64` or an `Array{Float64,1}` depending on `'lags'` being an `Int` or `Array{Int,1}`.

- **`cramer_coefficient(series, lags)`**: measures average **association** between elements of `'series'` at time `t` and time `t + lags`.
Cramer's k is an *unsigned* measurement : its values lies in [0,1], 0 being perfect independence and 1 perfect dependence. k can be bias, for more infos, refer to [1].

- **`cohen_coefficient(series, lags)`**: measures average **agreement** between elements of `'series'` at time `t` and time `t + lags`.

Cohen's k is a *signed* measurement : its values lie in [-pe/(1 -pe), 1], with positive (negative) values indicating positive (negative) serial dependence at 'lags'. pe is probability of agreement by chance.

- **`theils_u(series, lags)`**: measures average portion of **information** known about `'series'` at `t + lags` given that `'series'` is known at time `t`. U is an *unsigned* measurement: its values lies in [0,1], 0 meaning no information shared and 1 complete knowledge (determinism).

### Usefull extras
- **`bootstrap_CI(Series, lags, coef_func, n_iter = 1000)`**: Returns top and bottom limit for a 95% confidence interval at values of 'lags'.
* `'coef_func'` is the **function** for which the CI needs to be computed. Possible values : 'cramer_coefficient, cohen_coefficient, theils_u'.
* `'n_iter'` controls how many iterations are run during the bootstrap process. Large `'n_iter'`, means more precision but also more compute time.
- **`rate_evolution(Series)`**: This is a visual test of "stationarity" : if it varies linearly, then the time-series can be considered as stationary. Returns an `array` of `array`. Each of the internal array represents one of the categories in `'Series'`and describes it's evolution rate.

## Example
Using the pewee birdsong data (1943) one can do a serial dependence plot using Cohen's cofficient as follow :
```Julia
using DelimitedFiles
using SerialDependence
using Plots
#reading 'pewee' time-series test folder.
series = readdlm("test\\pewee.txt",',')[1,:]
lags = collect(1:25)
v = cohen_coefficient(series, lags)
t, b = bootstrap_CI(series, cramer_coefficient, lags)
a = plot(lags, v, xlabel = "Lags", ylabel = "K", label = "Cramer's k")
plot!(a, lags, t, color = "red", label = "Limits of 95% CI"); plot!(a, lags, b, color = "red", label = "")
```

## TO-DO
[] Implement bias correction for cramer's v

[1] DOI : 10.1002/9781119097013