https://github.com/cvxgrp/GGS

Greedy Gaussian Segmentation
https://github.com/cvxgrp/GGS

Last synced: about 2 months ago
JSON representation

Greedy Gaussian Segmentation

Host: GitHub
URL: https://github.com/cvxgrp/GGS
Owner: cvxgrp
License: bsd-2-clause
Created: 2016-10-17T23:13:35.000Z (over 8 years ago)
Default Branch: master
Last Pushed: 2023-01-02T08:27:50.000Z (over 2 years ago)
Last Synced: 2025-03-27T05:34:39.427Z (2 months ago)
Language: Python
Size: 580 KB
Stars: 99
Watchers: 9
Forks: 33
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-time-series - code

README

        # GGS

Greedy Gaussian Segmentation (GGS) is a Python solver for efficiently segmenting multivariate time series data. For implementation details, please see our paper at [http://stanford.edu/~boyd/papers/ggs.html](http://stanford.edu/~boyd/papers/ggs.html).

----

The GGS Solver takes an n-by-T data matrix and breaks the T timestamps on an n-dimensional vector into segments over which the data is well explained as independent samples from a multivariate Gaussian distribution. It does so by formulating a covariance-regularized maximum likelihood problem and solving it using a greedy heuristic, with full details described in the [paper](http://stanford.edu/~boyd/papers/ggs.html).

Download & Setup

======================

  1. Download the source code in the terminal by running:

```

git clone [email protected]:davidhallac/GGS.git

```

  2. Confirm that the code was properly downloaded by running:

```

cd GGS

python helloworld.py

```

  3. To write your own Python function that uses ggs, simply make sure that `ggs.py` is in the same directory as your new file, and then add the following code to the beginning of your script:

```

from ggs import *

```

Supported Functions

======================

The GGS package has three main functions:

```

bps, objectives = GGS(data, Kmax, lamb)

```

Finds K breakpoints in the data for a given regularization parameter lambda

**Inputs**

data - a n-by-T data matrix, with T timestamps of an n-dimensional vector

Kmax - the number of breakpoints to find

lamb - regularization parameter for the regularized covariance

**Returns**

bps - List of lists, where element *i* of the larger list is the set of breakpoints found at *K = i* in the GGS algorithm

objectives - List of the objective values at each intermediate step (for *K =* 0 to Kmax)

----

```

meancovs = GGSMeanCov(data, breakpoints, lamb)

```

Finds the means and regularized covariances of each segment, given a set of breakpoints.

**Inputs**

data - a n-by-T data matrix, with T timestamps of an n-dimensional vector

breakpoints - a list of breakpoint locations

lamb - regularization parameter for the regularized covariance

**Returns**

meancovs - a list of (mean, covariance) tuples for each segment in the data

----

```

cvResults = GGSCrossVal(data, Kmax=25, lambList = [0.1, 1, 10])

```

Runs 10-fold cross validation, and returns the train and test set likelihood for every (K, lambda) pair up to Kmax

**Inputs**

data - a n-by-T data matrix, with T timestamps of an n-dimensional vector

Kmax - the maximum number of breakpoints to run GGS on

lambList - a list of regularization parameters to test

**Returns**

cvResults - list of (lamb, ([TrainLL],[TestLL])) tuples for each regularization parameter in lambList. Here, TrainLL and TestLL are the average per-sample log-likelihood across the 10 folds of cross-validation for all *K*'s from 0 to Kmax

----

Additional optional parameters (for all three functions above): 

features = [] - select a certain subset of columns in the data to operate on

verbose = False - Print intermediate steps when running the algorithm

Example Usage

======================

Running `financeExample.py` will yield the following plot, showing the objective (Equation 4 in the paper) vs. the number of breakpoints:

![Objective vs. # of breakpoints](Images/helloworld.png)

Once we have solved for the locations of the breakpoints, we can use the `FindMeanCovs()` function to find the means and covariances of each segment. In the example in `helloworld.py`, plotting the means, variances, and covariances of the three signals yields:

![Means and covariances over time](Images/threeSignals.png)

To run cross-validation, which can be useful in determining optimal values of K and lambda, we can use the following code to load the data, run the cross-validation, and then plot the test and train likelihood:

```

from ggs import *

import numpy as np

import matplotlib.pyplot as plt

filename = "Returns.txt"

data = np.genfromtxt(filename,delimiter=' ')

feats = [0,3,7]

#Run cross-validaton up to Kmax = 30, at lambda = 1e-4

maxBreaks = 30

lls = GGSCrossVal(data, Kmax=maxBreaks, lambList = [1e-4], features = feats, verbose = False)

trainLikelihood = lls[0][1][0]

testLikelihood = lls[0][1][1]

plt.plot(range(maxBreaks+1), testLikelihood)

plt.plot(range(maxBreaks+1), trainLikelihood)

plt.legend(['Test LL','Train LL'], loc='best')

plt.show()

```

The resulting plot looks like:

![Test and train likelihood](Images/likelihood.png)

References

==========

[Greedy Gaussian Segmentation of Time Series Data -- D. Hallac, P. Nystrup, and S. Boyd][ggs]

[ggs]: http://stanford.edu/~boyd/papers/ggs.html "Greedy Gaussian Segmentation of Time Series Data -- D. Hallac, P. Nystrup, and S. Boyd"

Authors

------

David Hallac, Peter Nystrup, and Stephen Boyd.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/cvxgrp/GGS

Awesome Lists containing this project

README