https://github.com/zpzim/scamp
The fastest way to compute matrix profiles on CPU and GPU!
- Host: GitHub
- URL: https://github.com/zpzim/scamp
- Owner: zpzim
- License: mit
- Created: 2018-04-02T18:58:28.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2024-03-11T20:20:38.000Z (10 months ago)
- Last Synced: 2024-12-22T20:12:09.458Z (11 days ago)
- Topics: cuda, gpu, matrix-profile, python, time-series, time-series-analysis
- Language: C++
- Homepage: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
- Size: 44.6 MB
- Stars: 163
- Watchers: 13
- Forks: 36
- Open Issues: 15
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
[![Build and Test](https://github.com/zpzim/SCAMP/actions/workflows/build-and-test.yml/badge.svg)](https://github.com/zpzim/SCAMP/actions/workflows/build-and-test.yml)
[![RTD Build Status](https://img.shields.io/readthedocs/scamp-docs)](https://scamp-docs.readthedocs.io/en/latest/)[![Docker Build and Push](https://github.com/zpzim/SCAMP/actions/workflows/docker-build.yml/badge.svg)](https://github.com/zpzim/SCAMP/actions/workflows/docker-build.yml)
![Docker Image Version (latest semver)](https://img.shields.io/docker/v/zpzim/scamp?label=Docker%20Version)
![Docker Image Size (latest semver)](https://img.shields.io/docker/image-size/zpzim/scamp)
![Docker Pulls](https://img.shields.io/docker/pulls/zpzim/scamp)![PyPI](https://img.shields.io/pypi/v/pyscamp?label=pyscamp%20version)
![PyPI - Downloads](https://img.shields.io/pypi/dm/pyscamp?label=pypi%20downloads)![Conda (channel only)](https://img.shields.io/conda/vn/conda-forge/pyscamp-gpu?label=conda-forge::pyscamp-gpu)
![Conda](https://img.shields.io/conda/pn/conda-forge/pyscamp-gpu?label=pyscamp-gpu)
![Conda](https://img.shields.io/conda/dn/conda-forge/pyscamp-gpu?label=Downloads%3A%20pyscamp-gpu)![Conda (channel only)](https://img.shields.io/conda/vn/conda-forge/pyscamp-cpu?label=conda-forge::pyscamp-cpu)
![Conda](https://img.shields.io/conda/pn/conda-forge/pyscamp-cpu?label=pyscamp-cpu)
![Conda](https://img.shields.io/conda/dn/conda-forge/pyscamp-cpu?label=Downloads%3A%20pyscamp-cpu)[![DOI](https://zenodo.org/badge/127799249.svg)](https://zenodo.org/badge/latestdoi/127799249)
# SCAMP: SCAlable Matrix Profile
## Table of Contents
[Overview](https://github.com/zpzim/SCAMP#overview) \
[Documentation](https://github.com/zpzim/SCAMP#documentation) \
[Performance](https://github.com/zpzim/SCAMP#performance) \
[Python Module](https://github.com/zpzim/SCAMP#python-module) \
[Run Using Docker](https://github.com/zpzim/SCAMP#run-using-docker) \
[Distributed Operation](https://github.com/zpzim/SCAMP#distributed-operation) \
[Reference](https://github.com/zpzim/SCAMP#reference)

## Overview
This is a GPU/CPU implementation of the SCAMP algorithm. SCAMP takes a time series as input and computes the matrix profile for a particular window size. You can read more at the [Matrix Profile Homepage](http://www.cs.ucr.edu/~eamonn/MatrixProfile.html).
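
For readers new to the concept, here is a minimal brute-force sketch of what a self-join matrix profile contains. It is not SCAMP's algorithm or code: the function name `naive_matrix_profile` is hypothetical, the exclusion zone of `m // 4` is an assumption, and flat (zero-variance) windows are not handled. It only illustrates the output: for each subsequence, the z-normalized Euclidean distance to its nearest non-trivial neighbor, plus that neighbor's position.

```python
import numpy as np


def naive_matrix_profile(ts, m, exclusion=None):
    """Brute-force self-join matrix profile, O(n^2) memory; illustration only."""
    ts = np.asarray(ts, dtype=np.float64)
    n_sub = len(ts) - m + 1
    if n_sub < 2:
        raise ValueError("time series is too short for this window size")
    if exclusion is None:
        exclusion = max(1, m // 4)  # assumed exclusion-zone half-width

    # Every length-m subsequence, z-normalized (zero mean, unit variance).
    # Note: a perfectly flat window has std == 0 and is not handled here.
    windows = np.lib.stride_tricks.sliding_window_view(ts, m)
    mean = windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    z = (windows - mean) / std

    # Pearson correlation between every pair of subsequences.
    corr = np.clip((z @ z.T) / m, -1.0, 1.0)

    # z-normalized Euclidean distance: d = sqrt(2 * m * (1 - r)).
    dist = np.sqrt(2.0 * m * (1.0 - corr))

    # Ignore trivial matches (a subsequence matching itself or its neighbors).
    idx = np.arange(n_sub)
    dist[np.abs(idx[:, None] - idx[None, :]) <= exclusion] = np.inf

    index = dist.argmin(axis=1)
    profile = dist[idx, index]
    return profile, index


# Synthetic example: the same pattern planted twice in random noise.
rng = np.random.default_rng(0)
m = 32
ts = rng.standard_normal(300)
pattern = np.sin(np.linspace(0.0, 4.0 * np.pi, m))
ts[40:40 + m] = pattern
ts[200:200 + m] = pattern

profile, index = naive_matrix_profile(ts, m)
print("lowest profile value at:", int(profile.argmin()))
print("nearest neighbor of subsequence 40:", int(index[40]))
```

The two planted copies point at each other in `index` and have a near-zero value in `profile`. This quadratic brute-force approach is only practical for short series; SCAMP exists to compute the same kind of result efficiently for very large inputs.
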

This framework is a substantial improvement over [GPU-STOMP](https://github.com/zpzim/STOMPSelfJoin), with the following additional features:
* Tiling for large inputs
* Computation in fp32, mixed fp32/fp64, or fp64 (double is recommended for most datasets; single precision will work for some)
* fp32 version should get good performance on GeForce cards
* AB joins (you can produce the matrix profile from 2 different time series)
* Distributable (we use GCP but other cloud platforms can work) with verified scalability to billions of datapoints
* More types of matrix profiles! KNN, Matrix Summary, Sum, and 1NN without index! See the Docs!
* Extremely Efficient Implementation
* Extensible: optimized versions of custom join operations can be added.
* CPU Support (Only enabled for double precision; does not support KNN joins yet)
* Handles NaN input values. The matrix profile will be computed while excluding any subsequence with a NaN value
* Python module: Use SCAMP in Python with pyscamp
* conda-forge integration: Install pyscamp seamlessly with conda.
* Extensive integration testing: SCAMP has thousands of input configurations tested with every PR.
* Automatic benchmarking: Helps ensure performance doesn't slip with future updates.

## Why use SCAMP?
* It is [faster](https://scamp-docs.readthedocs.io/en/latest/performance.html) than other matrix profile libraries. For example, it is **20x** to **100x** faster than stumpy.
* It is very easy to install using conda and has very few dependencies.
* It handles real data: very large inputs, missing values, and flat regions with little issue.
* It can efficiently compute various other types of matrix profiles, including KNN matrix profiles and matrix summaries (a.k.a. mplots), and it can be extended to compute other profile types efficiently.

## Documentation
SCAMP's documentation can be found at [readthedocs](https://scamp-docs.readthedocs.io/en/latest/).

## Python module
`pyscamp` is available through conda-forge:
~~~
# To install pyscamp with CPU/GPU support on Linux and Windows:
conda install -c conda-forge pyscamp-gpu

# To install pyscamp with CPU support only on Windows, Linux, or macOS:
conda install -c conda-forge pyscamp-cpu
~~~

Note that `pyscamp-gpu` can be installed and used even if you don't have a GPU; it will simply fall back to using your CPU. However, `pyscamp-cpu` is preferable if you don't have a GPU because it builds with a newer compiler and does not require installing the `cudatoolkit` dependency.

If you run into problems using GPUs with `pyscamp-gpu`, make sure your NVIDIA drivers are up to date. This is the most common cause of issues.
### Installing from source
If you want, you can build pyscamp from source, which will have improved performance. A source distribution for a Python 3 module using pybind11 is available on pypi.org. To install it, run:
~~~
# Python 3 and a C/C++ compiler are required.
# cmake is required (if you don't have it you can pip install cmake)
pip install pyscamp
~~~

Once installed, you can use SCAMP in Python as follows:
~~~
import pyscamp as mp

# a and b are 1-D time series; sublen is the subsequence (window) length.

# Allows checking if pyscamp was built with CUDA and has GPU support.
has_gpu_support = mp.gpu_supported()

# Self join
profile, index = mp.selfjoin(a, sublen)

# AB join using 4 threads, outputting pearson correlation.
profile, index = mp.abjoin(a, b, sublen, pearson=True, threads=4)
~~~

More information and the API documentation for pyscamp are available on [readthedocs](https://scamp-docs.readthedocs.io/en/latest/).
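
As a rough illustration of what can be done with the `profile` and `index` outputs, the sketch below locates the best motif pair and the top discord, and converts Pearson correlations (as produced with `pearson=True`) to z-normalized Euclidean distances. The helper names `pearson_to_znorm_distance` and `top_motif_and_discord` are hypothetical and not part of pyscamp. The sketch assumes the outputs can be treated as 1-D NumPy arrays and skips non-finite entries in case excluded subsequences are reported that way (also an assumption). It runs on made-up arrays, so pyscamp is not needed to try it.

```python
import numpy as np


def pearson_to_znorm_distance(corr, sublen):
    """Convert Pearson correlation r to distance d = sqrt(2 * sublen * (1 - r))."""
    corr = np.clip(np.asarray(corr, dtype=np.float64), -1.0, 1.0)
    return np.sqrt(2.0 * sublen * (1.0 - corr))


def top_motif_and_discord(profile, index):
    """Return ((motif_start, neighbor_start), discord_start).

    `profile` must hold distances (lower means more similar). For a
    correlation profile, convert it with pearson_to_znorm_distance first.
    """
    profile = np.asarray(profile, dtype=np.float64)
    valid = np.isfinite(profile)
    if not valid.any():
        raise ValueError("profile has no finite entries")
    # Motif: the subsequence with the smallest distance to its neighbor.
    motif = int(np.argmin(np.where(valid, profile, np.inf)))
    # Discord: the subsequence whose nearest neighbor is farthest away.
    discord = int(np.argmax(np.where(valid, profile, -np.inf)))
    return (motif, int(index[motif])), discord


# Made-up values standing in for real SCAMP output.
demo_profile = np.array([3.0, 0.5, 2.0, np.nan, 0.5, 4.0])
demo_index = np.array([2, 4, 0, -1, 1, 2])

motif_pair, discord = top_motif_and_discord(demo_profile, demo_index)
print("motif pair:", motif_pair, "discord:", discord)
```

Keeping the conversion separate from the search means the same helper can be used whether the profile was requested as distances or as correlations.
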
## Run Using Docker
You can run SCAMP via [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) using the prebuilt [image](https://hub.docker.com/r/zpzim/scamp) on Docker Hub.

To expose the host GPUs, nvidia-docker must be installed correctly. Please follow the directions provided on the nvidia-docker GitHub page. The following example uses Docker 19.03 functionality:
~~~
docker pull zpzim/scamp:latest
docker run --gpus all \
--volume /path/to/host/input/data/directory:/data \
--volume /path/to/host/output/directory:/output \
zpzim/scamp:latest /SCAMP/build/SCAMP \
--window= --input_a_file_name=/data/ \
--output_a_file_name=/output/ \
--output_a_index_file_name=/output/
~~~

## Distributed Operation
We have a client/server architecture built using gRPC. It has been tested on [GKE](https://cloud.google.com/kubernetes-engine/), but it should be possible to get it working on [Amazon EKS](https://aws.amazon.com/eks/) as well.

For more information on how to use the SCAMP client and server, please take a look at the [documentation](https://scamp-docs.readthedocs.io/en/latest/).
## Reference
If you use SCAMP in your work, please reference the following paper:
~~~
Zimmerman, Zachary, et al. "Matrix Profile XIV: Scaling Time Series Motif Discovery with GPUs to Break a Quintillion Pairwise Comparisons a Day and Beyond." Proceedings of the ACM Symposium on Cloud Computing. 2019.
~~~