https://github.com/lunakoly/spikesdetection

Simple app for detecting spikes on a noisy graph

kotlin tools

Host: GitHub
URL: https://github.com/lunakoly/spikesdetection
Owner: lunakoly
Created: 2024-02-13T21:50:43.000Z (9 months ago)
Default Branch: main
Last Pushed: 2024-04-10T16:46:02.000Z (7 months ago)
Last Synced: 2024-06-12T03:07:36.786Z (5 months ago)
Topics: kotlin, tools
Language: Kotlin
Homepage:
Size: 365 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Spikes Detection

This is a simple program that can [detect spikes on a noisy graph](#Visualize-Spikes-Detection)
and then [automatically integrate them](#Integration). It can also [dump noise visualizations](#Visualize-Noise).

## How to Use

The only supported input data file format is `asc`.

The output path must point to a folder, which is then populated with the resulting files.

### Visualize Spikes Detection

![Левая-GS-6.asc.png](images%2F%D0%9B%D0%B5%D0%B2%D0%B0%D1%8F-GS-6.asc.png)

```zsh
java -jar SpikesDetection.jar \
--mode spikes-detection \
--in path/folder1/subfolder1 \
--in path/folder1/subfolder2 \
--path-prefix path \
--out output/folder
```

All files from every `--in` folder are rendered to new `png` files and stored in the output folder.
If their names clash, they become `file.asc.png`, `file.asc.2.png`, `file.asc.3.png`, ...
If the optional `--path-prefix` is specified, each file name is prefixed with the part of the file's absolute path
that lies between the prefix and the file name, with `-` in place of the path separator.
In the example above, the names would be `folder1-subfolder1-file.asc.png`, ...
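
The naming rules above can be sketched in Kotlin roughly as follows. This is an illustration, not the tool's actual code: `prefixedBaseName` and `uniqueName` are hypothetical helpers, and the sketch assumes that every input file lies under the given prefix.

```kotlin
import java.nio.file.Paths

// Hypothetical helper: builds the base name of an output file.
// With a prefix, the path between the prefix and the file name is kept,
// and path separators are replaced by '-'.
// Assumption: the file is located somewhere under pathPrefix.
fun prefixedBaseName(file: String, pathPrefix: String?): String {
    val path = Paths.get(file)
    val relative = if (pathPrefix != null) Paths.get(pathPrefix).relativize(path) else path.fileName
    return relative.joinToString("-") { it.toString() }
}

// Hypothetical helper: resolves clashes as file.asc.png, file.asc.2.png, file.asc.3.png, ...
// `taken` remembers the names that were already handed out.
fun uniqueName(baseName: String, taken: MutableSet<String>): String {
    var candidate = "$baseName.png"
    var index = 2
    while (!taken.add(candidate)) {
        candidate = "$baseName.$index.png"
        index++
    }
    return candidate
}
```

For instance, `prefixedBaseName("path/folder1/subfolder1/file.asc", "path")` yields `folder1-subfolder1-file.asc`, which `uniqueName` then turns into `folder1-subfolder1-file.asc.png` on first use.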

Spikes detection works by splitting the graph into some "optimal" number of segments;
each segment is then analyzed according to the selected `--fitting` and `--deviation` [methods](#Additional-Parameters).
The number of segments is increased iteratively until the noise deviation estimate improves by less than 10%
per step.
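
One way to read this stopping rule is sketched below. The details are assumptions made for illustration: the search starts at one segment, grows by one segment per step, measures improvement relative to the previous estimate, and is capped by a hypothetical `maxSegments`. The `estimateDeviation` callback stands in for whatever the selected `--fitting` and `--deviation` methods compute.

```kotlin
// Sketch of the segment-count search, under the assumptions stated above.
// estimateDeviation(n) is a stand-in returning the noise deviation estimate
// obtained when the graph is split into n segments.
fun chooseSegmentCount(maxSegments: Int, estimateDeviation: (segments: Int) -> Double): Int {
    require(maxSegments >= 1) { "maxSegments must be positive" }
    var segments = 1
    var current = estimateDeviation(segments)
    while (segments < maxSegments) {
        val next = estimateDeviation(segments + 1)
        // Relative improvement gained by adding one more segment.
        val improvement = if (current > 0.0) (current - next) / current else 0.0
        if (improvement < 0.10) break // improving by less than 10% per step: stop
        segments++
        current = next
    }
    return segments
}
```

With estimates of 100, 50, 30 and 29 for one to four segments, the search stops at three segments, because the fourth improves the estimate by only about 3%.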

Also see [Additional Parameters](#Additional-Parameters) to learn more about other options.

### Visualize Noise

![Левая-GS-6.asc.noise.png](images%2F%D0%9B%D0%B5%D0%B2%D0%B0%D1%8F-GS-6.asc.noise.png)

```zsh
java -jar SpikesDetection.jar \
--mode noise-visualization \
--in path/folder1/subfolder1 \
--in path/folder1/subfolder2 \
--path-prefix path \
--out output/folder
```

The options mentioned above work the same way as they do in [Visualize Spikes Detection](#Visualize-Spikes-Detection).

In noise visualization mode, the graph is not split into segments, to avoid generating multiple noise graphs.
It's assumed this nuance is not crucial in determining the optimal value of the `--bell` parameter.
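
The `--bell` parameter is described under [Additional Parameters](#Additional-Parameters) as the σ of normal-distribution bells used to approximate the noise graph. A minimal sketch of that idea, assuming it amounts to a Gaussian kernel density estimate over the noise deviation values, could look like this. The function names and the normalization are assumptions, not taken from the tool.

```kotlin
import kotlin.math.PI
import kotlin.math.exp
import kotlin.math.sqrt

// Probability density of a normal distribution with the given mean and sigma.
fun gaussian(x: Double, mean: Double, sigma: Double): Double {
    val z = (x - mean) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * PI))
}

// Hypothetical noise approximation: one bell of width `bell` is placed at every
// deviation value, and the bells are averaged so that the result integrates to 1.
fun noiseDensity(deviations: List<Double>, bell: Double = 5.0): (Double) -> Double {
    require(deviations.isNotEmpty() && bell > 0.0)
    return { x -> deviations.sumOf { gaussian(x, it, bell) } / deviations.size }
}
```

In this picture, a larger `--bell` gives a smoother curve, while a smaller one follows individual deviation values more closely.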

Also see [Additional Parameters](#Additional-Parameters) to learn more about other options.

### Integration

![integration.png](images/integration.png)

```zsh
java -jar SpikesDetection.jar \
--mode integration \
--in path/folder1/subfolder1 \
--in path/folder1/subfolder2 \
--path-prefix path \
--out output/folder
```

In this mode, every `--in` must denote a folder containing files like `161.asc`, `14,8.asc`, ...
That is, each file is named as a real number that uses a comma as the decimal separator.
This mode results in 2 files generated in the output folder:
`integration.png` and `integration-data.csv`. The former is a plot where the
horizontal axis represents values denoted by the file names, and the vertical axis
is the result of numeric integration of the spikes within the corresponding file.
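
As an illustration of the naming convention, the horizontal-axis value could be recovered from a file name as shown below. `axisValue` is a hypothetical helper, and the sketch assumes that the comma acts as a decimal separator.

```kotlin
// Hypothetical helper: "161.asc" -> 161.0, "14,8.asc" -> 14.8.
// Returns null when the name does not denote a number.
fun axisValue(fileName: String): Double? =
    fileName.removeSuffix(".asc").replace(',', '.').toDoubleOrNull()
```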

Integration is done by summing the areas of the trapezoids formed by adjacent values.
During integration, the expected function value is subtracted from the spikes.
In integration mode, the graph is split into segments the same way [it is for Spikes Detection](#Visualize-Spikes-Detection).
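
The trapezoid rule with the expected value subtracted can be sketched as follows. The function name and the array-based signature are assumptions for illustration; `expected` holds the fitted (non-spike) value at each sample, and the arrays are assumed to cover the samples of a single detected spike.

```kotlin
// Sketch: area of a spike above the expected function value, computed as the
// sum of trapezoids formed by adjacent samples.
fun integrateSpike(xs: DoubleArray, ys: DoubleArray, expected: DoubleArray): Double {
    require(xs.size == ys.size && ys.size == expected.size) { "arrays must have equal length" }
    var area = 0.0
    for (i in 1 until xs.size) {
        val left = ys[i - 1] - expected[i - 1]  // height above the expected value
        val right = ys[i] - expected[i]
        area += (left + right) / 2.0 * (xs[i] - xs[i - 1])
    }
    return area
}
```

For samples `ys = 1, 3, 3, 1` at `xs = 0, 1, 2, 3` with an expected value of 1 everywhere, the heights are 0, 2, 2, 0 and the trapezoid areas 1 + 2 + 1 add up to 4.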

Note that if there are multiple `--in`s with the same final `/subfolder` and there's no `--path-prefix` to help
avoid clashes, then `integration.png` will be missing all but one of the clashing datasets.

### Additional Parameters

| Option | Description |
|----------------------|-------------|
| `--bell` | Controls σ of the normal distribution bell used during the noise graph approximation. If the results are inaccurate, it's probably worth playing around with this value first rather than jumping to `--deviation-scalar`.<br>By default, 5.0 |
| `--deviation-scalar` | A multiplier for the estimate of the noise deviation.<br>By default:<br>• 4 for `--deviation binary`<br>• 12 for `--deviation fake --fitting constant`<br>• 16 for `--deviation fake --fitting linear` |
| `--fitting` | Chooses the method of estimating the underlying function's own value. It's the deviation from this value that is tested for being noise.<br>• `linear`: each segment is approximated by a straight line<br>• `constant`: each segment is approximated by its median value<br>By default, `linear` |
| `--deviation` | Chooses the method of estimating the deviation of the noise.<br>• `binary`: the noise σ is found via binary search for the best fit to the noise approximation, which is constructed from small Gaussian bells placed at the points corresponding to different values of the noise deviation<br>• `fake`: the usual formula for the sample standard deviation is used, but it is given just the lower half of all noise deviation values (otherwise the impact of spikes is severe). Since this is a truncated normal distribution rather than a normal one, the method is called "fake"<br>By default, `binary` |
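
To make the `--deviation fake --fitting constant` combination more concrete, here is a minimal sketch for a single segment. Several details are assumptions rather than the tool's confirmed behavior: deviations are taken as absolute distances from the median, the standard deviation is computed around zero, and a sample counts as a spike when its deviation exceeds `--deviation-scalar` times the estimate. All function names are hypothetical.

```kotlin
import kotlin.math.abs
import kotlin.math.sqrt

// `constant` fitting: the expected value of a segment is its median.
fun median(values: List<Double>): Double {
    require(values.isNotEmpty())
    val sorted = values.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2.0
}

// `fake` deviation: sample standard deviation fed only with the lower half of
// the absolute deviations, so that spikes barely affect the estimate.
// Assumption: the deviations are treated as residuals around zero.
fun fakeDeviation(values: List<Double>): Double {
    val center = median(values)
    val deviations = values.map { abs(it - center) }.sorted()
    val lowerHalf = deviations.take((deviations.size + 1) / 2)
    if (lowerHalf.size < 2) return 0.0
    return sqrt(lowerHalf.sumOf { it * it } / (lowerHalf.size - 1))
}

// Assumed spike test: deviation from the median exceeds scalar * estimate.
// 12.0 is the documented default for `--deviation fake --fitting constant`.
fun findSpikes(values: List<Double>, deviationScalar: Double = 12.0): List<Int> {
    val center = median(values)
    val threshold = deviationScalar * fakeDeviation(values)
    return values.indices.filter { abs(values[it] - center) > threshold }
}
```

Because the lower half of a truncated distribution underestimates the true σ, a fairly large multiplier is needed, which is consistent with the defaults of 12 and 16 for the `fake` method.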