Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/paypal/yurita
Anomaly detection framework @ PayPal
- Host: GitHub
- URL: https://github.com/paypal/yurita
- Owner: paypal
- License: apache-2.0
- Archived: true
- Created: 2019-03-28T22:16:17.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2019-09-02T20:54:00.000Z (over 5 years ago)
- Last Synced: 2024-08-01T17:31:41.840Z (5 months ago)
- Language: Scala
- Homepage:
- Size: 4.02 MB
- Stars: 106
- Watchers: 14
- Forks: 32
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.md
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- awesome-streaming - yurita - Anomaly detection framework built on Spark Structured Streaming from Paypal. (Table of Contents / Online Machine Learning)
README
[![logo](docs/YuritaLogo.png)](https://yurita.readthedocs.io)
# Yurita
[![Join the chat at https://gitter.im/pp-yurita](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/pp-yurita?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[![Build Status](https://travis-ci.org/paypal/yurita.svg?branch=master)](https://travis-ci.org/paypal/yurita)
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/4536adca78704f699198a03f9b92a133)](https://app.codacy.com/app/r39132/yurita?utm_source=github.com&utm_medium=referral&utm_content=paypal/yurita&utm_campaign=Badge_Grade_Dashboard)
[![License](https://img.shields.io/badge/License-Apache%202.0-red.svg)](https://opensource.org/licenses/Apache-2.0)
[![Documentation Status](https://readthedocs.org/projects/yurita/badge/?version=latest)](https://yurita.readthedocs.io)

Yurita is an open source project for developing large-scale anomaly detection models.
[Site](https://github.com/paypal/yurita/)

## Getting Started
### Documentation
Documentation on Yurita's architecture, the available statistical models, the anomaly detection pipeline and data flow, and more can be found at https://yurita.readthedocs.io.

### Build from source
```console
foo@bar:~/yurita$ ./gradlew clean build
foo@bar:~/yurita$ ./gradlew publishToMavenLocal
```
### Install from Maven Central
*For now, please build the project from source, or try our dockerized Yurita demo application, which builds automatically. We plan to make the project jar available on Maven Central in the next few days.*
```xml
<dependency>
    <groupId>io.github.paypal</groupId>
    <artifactId>yurita</artifactId>
    <version>1.0.0</version>
</dependency>
```
Other Required Dependencies:
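```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.4.1</version>
</dependency>
```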
## Running Dockerized Demo Application
The `YuritaSampleApp` directory in the Yurita project root contains a standalone Scala project for you to experiment with. Run the demo through Docker from inside the `YuritaSampleApp` directory as shown below.
### Build Docker Image
```console
foo@bar:~/YuritaSampleApp$ docker build -f Dockerfile -t yuritademo .
```
### Run Docker Container
```console
foo@bar:~/YuritaSampleApp$ docker run -p 8080:8080 -t yuritademo
```
## Writing Your First App
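Create a `SparkSession` with your own configuration:
```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val appName = "AnomalyDetectionAPI"
// Run locally on all available cores; change the master for a cluster deployment
val sparkConf = new SparkConf().setAppName(appName).setMaster("local[*]")
val spark = SparkSession
  .builder()
  .config(sparkConf)
  .getOrCreate()
```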
Create a DataFrame of your data points/attributes together with the time window in which each one occurs:
```scala
//sample window timestamp
val window1 = (dateFormat.parse("2011-01-18 01:00:00.0"), dateFormat.parse("2011-01-18 01:00:10.0"))
```
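The snippets in this step use `dateFormat`, `getTimestamp`, and a `Person` case class that this README does not define. A minimal sketch of what they might look like follows. It is an assumption for illustration, not the code from YuritaSampleApp: the date pattern is inferred from the sample strings, `getTimestamp` is assumed to return the start of the window, and the field names `name`, `age`, `weight`, and `scores` are hypothetical (only `surname`, `gender`, and `timestamp` appear as column names elsewhere in this README).

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat
import java.util.Date

object SampleData {
  // Assumed pattern, chosen to match strings such as "2011-01-18 01:00:00.0"
  val dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S")

  // Hypothetical helper: represent a (start, end) window by its start instant
  def getTimestamp(window: (Date, Date)): Timestamp =
    new Timestamp(window._1.getTime)

  // Hypothetical record type; name, age, weight and scores are illustrative names
  final case class Person(
    name: String,
    surname: String,
    age: Int,
    weight: Double,
    gender: String,
    scores: Array[Double],
    timestamp: Timestamp)
}
```

With `import SampleData._` in scope, the `window1` definition and the `Person(...)` rows in this step would compile against these definitions.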
```scala
// toDF() on a Seq requires `import spark.implicits._` to be in scope
val inputDF: DataFrame = Seq(
  Person("Ned", "Stark", 40, 40.6, "M", Array(5.5), getTimestamp(window1)),
  Person("Arya", "Stark", 9, 40.1, "F", Array(5.6), getTimestamp(window2)),
  Person("Sansa", "Stark", 13, 46.3, "F", Array(5.6), getTimestamp(window3)),
  Person("Jon Snow", "Stark", 17, 11.4, "M", Array(12.4), getTimestamp(window1)),
  ...
).toDF()
```
Create a data pipe that applies the specified statistical methods to the chosen DataFrame columns within each window:
```scala
val categoricalPipe = PipelineBuilder()
  .onColumns(Seq("surname", "gender"))
  .setWindowing(Window.fixed("1 hour"))
  .setWindowReferencing(windowRef)
  .buildCategoricalModel(
    Functions.Categorical.avgRef,
    Functions.Categorical.entropy,
    Functions.statResultThreshold(3.0))
```
Combine multiple pipelines into a single workload:
```scala
val workload = AnomalyWorkload.builder()
  .addAllPipelines(categoricalPipe)
  .addPartitioner("surname")
  .buildWithWatermark("timestamp", "2 hours")
```
Run the workload through the extended Dataset API:
```scala
df.detectAnomalies(workload).map(_.toString).foreach(println(_))
```
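The categorical pipeline above names three ingredients: a reference aggregation (`avgRef`), a statistic (`entropy`), and a threshold (`statResultThreshold(3.0)`). The sketch below shows one way those three ideas could fit together in plain Scala, with no Spark or Yurita dependency. It is a conceptual illustration under stated assumptions, not Yurita's implementation: every name in `CategoricalSketch` is hypothetical, and the statistic used here is relative entropy (KL divergence) against the averaged reference, which may differ from what `Functions.Categorical.entropy` actually computes.

```scala
object CategoricalSketch {
  // Relative frequency of each category within one window
  type Histogram = Map[String, Double]

  // Turn raw category counts into relative frequencies
  def normalize(counts: Map[String, Long]): Histogram = {
    val total = counts.values.sum.toDouble
    if (total == 0.0) Map.empty
    else counts.map { case (k, v) => k -> v / total }
  }

  // Average several reference histograms key by key (a missing key counts as 0)
  def averageReference(refs: Seq[Histogram]): Histogram = {
    val keys = refs.flatMap(_.keys).toSet
    keys.map(k => k -> refs.map(_.getOrElse(k, 0.0)).sum / refs.size).toMap
  }

  // Relative entropy of the current window against the reference.
  // eps smooths categories absent from one side so the logarithm stays finite.
  def relativeEntropy(current: Histogram, reference: Histogram, eps: Double = 1e-9): Double = {
    val keys = current.keySet ++ reference.keySet
    keys.toSeq.map { k =>
      val p = current.getOrElse(k, 0.0) + eps
      val q = reference.getOrElse(k, 0.0) + eps
      p * math.log(p / q)
    }.sum
  }

  // Flag the window when the statistic exceeds the threshold
  def isAnomalous(score: Double, threshold: Double): Boolean = score > threshold
}
```

In this sketch, a window whose category mix matches the averaged reference scores zero, and the score grows as the mix drifts away from it; the threshold decides how much drift is reported.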
Full demo application code can be viewed in our YuritaSampleApp project.

## Contributing to Yurita
Thank you very much for contributing to Yurita. Please read the [contribution guidelines](CONTRIBUTING.md) for the process.
## License
Yurita is licensed under the [Apache License, v2.0](LICENSE.txt)