https://github.com/jaegertracing/jaeger-analytics-java
Data analytics pipeline and models for tracing data
- Host: GitHub
- URL: https://github.com/jaegertracing/jaeger-analytics-java
- Owner: jaegertracing
- License: apache-2.0
- Archived: true
- Created: 2019-09-26T13:43:35.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2024-07-11T15:04:41.000Z (over 1 year ago)
- Last Synced: 2025-03-24T21:40:36.763Z (7 months ago)
- Language: Java
- Homepage:
- Size: 751 KB
- Stars: 45
- Watchers: 9
- Forks: 24
- Open Issues: 31
Metadata Files:
- Readme: README.md
- License: LICENSE
README
[![Build Status][ci-img]][ci]
# Jaeger Analytics
Experimental repository with data analytics models and pipelines for Jaeger tracing data.
Table of Contents

* [Jaeger analytics Java](#jaeger-analytics-java)
  + [Metrics](#metrics)
    - [Trace quality metrics](#trace-quality-metrics)
  + [Development](#development)
  + [Configuration](#configuration)
* [Gremlin documentation](#gremlin-documentation)
* [Spark Kafka documentation](#spark-kafka-documentation)
* [Deploy Kafka, Elasticsearch and Jaeger on Kubernetes using operators](#deploy-kafka--elasticsearch-and-jaeger-on-kubernetes-using-operators)
  + [Expose Kafka outside of cluster and get host:port](#expose-kafka-outside-of-cluster-and-get-host-port)
  + [Expose Jaeger collector outside of the cluster](#expose-jaeger-collector-outside-of-the-cluster)
  + [Deploy Hotrod example application](#deploy-hotrod-example-application)
* [Get exposed metrics](#get-exposed-metrics)
* [Run Jupyter as docker](#run-jupyter-as-docker)
  + [Run on Mybinder](#run-on-mybinder)
* [Using Jaeger in JUnit with Testcontainers](#using-jaeger-in-junit-with-testcontainers)

Table of contents generated with markdown-toc
## Jaeger analytics Java
Repository contains:
* Graph trace DSL based on Apache Gremlin. It helps to write graph "queries" against a trace
* Spark streaming integration with Kafka for Jaeger topics
* Loading trace from Jaeger query service
* Jupyter notebooks to run examples with data analytic models
* Data analytics models, metrics based on tracing data
* Grafana [dashboards](./grafana)

Blog posts, demos and conference talks:
* [Data analytics with Jaeger aka traces tell us more!](https://medium.com/jaegertracing/data-analytics-with-jaeger-aka-traces-tell-us-more-973669e6f848)
* [Jaeger data analytics with Jupyter notebooks](https://medium.com/jaegertracing/jaeger-data-analytics-with-jupyter-notebooks-b094fa7ab769)

### Metrics
The library calculates various metrics from traces. The metrics are
currently exposed in Prometheus format.

Currently these metrics are calculated:
* Trace height - trace tree height. Maximum number of spans from root to leaf
* Service depth - number of service hops from a service to the root service
* Service height - number of service hops from a service to the leaf service
* Service's direct downstream dependencies - number of services a service directly calls
* Service's direct upstream parents - number of services directly calling a service
* Number of errors - number of errors per service
* Network latency - latency between client and server spans split by service names

```
network_latency_seconds_bucket{client="frontend",server="driver",le="0.005",} 32.0
network_latency_seconds_bucket{client="frontend",server="driver",le="0.01",} 32.0
network_latency_seconds_bucket{client="frontend",server="driver",le="0.025",} 32.0
service_height_total{quantile="0.7",} 2.0
```

#### Trace quality metrics
Trace quality metrics measure the quality of tracing data reported by services.
These metrics can indicate that further instrumentation is needed or that the instrumentation
quality is not high enough.

These metrics are ported from [jaeger-analytics-flink/tracequality](https://github.com/jaegertracing/jaeger-analytics-flink/tree/master/tracequality-job/src/main/java/io/jaegertracing/tracequality/score).
The original design stores results in a separate storage table (Cassandra). The intention here is to
export results as metrics and link relevant traces as exemplars (once OSS metrics APIs support that).

* Minimum Jaeger client version - the reporting Jaeger client is at least the configured minimum version
* Has client and server tags - span contains client or server tags
* Unique span IDs - trace contains spans with unique span IDs

```
trace_quality_server_tag_total{pass="false",service="mysql",} 32.0
trace_quality_server_tag_total{pass="true",service="customer",} 26.0
trace_quality_minimum_client_version_total{pass="false",service="route",version="Go-2.21.1",} 320.0
```

Example Prometheus queries:
```
(trace_quality_server_tag_total{pass="true",service="customer",} / trace_quality_server_tag_total{service="customer",}) * 100
trace_quality_server_tag_total{pass="true",service="customer",} / ignoring (pass,fail) sum without(pass, fail) (trace_quality_server_tag_total)
# if values are missing
(trace_quality_server_tag_total{pass="true",service="mysql",} / trace_quality_server_tag_total{service="mysql",} ) * 100 or vector(0)
```
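To make two of the definitions above concrete (trace height and the unique span IDs check), here is a minimal Java sketch. It is not the library's implementation: the `Span` class and both method names are hypothetical, and height is assumed to count the spans on the longest root-to-leaf path, so a single-span trace has height 1.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TraceMetricsSketch {

    /** Hypothetical minimal span: its ID and its parent's ID (null for a root span). */
    static final class Span {
        final String spanId;
        final String parentId;

        Span(String spanId, String parentId) {
            this.spanId = spanId;
            this.parentId = parentId;
        }
    }

    /** Assumed definition: number of spans on the longest path from a root to a leaf. */
    static int traceHeight(List<Span> spans) {
        Map<String, String> parentOf = new HashMap<>();
        for (Span s : spans) {
            parentOf.put(s.spanId, s.parentId);
        }
        int height = 0;
        for (Span s : spans) {
            int depth = 0;
            String current = s.spanId;
            // Walk up towards the root. Stop at a null parent, at a parent that is
            // missing from the trace, or after spans.size() steps (guards against cycles).
            while (current != null && parentOf.containsKey(current) && depth < spans.size()) {
                depth++;
                current = parentOf.get(current);
            }
            height = Math.max(height, depth);
        }
        return height;
    }

    /** Passes when no span ID occurs more than once in the trace. */
    static boolean hasUniqueSpanIds(List<Span> spans) {
        Set<String> seen = new HashSet<>();
        for (Span s : spans) {
            if (!seen.add(s.spanId)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // a is the root; b and d are children of a; c is a child of b.
        List<Span> trace = Arrays.asList(
                new Span("a", null),
                new Span("b", "a"),
                new Span("c", "b"),
                new Span("d", "a"));
        System.out.println("trace height = " + traceHeight(trace));          // trace height = 3
        System.out.println("unique span IDs = " + hasUniqueSpanIds(trace)); // unique span IDs = true
    }
}
```

Walking up from every span is quadratic in the worst case. That keeps the sketch short, but a real pipeline would more likely traverse the trace graph once.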
### Development
The following annotation processor must be added to the IDE configuration. It is used to generate the trace DSL.
```
org.apache.tinkerpop.gremlin.process.traversal.dsl.GremlinDslProcessor
```

Build and run
```bash
mvn clean compile exec:java
```

### Configuration
Configuration properties for `SparkRunner`.
* `SPARK_MASTER`: Spark master to submit the job to; Defaults to `local[*]`
* `SPARK_STREAMING_BATCH_DURATION`: interval that defines the size of the batch in milliseconds; Defaults to `5000`
* `KAFKA_JAEGER_TOPIC`: Kafka topic with Jaeger spans; Defaults to `jaeger-spans`
* `KAFKA_BOOTSTRAP_SERVER`: Kafka bootstrap servers; Defaults to `localhost:9092`
* `KAFKA_START_FROM_BEGINNING`: Read the Kafka topic from the beginning; Defaults to `true`
* `PROMETHEUS_PORT`: Prometheus exporter port; Defaults to `9111`
* `TRACE_QUALITY_{language}_VERSION`: Minimum Jaeger client version for trace quality metric; Supported languages `java`, `node`, `python`, `go`; Defaults to latest client versions

## Gremlin documentation
* http://kelvinlawrence.net/book/Gremlin-Graph-Guide.html

## Spark Kafka documentation
* https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html
* https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html

## Deploy Kafka, Elasticsearch and Jaeger on Kubernetes using operators
The following command creates a Jaeger CR, which triggers deployment of Jaeger, Kafka and Elasticsearch.
This works only on OpenShift 4.x. Before deploying, make sure the Jaeger, Strimzi (Kafka) and
Elasticsearch (from OpenShift cluster logging) operators are running.
```
oc create -f manifests/jaeger-auto-provisioned.yaml
```

If you are running on vanilla Kubernetes, you can deploy the `jaeger-external-kafka-es.yaml` CR and configure
connection strings to Kafka and Elasticsearch.

### Expose Kafka outside of cluster and get host:port
Expose Kafka IP address outside of the cluster:
```yaml
listeners:
  # ...
  external:
    type: loadbalancer
    tls: false
```

Get external broker address:
```bash
oc get kafka simple-streaming -o jsonpath="{.status.listeners[*].addresses}"
```

### Expose Jaeger collector outside of the cluster
```bash
oc create route edge --service=simple-streaming-collector --port c-binary-trft --insecure-policy=Allow
```

### Deploy Hotrod example application
```bash
oc get routes # get jaeger collector route
docker run --rm -it -e "JAEGER_ENDPOINT=http://host:80/api/traces" -p 8080:8080 jaegertracing/example-hotrod:latest
```

## Get exposed metrics
The streaming job exposes metrics on http://localhost:9001.

## Run Jupyter as docker
The docker image should be published on Docker Hub. If you are modifying the source code of the library, then
mount it as a volume with `-v ${PWD}:/home/jovyan/work` or rebuild the image to see the latest changes.

```bash
make jupyter-docker
make jupyter-run
```

Open a browser at http://localhost:8888/lab and copy the token from the command line. Then navigate to the `./work/jupyter/` directory and open a notebook.
### Run on Mybinder
[![Launch IJava binder][binder-badge-img]](https://mybinder.org/v2/gh/jaegertracing/jaeger-analytics-java/master) [![Launch IJava lab binder][binder-lab-badge-img]](https://mybinder.org/v2/gh/jaegertracing/jaeger-analytics-java/master?urlpath=lab)
## Using Jaeger in JUnit with Testcontainers
The artifact `io.jaegertracing:jaeger-testcontainers` contains an implementation for using
the Jaeger `all-in-one` Docker container in JUnit tests:

```java
JaegerAllInOne jaeger = new JaegerAllInOne("jaegertracing/all-in-one:latest");
jaeger.start();
io.opentracing.Tracer tracer = jaeger.createTracer("my-service");
```

[binder-badge-img]: https://img.shields.io/badge/launch-binder-E66581.svg?logo=
[binder-lab-badge-img]: https://img.shields.io/badge/launch-binder%20lab-579ACA.svg?logo=
[ci-img]: https://github.com/jaegertracing/jaeger-analytics-java/workflows/Test/badge.svg
[ci]: https://github.com/jaegertracing/jaeger-analytics-java/actions