{"id":20337325,"url":"https://github.com/expediadotcom/haystack-service-graph","last_synced_at":"2025-04-11T22:42:04.811Z","repository":{"id":49202353,"uuid":"126519789","full_name":"ExpediaDotCom/haystack-service-graph","owner":"ExpediaDotCom","description":"haystack component that infers service call graph from the spans received","archived":false,"fork":false,"pushed_at":"2022-07-15T21:06:12.000Z","size":591,"stargazers_count":4,"open_issues_count":7,"forks_count":2,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-03-25T18:45:10.335Z","etag":null,"topics":["distributed-tracing"],"latest_commit_sha":null,"homepage":null,"language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ExpediaDotCom.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-03-23T17:44:45.000Z","updated_at":"2024-02-15T04:02:34.000Z","dependencies_parsed_at":"2022-08-26T09:41:47.105Z","dependency_job_id":null,"html_url":"https://github.com/ExpediaDotCom/haystack-service-graph","commit_stats":null,"previous_names":[],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ExpediaDotCom%2Fhaystack-service-graph","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ExpediaDotCom%2Fhaystack-service-graph/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ExpediaDotCom%2Fhaystack-service-graph/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ExpediaDotCom%2Fhaystack-service-graph/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/owners/ExpediaDotCom","download_url":"https://codeload.github.com/ExpediaDotCom/haystack-service-graph/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248493022,"owners_count":21113159,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["distributed-tracing"],"created_at":"2024-11-14T21:08:39.032Z","updated_at":"2025-04-11T22:42:04.785Z","avatar_url":"https://github.com/ExpediaDotCom.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://travis-ci.org/ExpediaDotCom/haystack-service-graph.svg?branch=master)](https://travis-ci.org/ExpediaDotCom/haystack-service-graph)\n[![License](https://img.shields.io/badge/license-Apache%20License%202.0-blue.svg)](https://github.com/ExpediaDotCom/haystack/blob/master/LICENSE)\n\n# Haystack-service-graph\n\nThis repository has two components that focus on \n\n* Building a service dependency graph from incoming spans and\n* Computing the network latency between the services that allows\n[haystack-trends](https://github.com/ExpediaDotCom/haystack-traces) to produce latency trends between services.\n\n## Required Reading\n\nIn order to understand Haystack, we recommend reading the details of the\n[Haystack](https://expediadotcom.github.io/haystack) project. Haystack is written in \n[Kafka streams](http://docs.confluent.io/current/streams/index.html) \nand hence some prior knowledge of Kafka streams is helpful.\n\n## Component: node-finder\n\nThis component discovers the relationships between services. 
Eventually those relationships will be expressed as a graph\nin which the services are the nodes and the operations are the edges. Since client spans do not carry the name of the \nservice being called, and server spans do not carry the name of the service calling them, this component accumulates the\nincoming spans and uses `span-id` to discover the dependent services and the operations between them. \n\nDiscovered \"span pairs\" are then used to produce two different outputs\n\n1. A simple object that has \n    * the calling service name \n    * the called service name \n    * the operation name\n2. A `MetricPoint` object with the `latency` between the service pair, discovered by examining timestamps in the spans.\n\nLike many other components of Haystack, this component is also a `Kafka streams` application. The picture below shows \nthe topology / architecture of this component.\n\n                                         +---------------+\n                                         |               |\n                                         |  proto-spans  |\n                                         |               |\n                                         +-------+-------+\n                                                 |\n                                       +---------V----------+\n                                       |                    |\n                                  +----+  span-accumulator  +----+\n                                  |    |                    |    |\n                                  |    +--------------------+    |\n                                  |                              |\n                        +---------V---------+       +------------V------------+\n                        |                   |       |                         |\n                        |  latency-producer |       |  nodes-n-edges-producer |\n                        |                   |       |                         |\n                        
+---------+---------+       +------------+------------+\n                                  |                              |\n                         +--------V--------+           +---------V---------+\n                         |                 |           |                   |\n                         |   metric-sink   |           |  graph-nodes-sink |\n                         |                 |           |                   |\n                         +-----------------+           +-------------------+\n\nThe starting point for the application is the \n[Streams](node-finder/src/main/scala/com/expedia/www/haystack/service/graph/node/finder/app/Streams.scala) class, which \nbuilds the topology shown in the picture above. This `node-finder` topology consists of one source, three processors\nand two sinks. \n\n* Source: The topology contains a source called `proto-spans`. This source reads a Kafka topic with the same name. \nIt uses `SpanDeserializer` as the value deserializer to read incoming spans in the topic.\n\n* Processors:\n  * span-accumulator : This processor accumulates all the incoming spans in a PriorityQueue ordered by each Span's \n    timestamp to maintain the incoming order. Periodically, it traverses the priority queue to find spans with matching\n    span-ids and combines them to form a client-server span pair. These span pairs are then forwarded to the downstream \n    processors. Accumulation time is configurable with a configuration keyed by `accumulator.interval`. It has a minor \n    optimization built in during queue traversal to match recently arrived spans with spans in the next batch.\n  * latency-producer : The latency producer is one of the processors downstream of span-accumulator. This simple \n    processor produces a `MetricPoint` instance to record the network latency in the current span pair. 
\n    A sample JSON representation of the metric point will look like\n  ```json\n  {\n    \"metric\" : \"latency\",\n    \"type\" : \"gauge\",\n    \"value\" : 40.0,\n    \"epochTimeInSeconds\" : 1523637898,\n    \"tags\" : {\n      \"serviceName\" : \"foo-service\",\n      \"operationName\" :  \"bar-operation\"\n    }\n  }  \n  ```\n  * nodes-n-edges-producer: This processor is another simple processor that is downstream of span-accumulator. \n    For every span pair received, this processor emits a simple JSON representation of a graph edge as shown below\n  \n  ```json\n  {\n    \"source\" : \"foo-service\",\n    \"destination\" : \"baz-service\",\n    \"operation\" : \"bar-operation\"  \n  }\n  ```\n* Sinks:\n  * metric-sink: This sink is downstream of latency-producer. It serializes each MetricPoint instance with a \n    `Message Pack` serializer and writes the serialized output to a configured Kafka topic.\n  * graph-nodes-sink: This sink is downstream of nodes-n-edges-producer. It serializes the JSON as a string and writes \n    that string to a configured Kafka topic for the `graph-builder` component to consume and build a service dependency \n    graph.\n\n## Component: graph-builder\n\nThis component takes graph edges emitted by `node-finder` and merges them together to form the full service-graph. \nIt also has an http endpoint to return the accumulated service-graph. \n\n#### Streaming\n`graph-builder` accumulates incoming edges in a \n[ktable](https://kafka.apache.org/0102/javadoc/org/apache/kafka/streams/kstream/KTable.html), using the stream \n[table duality concept](https://docs.confluent.io/current/streams/concepts.html#duality-of-streams-and-tables). \nEach row in the ktable represents one graph edge. Each edge is supplemented with some stats such as running count and \nlast seen timestamp. \n\nKafka takes care of persisting and replicating the graph ktable across brokers for fault tolerance.  
\n\n#### HTTP API\n`graph-builder` also acts as an http api to query the graph ktable, using servlets over embedded jetty for implementing \nthe endpoints. \n[Kafka interactive query](https://kafka.apache.org/10/documentation/streams/developer-guide/interactive-queries.html) \nis used for fetching service graphs from the local state store.  \n\nAn interactive query to a single stream node returns only the graph-edges sharded to that node, hence it is a partial \nview of the world. The servlet takes care of fetching partial graphs from all nodes having the ktable to form full \nservice graphs.\n  \n###### Endpoints \n1. `/servicegraph` : returns the full service graph, including edges from all known services. Edges also include operations.  \n\n## Building\n\nTo build the components in this repository at once, one can run\n```\nmake all\n```\nTo build the components separately, one can check the README in the individual component folders.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexpediadotcom%2Fhaystack-service-graph","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fexpediadotcom%2Fhaystack-service-graph","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexpediadotcom%2Fhaystack-service-graph/lists"}