{"id":20337315,"url":"https://github.com/expediadotcom/haystack-pipes","last_synced_at":"2025-04-11T22:42:03.993Z","repository":{"id":41243259,"uuid":"100393508","full_name":"ExpediaDotCom/haystack-pipes","owner":"ExpediaDotCom","description":"Packages to send (\"pipe\") Haystack data to external sinks (like AWS Firehose)","archived":false,"fork":false,"pushed_at":"2022-11-16T01:21:49.000Z","size":1548,"stargazers_count":2,"open_issues_count":11,"forks_count":2,"subscribers_count":16,"default_branch":"master","last_synced_at":"2025-03-25T18:45:09.885Z","etag":null,"topics":["hactoberfest","hactoberfest2020"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ExpediaDotCom.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-08-15T15:47:52.000Z","updated_at":"2020-10-25T13:10:52.000Z","dependencies_parsed_at":"2022-09-10T19:00:33.950Z","dependency_job_id":null,"html_url":"https://github.com/ExpediaDotCom/haystack-pipes","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ExpediaDotCom%2Fhaystack-pipes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ExpediaDotCom%2Fhaystack-pipes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ExpediaDotCom%2Fhaystack-pipes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ExpediaDotCom%2Fhaystack-pipes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ExpediaDotCom","download_url":"https://codeload.github.com/ExpediaDotCom/haystack-pipes/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248493022,"owners_count":21113159,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hactoberfest","hactoberfest2020"],"created_at":"2024-11-14T21:08:36.633Z","updated_at":"2025-04-11T22:42:03.968Z","avatar_url":"https://github.com/ExpediaDotCom.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Coverage Status](https://coveralls.io/repos/github/ExpediaDotCom/haystack-pipes/badge.svg?branch=master)](https://coveralls.io/github/ExpediaDotCom/haystack-pipes?branch=master)\n[![Build Status](https://travis-ci.org/ExpediaDotCom/haystack-pipes.svg?branch=master)](https://travis-ci.org/ExpediaDotCom/haystack-pipes)\n\n# haystack-pipes\nPackages to send (\"pipe\") Haystack data to external sinks (like AWS Firehose or another Kafka queue)\n![High Level Block Diagram](https://github.com/ExpediaDotCom/haystack-pipes/blob/master/documents/diagrams/haystack_pipes.png)\n\nThe haystack-pipes unit delivers a human-friendly version of Haystack messages to zero or more \"durable\" locations for \nmore permanent storage. Current \"plug-in\" implementations are:\n1. 
[kafka-producer](https://github.com/ExpediaDotCom/haystack-pipes/tree/master/kafka-producer): this package uses Kafka \nStreams to read the protobuf records from Kafka, transform them to JSON, and write them to a Kafka cluster that is\ntypically (though not necessarily) a different installation from the one from which the protobuf records were read. The kafka-producer\npackage uses the \n[Kafka Producer API](https://kafka.apache.org/0110/javadoc/index.html?org/apache/kafka/clients/producer/Producer.html) \nto write to Kafka.\n2. [firehose-writer](https://github.com/ExpediaDotCom/haystack-pipes/tree/master/firehose-writer): this package uses\nKafka Streams to read the protobuf records from Kafka, transform them to JSON, and write them to\n[Amazon Kinesis Data Firehose](https://aws.amazon.com/kinesis/data-firehose/) (an AWS service that facilitates loading \nstreaming data into AWS). Note that its \n[PutRecordBatch API](http://docs.aws.amazon.com/firehose/latest/APIReference/API_PutRecordBatch.html) accepts at most\n500 records and at most 4 MB per request; firehose-writer batches the records to stay within these limits.\nKinesis Firehose can be configured to deliver the data to other AWS services that facilitate data analysis, such as\n[Amazon S3](https://aws.amazon.com/s3/), [Amazon Redshift](https://aws.amazon.com/redshift/), and\n[Amazon Elasticsearch Service](https://aws.amazon.com/elasticsearch-service/).\n3. [json-transformer](https://github.com/ExpediaDotCom/haystack-pipes/tree/master/json-transformer): this package\nuses [Kafka Streams](https://kafka.apache.org/documentation/streams/) to read the protobuf records from Kafka, transform\nthem to JSON, and write them to another topic in Kafka.\n4. 
[http-poster](https://github.com/ExpediaDotCom/haystack-pipes/tree/master/http-poster): this package uses Kafka \nStreams to read the protobuf records from Kafka, transform them to JSON, and send them to another service via an\n[HTTP POST](https://en.wikipedia.org/wiki/POST_(HTTP)) request.\n5. [secret-detector](https://github.com/ExpediaDotCom/haystack-pipes/tree/master/secret-detector): this package uses\nKafka Streams to read the protobuf records from Kafka and search the tags of those protobuf records (the records are\n\"Span\" objects from the [haystack-idl package](https://github.com/ExpediaDotCom/haystack-idl)) for \"personal\" data.\nThis personal data is either [PCI](https://en.wikipedia.org/wiki/Payment_card_industry) data (credit card numbers) or \n[PII](https://en.wikipedia.org/wiki/Personally_identifiable_information) data (address, phone number, etc.).\nWhich kind of personal data to search for is under configuration control. The secret-detector uses the open source\n[chlorine-finder](https://github.com/dataApps/chlorine-finder) package for detection.\nWhen a secret is found, information identifying the secret (but not the secret itself) is written back to Kafka.\nTo minimize the frequency of false positives (data thought to be secret that isn't really secret), a text file of\nwhitelisted tags is stored in S3. The format of this text file is one or more lines of\n`\u003cfinder name\u003e;\u003cservice name\u003e;\u003coperation name\u003e;\u003ctag name\u003e\\n`, that is, semicolon-delimited \"four-ples\" of fields\nfrom the Span, with one \"four-ple\" per line. Configuration controls where \nthis text file is found in S3 (i.e., in which bucket and under which key).\n\nIn all of the cases above, \"transform to JSON\" implies \"tag flattening\": the \n[OpenTracing API](https://github.com/opentracing/specification/blob/master/semantic_conventions.md) specifies tags in a \nsomewhat unfriendly format. 
For example, the following OpenTracing tags:\n```\n\"tags\":[{\"key\":\"strKey\",\"vStr\":\"tagValue\"},\n        {\"key\":\"longKey\",\"vLong\":\"987654321\"},\n        {\"key\":\"doubleKey\",\"vDouble\":9876.54321},\n        {\"key\":\"boolKey\",\"vBool\":true},\n        {\"key\":\"bytesKey\",\"vBytes\":\"AAEC/f7/\"}]\n```\nwill be converted to\n```\n\"tags\":{\"strKey\":\"tagValue\",\n        \"longKey\":987654321,\n        \"doubleKey\":9876.54321,\n        \"boolKey\":true,\n        \"bytesKey\":\"AAEC/f7/\"}\n```\nby code in the Pipes [commons](https://github.com/ExpediaDotCom/haystack-pipes/tree/master/commons) module. The commons\nmodule also contains other shared code that:\n1. reads Kafka configurations,\n2. facilitates creating and starting Kafka Streams,\n3. serializes Spans,\n4. provides shared constants to unit tests,\n5. changes environment variables to lower case for consumption by [cfg4j](http://www.cfg4j.org/) \n(haystack-pipes uses cfg4j to read configuration files),\n6. 
starts polling for the Counters and Timers provided by \n[haystack-metrics](https://github.com/ExpediaDotCom/haystack-metrics).\n\n## Building\n\n#### Cloning\n##### From scratch\nSince this repo contains haystack-idl as a submodule, a recursive clone of the\n[haystack-pipes package](https://github.com/ExpediaDotCom/haystack-pipes) is required:\n\n```git clone --recursive git@github.com:ExpediaDotCom/haystack-pipes.git .```\n\n##### From an existing directory\nIf you have already cloned the [haystack-pipes package](https://github.com/ExpediaDotCom/haystack-pipes) (perhaps\nwith an IDE that did not clone recursively as the command above instructs), or if you want to pick up a newer version of\nthe [haystack-idl package](https://github.com/ExpediaDotCom/haystack-idl), run the following from your haystack-pipes\ndirectory:\n\n```git submodule update --init --recursive```\n\n#### Prerequisites\n\n* Java 1.8\n* Maven 3.3.9 or higher\n* Docker 1.13 or higher\n\n#### Build\n\n##### Full build\nFor a full build, including unit tests, run the following from the directory into which you cloned haystack-pipes:\n\n```\nmake all\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexpediadotcom%2Fhaystack-pipes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fexpediadotcom%2Fhaystack-pipes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fexpediadotcom%2Fhaystack-pipes/lists"}
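The firehose-writer entry in the README above says that the PutRecordBatch API accepts at most 500 records and 4 MB per request. The Java sketch below shows one way to respect both limits with greedy, order-preserving batching. It is not firehose-writer's implementation. The `FirehoseBatcher` class and its `batch` methods are hypothetical, records are modeled as plain `byte[]` payloads rather than AWS SDK objects, and "4 MB" is assumed to mean 4 * 1024 * 1024 bytes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not firehose-writer's code): greedily groups records into batches
 * that respect a per-request record-count limit and a per-request byte-size limit.
 */
final class FirehoseBatcher {
    // Limits as stated in the README; 4 MB is ASSUMED to mean 4 * 1024 * 1024 bytes.
    static final int MAX_RECORDS_PER_BATCH = 500;
    static final long MAX_BYTES_PER_BATCH = 4L * 1024 * 1024;

    private FirehoseBatcher() {}

    /** Batches records using the PutRecordBatch limits above. */
    static List<List<byte[]>> batch(List<byte[]> records) {
        return batch(records, MAX_RECORDS_PER_BATCH, MAX_BYTES_PER_BATCH);
    }

    /** Batches records, preserving their order, so that no batch exceeds maxRecords or maxBytes. */
    static List<List<byte[]>> batch(List<byte[]> records, int maxRecords, long maxBytes) {
        if (maxRecords <= 0 || maxBytes <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        long currentBytes = 0;
        for (byte[] record : records) {
            if (record.length > maxBytes) {
                // A record that cannot fit even in an empty batch is rejected outright.
                throw new IllegalArgumentException(
                        "record of " + record.length + " bytes exceeds the " + maxBytes + "-byte limit");
            }
            // Close the current batch if adding this record would break either limit.
            if (current.size() == maxRecords || currentBytes + record.length > maxBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(record);
            currentBytes += record.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<byte[]> records = new ArrayList<>();
        for (int i = 0; i < 1201; i++) {
            records.add(new byte[100]);
        }
        // 1201 small records are limited by count, not size: prints 500, 500, 201.
        for (List<byte[]> batch : batch(records)) {
            System.out.println(batch.size());
        }
    }
}
```

Greedy packing keeps records in their original order at the cost of sometimes sending batches that are not completely full. A real writer would also need to retry records that a PutRecordBatch response reports as failed, which this sketch does not attempt.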
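The secret-detector whitelist format described in the README above (`<finder name>;<service name>;<operation name>;<tag name>`, one entry per line) can be parsed and queried roughly as follows. This is a hypothetical `SecretWhitelist` class, not the secret-detector's code. It assumes the file contents have already been fetched from S3 into a string, that blank lines are ignored, that fields are compared exactly as written (no trimming), and that any line without exactly four fields is an error. The entry values in `main` are made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical parser for "finder;service;operation;tag" whitelist lines (not the secret-detector's code). */
final class SecretWhitelist {
    private static final int FIELD_COUNT = 4;

    // Each entry is stored as a fixed-size list of its four fields; List equality and hashing are value-based.
    private final Set<List<String>> entries = new HashSet<>();

    private SecretWhitelist() {}

    /** Parses whitelist text that has already been downloaded; blank lines are skipped. */
    static SecretWhitelist parse(String text) {
        SecretWhitelist whitelist = new SecretWhitelist();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (line.trim().isEmpty()) {
                    continue;
                }
                // The -1 limit keeps empty trailing fields, so "a;b;c;" is seen as four fields, not three.
                String[] fields = line.split(";", -1);
                if (fields.length != FIELD_COUNT) {
                    throw new IllegalArgumentException("line " + lineNumber + ": expected "
                            + FIELD_COUNT + " fields but found " + fields.length);
                }
                whitelist.entries.add(Arrays.asList(fields));
            }
        } catch (IOException e) {
            // StringReader does no real I/O, but readLine() and close() still declare IOException.
            throw new UncheckedIOException(e);
        }
        return whitelist;
    }

    /** Returns true if this exact (finder, service, operation, tag) combination is whitelisted. */
    boolean isWhitelisted(String finderName, String serviceName, String operationName, String tagName) {
        return entries.contains(Arrays.asList(finderName, serviceName, operationName, tagName));
    }

    public static void main(String[] args) {
        // Made-up entries for illustration only.
        SecretWhitelist whitelist = SecretWhitelist.parse(
                "ExampleFinder;example-service;exampleOperation;example.tag\n"
                        + "\n"
                        + "ExampleFinder;other-service;otherOperation;other.tag\n");
        System.out.println(whitelist.isWhitelisted(
                "ExampleFinder", "example-service", "exampleOperation", "example.tag")); // true
        System.out.println(whitelist.isWhitelisted(
                "ExampleFinder", "example-service", "exampleOperation", "other.tag")); // false
    }
}
```

Storing each line as a four-element list in a `HashSet` makes every lookup a single hash probe, and matching on all four fields means a tag is only exempted for the specific finder, service, and operation listed.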
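The tag flattening described in the README above can be illustrated with a small, dependency-free Java sketch. The nested `TagFlattener.Tag` class is a hypothetical stand-in for a haystack-idl Span tag (it is not the generated protobuf class), and the hand-written JSON output only mirrors the README's before/after example. The commons module's actual serializer may differ, for instance in how it handles duplicate keys or non-finite doubles.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

final class TagFlattener {
    /** Hypothetical typed tag: a key plus exactly one value. NOT the haystack-idl protobuf class. */
    static final class Tag {
        enum Type { STRING, LONG, DOUBLE, BOOL, BYTES }

        final String key;
        final Type type;
        final Object value;

        private Tag(String key, Type type, Object value) {
            this.key = key;
            this.type = type;
            this.value = value;
        }

        static Tag ofString(String key, String value) { return new Tag(key, Type.STRING, value); }
        static Tag ofLong(String key, long value) { return new Tag(key, Type.LONG, value); }
        static Tag ofDouble(String key, double value) { return new Tag(key, Type.DOUBLE, value); }
        static Tag ofBool(String key, boolean value) { return new Tag(key, Type.BOOL, value); }
        static Tag ofBytes(String key, byte[] value) { return new Tag(key, Type.BYTES, value); }
    }

    private TagFlattener() {}

    /** Writes the tags as one JSON object keyed by tag name, in input order (duplicate keys are not merged). */
    static String flatten(List<Tag> tags) {
        StringBuilder json = new StringBuilder("{");
        for (int i = 0; i < tags.size(); i++) {
            Tag tag = tags.get(i);
            if (i > 0) {
                json.append(',');
            }
            appendJsonString(json, tag.key);
            json.append(':');
            switch (tag.type) {
                case STRING:
                    appendJsonString(json, (String) tag.value);
                    break;
                case LONG:
                case BOOL:
                    // Longs and booleans become bare JSON literals, not quoted strings.
                    json.append(tag.value);
                    break;
                case DOUBLE: {
                    double d = (Double) tag.value;
                    // NaN and Infinity are not valid JSON numbers; this sketch writes null for them.
                    json.append(Double.isFinite(d) ? Double.toString(d) : "null");
                    break;
                }
                case BYTES:
                    // Bytes become a Base64 string, as in the README's "AAEC/f7/" example.
                    appendJsonString(json, Base64.getEncoder().encodeToString((byte[]) tag.value));
                    break;
                default:
                    throw new IllegalStateException("unknown tag type: " + tag.type);
            }
        }
        return json.append('}').toString();
    }

    /** Appends s as a quoted JSON string, escaping quotes, backslashes and control characters. */
    private static void appendJsonString(StringBuilder json, String s) {
        json.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                json.append('\\').append(c);
            } else if (c < 0x20) {
                json.append(String.format("\\u%04x", (int) c));
            } else {
                json.append(c);
            }
        }
        json.append('"');
    }

    public static void main(String[] args) {
        // The same tags as the README's "before" example.
        List<Tag> tags = Arrays.asList(
                Tag.ofString("strKey", "tagValue"),
                Tag.ofLong("longKey", 987654321L),
                Tag.ofDouble("doubleKey", 9876.54321),
                Tag.ofBool("boolKey", true),
                Tag.ofBytes("bytesKey", new byte[] {0, 1, 2, (byte) 0xFD, (byte) 0xFE, (byte) 0xFF}));
        System.out.println(flatten(tags));
    }
}
```

Running `main` prints the README's example tags as a single-line JSON object keyed by tag name. The value type, which the verbose form encodes in field names such as `vLong` and `vBytes`, is instead expressed by the JSON type of each value.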