{"id":19530771,"url":"https://github.com/nodefluent/zamza","last_synced_at":"2025-06-30T10:33:58.169Z","repository":{"id":44843825,"uuid":"152960165","full_name":"nodefluent/zamza","owner":"nodefluent","description":"Apache Kafka discovery, indexing, searches, storage and hooks :beetle:","archived":false,"fork":false,"pushed_at":"2022-12-09T14:55:33.000Z","size":830,"stargazers_count":10,"open_issues_count":12,"forks_count":1,"subscribers_count":9,"default_branch":"master","last_synced_at":"2024-09-16T00:09:04.239Z","etag":null,"topics":["api","browse","find","hooks","http","index","info","json","kafka","mongodb","rest","seek","stats","topic","ui"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nodefluent.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-10-14T09:53:38.000Z","updated_at":"2024-02-23T07:16:59.000Z","dependencies_parsed_at":"2023-01-25T14:15:14.748Z","dependency_job_id":null,"html_url":"https://github.com/nodefluent/zamza","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nodefluent%2Fzamza","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nodefluent%2Fzamza/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nodefluent%2Fzamza/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nodefluent%2Fzamza/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nodefluent","download_url":"https://codeload.github.co
m/nodefluent/zamza/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223906307,"owners_count":17223046,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","browse","find","hooks","http","index","info","json","kafka","mongodb","rest","seek","stats","topic","ui"],"created_at":"2024-11-11T01:36:11.888Z","updated_at":"2024-11-11T01:36:13.371Z","avatar_url":"https://github.com/nodefluent.png","language":"TypeScript","readme":"# zamza\n\n```text\n   .             Hi, I am (Gregor) Z(S)amz(s)a.                                                             \n     /                                                                         \n      (                     %      @                         \u0026                 \n       .                    /    ,@                       @                    \n        (                  @    @                      @                       \n         \u0026               @    @                 *@@#                           \n          @            @(   @        /@@@@\u0026                                    \n           *         *@@*  @@@@@@@  /@@@@@ .                                   
\n             (  @,      @@@#@@\u0026@@@\u0026/@(  @@@\u0026*@#  \u0026@\u0026@@@@@@@@@@@@*@,          \n              @ */@#@*@@#  @@@@@@@@@@@@@@@@ ##,  / ,*@@ @@%,@@@@@@@@@*         \n              @@ @\u0026%@%@@@@@@@@@@@@@@@%@@@@@@@/@  @@@@@@@@@@@@@@@@@@@@@@        \n        .*%@@@\u0026%@@@@#/@#%@@@@@@@@\u0026@@@@/@@@, @@@@@@@@@@@@@@@@@@@@@@@@@@@        \n              @*@@@@@@@@@@@@@@@@@@@@@@@(@@@@@@@@@@@@@%*.,@@@@@@@@@@@\u0026,         \n                @@@.\u0026@@@\u0026@@@@@@@@@@(@@@/* @@@@@%,*/@@@@@@@@@@@@@@@@@           \n                   /@@@#@@@@@@@@@@@@@@@ .@@@@@@@@@@@,@@@@@/(       @           \n                          (.@@@@@@ (#@,@@@@@@*\u0026@@\u0026\u0026                            \n                                     @@\u0026@@/ \u0026    \n```\n\n## What\n\nApache Kafka discovery, indexing, searches, storage, hooks and HTTP gateway.\n\n## How\n\nYou can manage topics to be indexed on the fly via `/api/topic-config` and \nzamza will keep track of anything that goes through your Apache Kafka cluster.\nIt will index and store messages in MongoDB, according to the topic's `delete` or `compact` configuration. You can find and retrieve messages in milliseconds via `/api/fetch`. 
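As an illustration of the flow above, a minimal sketch in JavaScript that builds the HTTP requests for configuring a topic and fetching a batch of earliest messages (host, port and token are assumptions; the paths follow the API described in this README):

```javascript
// Sketch only: assumes a zamza instance at localhost:1912 and a token
// configured in config.http.access. Paths follow this README.
const ZAMZA_URL = "http://localhost:1912"; // assumed host/port, adjust to your instance
const TOKEN = "token-for-some-topics";     // hypothetical token

// Request description for configuring a topic via the topic-config resource.
function buildTopicConfigRequest(topic, cleanupPolicy, retentionMs) {
  return {
    url: `${ZAMZA_URL}/api/topic-config`,
    method: "POST",
    headers: { authorization: TOKEN, "content-type": "application/json" },
    body: JSON.stringify({ topic, cleanupPolicy, retentionMs }),
  };
}

// Request description for fetching the earliest `count` messages of a topic.
function buildEarliestFetchRequest(topic, count) {
  return {
    url: `${ZAMZA_URL}/api/fetch/${topic}/range/earliest/${count}`,
    method: "GET",
    headers: { authorization: TOKEN },
  };
}
```

These request descriptions can be passed to any HTTP client, e.g. `fetch(url, { method, headers, body })`.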
Zamza also allows you to register HTTP hooks to subscribe to topics, as well as HTTP calls to produce to Kafka topics.\nHooks also allow performant retries as well as replays of whole Kafka topics for specific (hook-) subscribers.\n\n## Why\n\nThere are some tools out there that help developers browse Kafka topics,\nlike `kafka-rest-ui` or `Java Kafka Tool`; however, they provide a poor user experience, as they either require annoying secret management when working with production clusters (SASL-SSL), or are just very unresponsive and slow, because they are built upon a backend that spins up Kafka consumers, which fetch with timeouts.\n\nWith zamza we simply accept the fact that spinning up a consumer and searching for a given message (seeking) might be the streaming way, but it is a slow and frustrating thing to do, especially if you have a lot of developers doing that at the same time. Zamza brings millisecond message-for-key lookups by indexing topic windows in MongoDB.\n\n_Disclaimer: Depending on the size of your Kafka cluster (and topics..) you will need experience in scaling MongoDB._\n\n## Requirements\n\n* node.js \u003e= 9.x.x (we suggest \u003e= 10.11.x)\n* Apache Kafka \u003e= 0.11.x (we suggest \u003e= 1.x.x)\n* MongoDB \u003e= 3.2 (we suggest \u003e= 4.x)\n\n## Install\n\nAs simple as `yarn global add zamza`.\n\n(_NOTE: In case you don't have yarn, run `npm i -g yarn` first._)\n\n## Run\n\n`zamza \"./baseConfig.js\"`\n\nYou just have to throw in a config (JSON or JS).\n[A base config is always used](bin/baseConfig.js), so you just have to overwrite\nyour specific requirements.\n\nCheck out `zamza -h` for other options.\n\n## Using\n\nWith any HTTP client.\n\nCheck out the API quick start or the setup infos below.\n\n**There will be a UI for zamza very soon! 
Stay frosty.**\n\n## API Quick Start\n\n### Getting general info about zamza and your Kafka clusters\n\nYou can get an overview of possible API actions by calling `GET /api/info`.\n\n### Configuring Kafka topics for zamza\n\nBy default, zamza will connect and fetch some information about your Kafka cluster.\nBut it won't start building indices and metadata yet. You will have to configure topics so that they will be consumed\nand processed.\n\nYou can do that by providing a small configuration in a `POST /api/config/topic` call.\nYou will have to provide the following information in the body `{ topic, cleanupPolicy, retentionMs?, queryable? }`.\n\n* `topic` is the name of the Kafka topic (you can fetch available topics via `GET /api/info/topics/available`)\n* `cleanupPolicy` is the cleanup policy of your topic (similarly configured in the MongoDB backend of zamza);\nallowed values are: `compact` (runs upserts based on your topic's keys), `delete` (uses retentionMs to determine a ttl for messages),\n`none` (inserts only, no ttl) or `compact_and_delete` (runs upserts with ttl on retentionMs combined)\n* `retentionMs` ttl for the given messages in milliseconds (either the messages have a timestamp or the time of insertion will be used to determine when to delete a message); messages are deleted via MongoDB's document delete index\n* `queryable` determines if the message values of a topic should be stored as JSON (object structure) or as Buffer (byte array);\ndefault is `false`, which will store them as Buffer, increasing performance; however, running queries with the query API will not be possible on such topics\n\nIf you configure a topic, it will be consumed from earliest and stored to MongoDB by zamza, depending on the given configuration.\nTopics require about 1/2 the storage size in MongoDB (enable compression) that they require in Kafka.\n\nIt is also possible to deactivate persisting messages in zamza (see config file) and run in `hookOnly` mode.\nIt is also possible 
to deactivate hooks and run consumption via pagination only (see config file).\n\nPlease note that any changes on the topic-config resource will take a few seconds to be polled and applied to all zamza instances,\njust like hook changes.\n\n### Fetching a message for a certain key\n\nIf your token has access to a topic, you can find messages for keys very fast by calling `GET /:topic/find/key/:key`.\n\n### Get a bulk of earliest messages for a topic\n\nVery simply by calling `GET /api/fetch/:topic/range/earliest/:count`.\n\n### Get a bulk of latest messages for a topic\n\nVery simply by calling `GET /api/fetch/:topic/range/latest/:count`.\n\n### Producing through zamza\n\nIt is very easy to produce Kafka messages to different topics via zamza.\nJust make sure your token has the `__produce` value and the topic exists prior to producing to it.\nAs zamza is using auto-discovery to identify the partition count of a topic, it requires the topic to exist\nbeforehand and cannot create it on the fly.\n\nThen simply make HTTP `POST /api/produce/` requests.\n\n### Paginating through topics through zamza (without Hooks)\n\nIt is possible to paginate through topics very efficiently, by using the fetch endpoints.\nMake sure your token has access rights to the specific topic and that the topic is configured\nin zamza.\n\nThen simply make HTTP `GET /:topic/paginate/stoe/:skipToIndex/:limit` requests.\n\nPlease note that this can be a bit tricky to understand at first, but to paginate efficiently with a MongoDB backend\nzamza requires the last ObjectID. The correct way to start is therefore the following:\n\n1. fetch `GET /:topic/paginate/stoe/null/100` to get the first 100 messages from `earliest` of a given topic\n2. response will look like `{ results: [ {$index: \"123\", .. }, ..] }`\n3. now for the next fetch take the $index of the last message e.g. 
`const lastIndex = response.results[response.results.length - 1].$index` and fetch again `GET /:topic/paginate/stoe/${lastIndex}/100` to get the next 100 messages\n\nIf you want to fetch from latest to earliest, just use the other endpoint `GET /:topic/paginate/etos/null/100`; the schema is the same.\nPlease note that after a topic is first configured, it will take zamza some time to process the full Kafka topic\nbefore you can start to paginate through all of its data.\n\n### Fetching the JSON schema of a topic\n\nIf you have configured topics that stick to a certain JSON schema for their value payload and which\nare (as they should be..) produced with full updates of the entities only, you can use the `GET /api/info/schema/:topic/json`\nendpoint to fetch the schema.\n\n### Fetching the detailed JSON schema of a topic (based on a single message)\n\nWhen fetching JSON schemas via `GET /api/info/schema/:topic/json`, zamza will consolidate the schema of the topic based on a few messages\nfrom earliest and from latest. If their types do not match 100%, intermediate types, e.g. object or array, might result in the JSON schema.\nIf you are certain that a topic ships a constant schema, you can use `GET /api/info/single-schema/:topic/json` to let zamza build the\nschema based on a single latest message from the given topic.\n\n### Running advanced queries to find or count messages\n\nZamza enables you to search for messages in topics based on their payload fields.\nYou can pass a query object to `POST /api/query/:topic/filter` with\nbody: `{ query: { \"value.payload.customer.firstName\": \"Peter\", \"value.payload.customer.surName\": \"Pan\" } }`.\n\nMessage results will look the same as the other API message responses, e.g. 
pagination.\nWhen querying messages you can provide additional parameters to customize your query:\n\n* `limit` to limit the results, default is null (unlimited) (you can omit this field) (will limit scanned documents, not collected ones)\n* `skipToIndex` works exactly as described in the pagination API above, default is null (you can omit this field)\n* `order` is applied when skipToIndex is used, value can be 1 or -1 (default is -1)\n* `async` boolean that determines if the query should run separately from the HTTP request, in case your query takes longer than\ne.g. your HTTP timeout, default is false. In case you pass true here, your request will resolve very fast and will return a\nbody `{ cacheKey: 123123123123 }`; using this cacheKey you can fetch the results from `GET /api/query/results/:cacheKey`\nwhen the query is ready. You can also use `GET /api/query/queries` to get an overview of running queries (on that instance of zamza). Using `DELETE /api/query/abort/:cacheKey` you can stop a query (across all instances of zamza).\n\nAnother look at the collection you are querying, for convenience:\n\n```javascript\nconst collectionSchema = {\n  key: Number, // hashed\n  timestamp: Number,\n  partition: Number,\n  offset: Number,\n  keyValue: Buffer,\n  value: Mixed,\n  deleteAt: Date,\n  fromStream: Boolean,\n  storedAt: Number,\n};\n```\n\nPlease **NOTE**: your searches will be made on fields that won't have any `indices`;\ntherefore these queries might take a long time (depending on the size of your topics and their configuration).\nAlso, making a lot of them at the same time will result in high loads on your MongoDB (cluster).\n\n#### Counting messages in a topic based on a filter\n\nUsing the endpoint `POST /api/fetch/:topic/query/count` (body equals the described find endpoint; however, only the `query` field can actually be used),\nyou can retrieve the count of messages spread across topics. 
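To make the shape of such a count call concrete, a small sketch (the topic name and field path are illustrative examples; only the `query` field is sent, as noted above):

```javascript
// Sketch: builds a request description for the count endpoint described above.
// Topic name and field path are illustrative examples.
function buildCountRequest(topic, query) {
  return {
    url: `/api/fetch/${encodeURIComponent(topic)}/query/count`,
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ query }), // only `query` is used by this endpoint
  };
}

const req = buildCountRequest("customers", {
  "value.payload.customer.surName": "Pan",
});
```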
Resulting API response will be `{ count: number }`.\n\n### Using Hooks\n\nFirst of all, you will have to enable hooks in the config `config.hooks.enabled`.\nAfterwards, please make sure to create the following topics with key compaction or deletion (to your liking)\nso that zamza can automatically deal with retries and replays: `__zamza_retry_topic, __zamza_replay_topic`.\n(Please never produce to these topics manually; they should be kept exclusive for zamza.)\n\n* Each hook can subscribe to multiple topics\n* If the hook HTTP call times out (config file) or does not respond with status code 200, zamza will\nproduce a message to its internal zamza-retry topic after a short timeout (config file) and will run a\nretry shortly after, until the max retry attempts are reached (config file).\n* Hooks can be removed or subscriptions changed at any time, as well as the topic configuration; if these are missing\nwhen a retry or replay is processed, zamza will simply skip the retried or replayed messages\n* please note that hook `name`s should be unique\n\n* to check all configured hooks call `GET /api/config/hook`\n* to create a hook call `POST /api/config/hook` with body `{name, subscriptions: [ { topic, ignoreReplay?, disabled? 
} ], endpoint}`;\nyou can provide additional fields `authorizationHeader, authorizationValue, disabled` for your hook.\n* you can update a configured hook by calling `PUT /api/config/hook`\n* you can remove a hook by calling `DELETE /api/config/hook/name/:name`\n\n**Please note** that any changes on the hook resource will take a few seconds to be polled and applied to all zamza instances,\njust like topic-config changes.\n\n**Also note** that the hook endpoint should always return status code `200` in case of successful processing.\nHowever, if you return other status codes or the hook request fails due to network errors (depending on your config),\nthe hook will be retried after a few seconds.\n\n**There is an additional supported status code** for hook responses: `205`. In this case zamza will try to parse the following\nresponse body `{topic, partition, key, value}`. Passing a correct response body and status code 205 will result in\nan additional message produced by zamza to the provided topic. This concept allows you to subscribe\nservices to a given topic, transfer the messages on received hook calls and pipe them to another (or the same) topic\nwhile providing the hook response.\n\n### Doing topic replays for hook subscriptions\n\nFirst of all, ensure that you have configured a hook subscription that does not set `ignoreReplay` to `true`.\nOtherwise you won't receive messages from a replay. 
Additionally, your token will require `__replay` access\nrights.\n\nReplays work on a per-instance basis: only one replay per topic can be active at the same time,\nand only one replay per instance at the same time.\n\nReplays don't stop at some point; please ensure your Kafka consumer config is set to earliest (default value).\nWhen you have reached a state of sufficient lag on the replay consumer, you can stop the replay process.\n\nIf you are running multiple instances, you have to make sure to call the same instance with start and stop orders for replays;\ncalling the wrong instance will result in 400 responses.\n\nIn case of bad shutdowns of zamza (restarts or crashes without clean SIG kills), there are endpoints to restore\nand flush the replay state. It is also possible to provide a certain consumer group again, to continue at a certain offset\non a topic; if not provided, a random consumer group will be generated for each replay.\n\nReplays work by spawning a small and fast mirror that pipes the target topic onto the internal zamza-replay Kafka topic\nand running through all hook subscriptions that are not ignoring replays (config).\n\n* To get an overview of all replays across instances call `GET /api/config/replays`.\n* To check if the current instance you are calling is running a replay call `GET /api/config/replay`\n* To start a new replay on an instance call `POST /api/config/replay` with the body `{ topic, consumerGroup? }`\n* To check the current lag status on an instance call `GET /api/config/replay/lag`\n* To stop a replay process call `DELETE /api/config/replay/:topic`\n* In case you cannot start new replays due to bad shutdowns call `DELETE /api/config/replay/flushone` on the instance\n\n### Reading metadata information\n\nZamza collects all kinds of metadata while processing your cluster's messages, depending on the topic-configuration that you have\nprovided, of course. 
You can find these `GET` endpoints by calling `GET /api/info/`.\nHowever, fetching of additional metadata information, e.g. topic (partition) metadata polling, can be enabled by using the shared state API `POST /api/state/ {key: string, val: string}`,\ne.g. by setting `enable_metadata_job` to `true`. Please *NOTE* that metadata processing runs as a job every x minutes\n(depending on your configuration) and the queries produce a hefty read load on your MongoDB cluster (ensure to read first from slaves).\n\n## Setup Info\n\n### Deployment \u0026 Scaling\n\nAs a process, zamza is limited to a single CPU core and does not require too much memory (depending on Kafka consumer configuration) either. About 0.1 CPU and 350 MB RAM with the default configuration can be sufficient. However, the more hooks and replays your instances are processing, the faster they will max out on CPU as well as require more memory, up to 1.2 GB.\n\nBut **zamza is built to scale horizontally**: just spin up a few containers (see Dockerfile example) and scale up. In production environments we suggest having as many instances as the largest number of partitions configured for a Kafka topic in your cluster.\n\n### Metrics\n\nYou can monitor zamza via Prometheus at `http://localhost:1912/metrics`.\n\n### Access Management\n\nZamza allows _fine-grained_ access management with a similar scope to what Kafka ACLs allow you to do on a per-topic basis.\nYou define tokens as keys in the config's http access object and set the topic names or special rights as string members of the key's array value. 
A wildcard `*` grants all rights.\n\ne.g.\n\n```javascript\nconst config = {\n  http: {\n    access: {\n      \"my-crazy-secure-token-string\": [ \"__delete\", \"__produce\", \"__hook\", \"__topic\", \"on-some-topic\" ],\n      \"token-for-some-topics\": [ \"customers\", \"baskets\", \"orders\" ],\n      \"token-for-admin\": [ \"*\" ]\n    }\n  }\n};\n```\n\nWhen making calls to zamza's HTTP API, the token is provided in the `authorization` header.\n\n* `*` Allows every operation\n* `__topic` Is allowed to configure topics (only for provided topics)\n* `__hook` Is allowed to create hooks (only for provided topics)\n* `__delete` Allows deletes on topics (if no wildcard is present, only on the provided topics)\n* `__produce` Allows producing messages to topics (that are provided additionally)\n* `__replay` Is allowed to configure topic replays\n\nBe aware that the default configuration is a wildcard for everything (meaning no token is required).\nNever expose Zamza's HTTP interface publicly.\n\n### Config via Environment Variables\n\nIt is possible to set a few config parameters (mostly secrets) via environment variables.\nThey will always overwrite the passed configuration file.\n\n* `MONGODB_URL=\"mongodb://localhost:27017\"` -\u003e turns into: `config.mongo.url = \"mongodb://localhost:27017\";`\n* `MONGODB_USERNAME=admin` -\u003e turns into: `config.mongo.options.user = \"admin\";`\n* `MONGODB_PASSWORD=admin` -\u003e turns into: `config.mongo.options.pass = \"admin\";`\n* `MONGODB_DBNAME=zamza_prod` -\u003e turns into: `config.mongo.options.dbName = \"zamza_prod\";`\n* `KAFKA_BROKER_LIST=kafka-1:9093,kafka-2:9093` -\u003e turns into: `config.kafka.consumer.noptions[\"metadata.broker.list\"] = \"kafka-1:9093\";`\n* `KAFKA_SSL_PASSPHRASE=\"123456\"` -\u003e turns into: `config.kafka.consumer.noptions[\"ssl.key.password\"] = \"123456\";`\n* `KAFKA_SASL_USERNAME=\"123456\"` -\u003e turns into: `config.kafka.consumer.noptions[\"sasl.username\"] = 
\"123456\";`\n* `KAFKA_SASL_PASSWORD=\"123456\"` -\u003e turns into: `config.kafka.consumer.noptions[\"sasl.password\"] = \"123456\";`\n* `ACL_DEFINITIONS=\"mytoken=topic1,topic2;othertoken=topic3\" zamza -l \"./config.json\"` -\u003e turns into: `config.http.access.mytoken = [ \"topic1\", \"topic2\" ];`\n\nThe kafka env values will set consumer and producer at the same time.\n\n## FAQ\n\n### I am getting errors for `.` or `$` in my kafka message payloads\n\nMongoDB (actually BSON) does not like object keys that contain `.` or `$` or are `null.`\nYou can ask zamza to marshall your messages to replace `.` or `$` with `_` before storing them\nby enabling `config.marshallForInvalidCharacters = true` (default is `false`). Please **Note**: however that\nthis will increase zamza's CPU usage, a lot.\n\n## Maintainer\n\nChristian Fröhlingsdorf [@chrisfroeh](https://twitter.com/chrisfroeh)\n\nBuild with :heart: :pizza: and :coffee: by [nodefluent](https://github.com/nodefluent)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnodefluent%2Fzamza","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnodefluent%2Fzamza","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnodefluent%2Fzamza/lists"}