{"id":34837295,"url":"https://github.com/arenadata/kafka-remote-log-metadata-manager","last_synced_at":"2026-05-22T22:35:49.193Z","repository":{"id":281208080,"uuid":"944535249","full_name":"arenadata/kafka-remote-log-metadata-manager","owner":"arenadata","description":"RemoteLogMetadataManager implementation for Apache Kafka","archived":false,"fork":false,"pushed_at":"2025-04-16T09:05:02.000Z","size":242,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-16T11:58:21.195Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/arenadata.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-03-07T14:16:45.000Z","updated_at":"2025-04-16T09:05:07.000Z","dependencies_parsed_at":null,"dependency_job_id":"f046fc1d-c9bc-4b4e-8610-c56c5866d3ea","html_url":"https://github.com/arenadata/kafka-remote-log-metadata-manager","commit_stats":null,"previous_names":["arenadata/kafka-remote-log-metadata-manager"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/arenadata/kafka-remote-log-metadata-manager","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arenadata%2Fkafka-remote-log-metadata-manager","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arenadata%2Fkafka-remote-log-metadata-manager/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arenadata%2Fkafka-remote-log-metadata-manage
r/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arenadata%2Fkafka-remote-log-metadata-manager/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/arenadata","download_url":"https://codeload.github.com/arenadata/kafka-remote-log-metadata-manager/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/arenadata%2Fkafka-remote-log-metadata-manager/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28032399,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-25T02:00:05.988Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-12-25T16:08:57.194Z","updated_at":"2025-12-25T16:10:36.160Z","avatar_url":"https://github.com/arenadata.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Remote Log Metadata Manager implementation\n\n## Design\n\n### Remote Log Metadata Manager\n\n`RemoteLogMetadataManager` provides ability to store and fetch remote log segment metadata with strongly consistent semantics.\nThere is an inbuilt implementation backed by topic storage in the local cluster. 
This implementation, by contrast, \nstores metadata in remote systems, abstracted by pluggable `MetadataStorageBackend` implementations.\n\nThe current implementation handles log segment metadata depending on the partition it belongs to: \n\n- If the current broker is the leader for the partition, the metadata manager atomically stores the metadata and auxiliary information \nin memory and saves it in the external system. Since only one broker in the entire cluster can be the leader \nfor a particular partition, metadata for the partition or its log segments can also be fetched from \nthe in-memory cache on the current broker in a strictly consistent way.\n- Otherwise, if the current broker is a follower for the partition, only read operations can be performed \non its log segments. To fetch log segment metadata by offset, the id of the remote \nlog segment is first obtained from the topic partition metadata, which is fetched from the remote storage,\nand only then is the log segment metadata itself fetched from the remote storage.\n\n### Partition metadata\n\nAlong with each log segment metadata add/update, `RemoteLogMetadataManager` saves auxiliary information needed for \nmetadata retrieval by offset or leader epoch. 
This information is grouped by partition, so it is called partition metadata.\n\nThe structure of partition metadata:\n\n| Field       | Type                                      | Description                                                                                                                                                            |\n|-------------|-------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| deleteState | `RemotePartitionDeleteState`              | This field indicates the deletion state of the remote topic partition. Can be one of DELETE_PARTITION_MARKED, DELETE_PARTITION_STARTED or DELETE_PARTITION_FINISHED.   |\n| epochStates | `Map\u003cInteger, RemoteLogLeaderEpochState\u003e` | Metadata for each leader epoch contained in the log segments of this partition.                                                                                        |\n\nThe structure of leader epoch metadata:\n\n| Field                  | Type                            | Description                                                                                                                                                                                                                                       |\n|------------------------|---------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| offsetToId             | `Map\u003cLong, RemoteLogSegmentId\u003e` | Contains offset to segment ids mapping with the segment state as COPY_SEGMENT_FINISHED.                                                                                                                          
                                 |\n| unreferencedSegmentIds | `Set\u003cRemoteLogSegmentId\u003e`       | Unreferenced segments for this leader epoch. It contains the segments still in COPY_SEGMENT_STARTED and DELETE_SEGMENT_STARTED state or these have been replaced by callers with other segments having the same start offset for the leader epoch |\n| highestLogOffset       | `Long`                          | The highest log offset of the segments that reached the COPY_SEGMENT_FINISHED state.                                                                                                                                                              |\n\nWhen the metadata manager adds or updates log segment metadata, it also atomically creates or updates the metadata for the partition leader epochs \ncontained in this segment metadata.\n\n### Storage backends\n\nRemote metadata storage is abstracted by the `MetadataStorageBackend` interface. \nThis allows the storage backend to be selected through configuration. `MetadataStorageBackend` is a binary key-value store with the following operations:\n- Create a value by key;\n- Update a value by key;\n- Get a value by key;\n- Get values by keys like the provided one;\n- Remove values by keys like the provided one;\n- Atomically execute value creations/updates by key. 
\n\n### Metadata key factory\n\nEach remote metadata storage backend should also provide an implementation of `MetadataKeyFactory` that creates string keys for:\n\n- Remote log segment metadata;\n- Partition metadata;\n- Leader epoch metadata;\n- A collective key for the metadata of all segments in a particular partition.\n\n### Remote storage connector\n\nA remote storage connector is an implementation of `MetadataStorageConnector` that simply creates the `MetadataStorageBackend` and `MetadataKeyFactory`.\n\n## Configuration\n\nKafka can be configured using the following options to enable the current `RemoteLogMetadataManager` implementation:\n\n| Name                                            | Type   | Default value                                                                 | Description                                                                                                                                                                                                |\n|-------------------------------------------------|--------|-------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| `remote.log.metadata.manager.class.name`        | String | org.apache.kafka.server.log.remote.storage.RemoteLogMetadataManager           | Fully qualified class name of `RemoteLogMetadataManager` implementation. To enable current implementation it should be set to 'io.arenadata.kafka.tieredstorage.metadata.storage.RemoteLogMetadataManager' |\n| `rlmm.config.metadata.storage.connector.class`  | String | -                                                                             | Fully qualified class name of `MetadataStorageConnector` implementation.                                                                                                              
                     |\n| `rlmm.config.metadata.storage.serializer.class` | String | io.arenadata.kafka.tieredstorage.metadata.storage.serde.DefaultMetadataMapper | Fully qualified class name of `MetadataSerDe` implementation, remote log/partition metadata serializer.                                                                                                    |\n\n### Local demo\n\nTo run a Kafka instance with the ZooKeeper RemoteLogMetadataManager and the HDFS storage backend locally, follow these steps:\n1. Run `make docker_image` to build the Docker image of Kafka with the necessary dependencies.\n2. Follow the instructions in [demo/README.md](demo/README.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farenadata%2Fkafka-remote-log-metadata-manager","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farenadata%2Fkafka-remote-log-metadata-manager","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farenadata%2Fkafka-remote-log-metadata-manager/lists"}