{"id":23483765,"url":"https://github.com/codesmell/camelkafkaoffset","last_synced_at":"2025-04-14T00:17:45.814Z","repository":{"id":203188370,"uuid":"709032385","full_name":"CodeSmell/CamelKafkaOffset","owner":"CodeSmell","description":"Trying to recreate an intermittent issue w/ Camel and handling of the Kafka offset (Camel-20044)","archived":false,"fork":false,"pushed_at":"2023-11-09T18:48:58.000Z","size":65,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-16T09:20:02.352Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CodeSmell.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-23T21:55:16.000Z","updated_at":"2023-10-24T15:40:06.000Z","dependencies_parsed_at":null,"dependency_job_id":"1550fb5b-5a33-4f53-87d6-c5d9d314d98a","html_url":"https://github.com/CodeSmell/CamelKafkaOffset","commit_stats":null,"previous_names":["codesmell/camelkafkaoffset"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CodeSmell%2FCamelKafkaOffset","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CodeSmell%2FCamelKafkaOffset/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CodeSmell%2FCamelKafkaOffset/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CodeSmell%2FCamelKafkaOffset/manifests","owner_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/owners/CodeSmell","download_url":"https://codeload.github.com/CodeSmell/CamelKafkaOffset/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248799961,"owners_count":21163404,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-24T21:16:10.184Z","updated_at":"2025-04-14T00:17:45.785Z","avatar_url":"https://github.com/CodeSmell.png","language":"Java","readme":"# The Offset in a partition is being reset incorrectly\n\nWe were seeing lag in our application and assumed it was due to a low number of partitions and a high volume of data. But on further investigation we found that the consumer was \"moving backwards\" and re-processing a series of messages on a partition.\n\nWhen processing a message throws an exception, the `breakOnFirstError` option causes that partition/offset to be replayed after the consumer is removed from and re-added to the consumer group. The second time the error occurs, it is typically followed by a seek to -1, which allows the consumer to move forward through the rest of the messages in that partition. However, it appears that the offset is sometimes set to a value from another partition. The consumer can then resume reading the partition where the error occurred from the wrong position.\n\nWhat makes this behavior hard to catch and debug is that it appears to be intermittent.\n\nUnder `src/logs` are 3 text files containing the logs from 3 separate runs of the provided test. 
Two of these logs capture the scenario where the offset is reset incorrectly; the third captures what is expected to occur. The bottom of this README annotates the issue using the logs.\n\nThis was submitted with the following Apache Camel issue:\n\n- [CAMEL-20044](https://issues.apache.org/jira/browse/CAMEL-20044)\n\nI also found some Apache Camel issues that seem similar:\n\n- [CAMEL-14935](https://issues.apache.org/jira/browse/CAMEL-14935)\n- [CAMEL-18350](https://issues.apache.org/jira/browse/CAMEL-18350)\n- [CAMEL-19894](https://issues.apache.org/jira/browse/CAMEL-19894)\n\n## The Environment\nThe issue was first observed with the consumer running on:\n\n- Rocky Linux 8.7\n- OpenJDK 11.0.8\n- Camel 3.21.0\n- Spring Boot 2.7.14\n- Strimzi Kafka 0.28.0/3.0.0\n\n## The Kafka configuration\nThe basic Kafka settings:\n\n- autoCommitEnable = false\n- allowManualCommit = true\n- autoOffsetReset = earliest\n- maxPollRecords = 1\n- breakOnFirstError = true\n- consumerCount = 3 (same as the number of partitions on the topic)\n\n## The Route\nThe Camel route consumes messages from the topic. It performs basic validation and then does a series of inserts and updates in the database. Note that the provided test omits any calls to an actual database.\n\nThe errors that occur in the application are evaluated and categorized as either retryable or non-retryable. An example of the former might be a failure to connect to the database. An example of the latter might be missing data in the database.\n\nWhen a retryable problem is encountered, the exception is thrown and left unhandled in order to force Camel to roll back any database activity that has already been performed. The Kafka offset is not committed, which allows the message to be re-consumed and processed once the problem is corrected. The provided test does not trigger any retryable errors.\n\nWhen a non-retryable problem is encountered, the exception is also thrown and left unhandled. 
This forces Camel to roll back any database activity that has already been performed. In this case, however, the Kafka offset is committed so that the message is not seen again.\n\n## Using breakOnFirstError and the subsequent behavior\n\nFrom the Camel docs (`breakOnFirstError`):\n\n\u003e This option controls what happens when a consumer is processing an exchange and it fails. If the option is false then the consumer continues to the next message and processes it. If the option is true then the consumer breaks out, and will seek back to offset of the message that caused a failure, and then re-attempt to process this message. However this can lead to endless processing of the same message if its bound to fail every time, eg a poison message. Therefore it is recommended to deal with that for example by using Camel’s error handler.\n\nIn the sample application we publish 13 messages. Each is a simple number or the text \"NORETRY-ERROR\". Each number is published to the topic only once. 
The \"NORETRY-ERROR\" message is published twice.\n\nBased on the way the route is written and the expected behavior in Camel, we would expect the following when a non-retryable error occurs:\n\n- consume the message at partition:offset (2:3)\n- throw an exception that is left unhandled\n- commit the offset manually (2:3)\n- the exception propagates to Camel, which rolls back any DB activity\n- the `KafkaRecordProcessor` logs the following: _Will seek consumer to offset 2 and start polling again._\n\nthen:\n\n- consume the message at partition:offset (2:3) for a second time\n- throw an exception that is left unhandled\n- commit the offset manually (2:3)\n- the exception propagates to Camel, which rolls back any DB activity\n- the `KafkaRecordProcessor` logs the following: _Will seek consumer to offset -1 and start polling again._\n\nthen:\n\n- consume the message at partition:offset (2:4)\n\nAt the end of the test run, this is the expected result based on what was published (not necessarily in this order):\n\n| Payload Body | Times Processed |\n|---------------|-----------------|\n| NORETRY-ERROR | 4 times |\n| 1 | 1 time |\n| 2 | 1 time |\n| 3 | 1 time |\n| 4 | 1 time |\n| 5 | 1 time |\n| 6 | 1 time |\n| 7 | 1 time |\n| 8 | 1 time |\n| 9 | 1 time |\n| 10 | 1 time |\n| 11 | 1 time |\n\nIf the test is re-run several times, it eventually hits the issue where the offset is set incorrectly. One of the attached logs shows the following counts (after it started to replay messages):\n\n| Payload Body | Times Processed |\n|---------------|-----------------|\n| NORETRY-ERROR | 4 times |\n| 3 | 2 times |\n| 11 | 1 time |\n| 1 | 1 time |\n| 2 | 1 time |\n| 4 | 1 time |\n| 5 | 1 time |\n| 6 | 1 time |\n| 7 | 1 time |\n| 8 | 1 time |\n| 10 | 1 time |\n\nHere is a high-level annotation of the logs for this run.\n\nThe NORETRY-ERROR message was written to partition 0 at offset 1. It was consumed:\n\n```\n2023-10-24 | 09:52:19.405 | INFO  | [Camel (camel-1) thread #2 - KafkaConsumer[foobarTopic]] | codesmell.test.CamelKafkaOffsetTest (CamelKafkaOffsetTest.java:147) | Message consumed from Kafka\nMessage consumed from foobarTopic\nThe Partion:Offset is 0:1\nThe Key is null\nNORETRY-ERROR\n```\n\nWhen processing resulted in an exception, the offset was committed:\n\n```\n2023-10-24 | 09:52:19.510 | INFO  | [Camel (camel-1) thread #2 - KafkaConsumer[foobarTopic]] | c.c.k.KafkaOffsetManagerProcessor (KafkaOffsetManagerProcessor.java:49) | manually committing the offset for batch\nMessage consumed from foobarTopic\nThe Partion:Offset is 0:1\nThe Key is null\nNORETRY-ERROR\n```\n\nThen the unhandled exception was handed over to Camel:\n\n```\n2023-10-24 | 09:52:19.530 | ERROR | [Camel (camel-1) thread #2 - KafkaConsumer[foobarTopic]] | o.a.c.p.e.DefaultErrorHandler (CamelLogger.java:205) | Failed delivery for (MessageId: 6561DE1EA878C39-0000000000000000 on ExchangeId: 6561DE1EA878C39-0000000000000000). Exhausted after delivery attempt: 1 caught: codesmell.exception.NonRetryException: NON RETRY ERROR TRIGGERED BY TEST. 
Processed by failure processor: FatalFallbackErrorHandler[null]\n```\n\nIt then seeks to offset 0 (which is correct) and removes itself from the consumer group:\n\n```\n2023-10-24 | 09:52:19.531 | WARN  | [Camel (camel-1) thread #2 - KafkaConsumer[foobarTopic]] | o.a.c.c.k.c.s.KafkaRecordProcessor (KafkaRecordProcessor.java:132) | Will seek consumer to offset 0 and start polling again.\n2023-10-24 | 09:52:19.537 | INFO  | [Camel (camel-1) thread #2 - KafkaConsumer[foobarTopic]] | o.a.k.c.c.i.ConsumerCoordinator (ConsumerCoordinator.java:311) | [Consumer clientId=consumer-test_group_id-1, groupId=test_group_id] Revoke previously assigned partitions foobarTopic-0\n```\n\nWhen it rejoins, partition 1 is set to offset 5:\n\n```\n2023-10-24 | 09:52:22.205 | INFO  | [Camel (camel-1) thread #3 - KafkaConsumer[foobarTopic]] | o.a.k.c.c.i.ConsumerCoordinator (ConsumerCoordinator.java:851) | [Consumer clientId=consumer-test_group_id-3, groupId=test_group_id] Setting offset for partition foobarTopic-1 to the committed offset FetchPosition{offset=5, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:65036 (id: 0 rack: null)], epoch=0}}\n```\n\nPartition 0 is set to offset 1:\n\n```\n2023-10-24 | 09:52:22.205 | INFO  | [Camel (camel-1) thread #1 - KafkaConsumer[foobarTopic]] | o.a.k.c.c.i.ConsumerCoordinator (ConsumerCoordinator.java:851) | [Consumer clientId=consumer-test_group_id-2, groupId=test_group_id] Setting offset for partition foobarTopic-0 to the committed offset FetchPosition{offset=1, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:65036 (id: 0 rack: null)], epoch=0}}\n```\n\nPartition 2 is set to offset 4:\n\n```\n2023-10-24 | 09:52:22.205 | INFO  | [Camel (camel-1) thread #2 - KafkaConsumer[foobarTopic]] | o.a.k.c.c.i.ConsumerCoordinator (ConsumerCoordinator.java:851) | [Consumer clientId=consumer-test_group_id-4, groupId=test_group_id] Setting offset for partition foobarTopic-2 to the committed offset 
FetchPosition{offset=4, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:65036 (id: 0 rack: null)], epoch=0}}\n```\n\nThe message at partition 0, offset 1 is re-consumed, as expected with `breakOnFirstError`:\n\n```\n2023-10-24 | 09:52:22.691 | INFO  | [Camel (camel-1) thread #1 - KafkaConsumer[foobarTopic]] | codesmell.test.CamelKafkaOffsetTest (CamelKafkaOffsetTest.java:147) | Message consumed from Kafka\nMessage consumed from foobarTopic\nThe Partion:Offset is 0:1\nThe Key is null\nNORETRY-ERROR\n```\n\nThe offset is committed again and Camel receives the unhandled exception:\n\n```\n2023-10-24 | 09:52:22.697 | INFO  | [Camel (camel-1) thread #1 - KafkaConsumer[foobarTopic]] | c.c.k.KafkaOffsetManagerProcessor (KafkaOffsetManagerProcessor.java:49) | manually committing the offset for batch\nMessage consumed from foobarTopic\nThe Partion:Offset is 0:1\nThe Key is null\nNORETRY-ERROR\n2023-10-24 | 09:52:22.714 | ERROR | [Camel (camel-1) thread #1 - KafkaConsumer[foobarTopic]] | o.a.c.p.e.DefaultErrorHandler (CamelLogger.java:205) | Failed delivery for (MessageId: 6561DE1EA878C39-0000000000000001 on ExchangeId: 6561DE1EA878C39-0000000000000001). Exhausted after delivery attempt: 1 caught: codesmell.exception.NonRetryException: NON RETRY ERROR TRIGGERED BY TEST. Processed by failure processor: FatalFallbackErrorHandler[null]\n```\n\nThis time, when the consumer is removed from the consumer group, the seek should use -1 so that it moves forward. This is Camel's \"normal\" behavior in this situation (though IMO it would be a cleaner and simpler design to honor the commit and not replay the message at all). However, it seeks to offset 4 instead. This appears to be the offset assigned to partition 2. Observations suggest it always picks up the current offset of another partition. 
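\n\nA minimal, hypothetical Java sketch of how this symptom could arise follows. It is not Camel's `KafkaRecordProcessor` code; the class, fields, and methods are illustrative assumptions. It assumes that the \"last processed offset\" used for the seek is kept in one field shared by all partitions rather than per partition. It also assumes that the seek target is committed using Kafka's usual convention of last processed offset + 1. Under those assumptions, a stale value of 4 from partition 2 produces a committed offset of 5 for partition 0. Per-partition tracking, by contrast, keeps the committed offset of 2 from the manual commit:\n\n```java\nimport java.util.HashMap;\nimport java.util.Map;\n\n// Hypothetical model of the suspected problem; NOT Camel's actual implementation.\npublic class SeekOffsetSketch {\n    // Marker for \"no record processed yet\" (the -1 in the expected behavior).\n    static final long START_OFFSET = -1L;\n\n    // Suspected variant: one \"last processed offset\" shared by every partition.\n    static long sharedLastOffset = START_OFFSET;\n\n    // Alternative: each partition remembers its own last processed offset.\n    static final Map<Integer, Long> lastOffsetByPartition = new HashMap<>();\n\n    static void recordProcessed(int partition, long offset) {\n        sharedLastOffset = offset;\n        lastOffsetByPartition.put(partition, offset);\n    }\n\n    // Assumed commit rule after a failure: keep the existing commit when nothing was\n    // processed yet, otherwise commit the next offset to read (last processed + 1).\n    static long commitAfterFailure(long lastOffset, long currentCommitted) {\n        return lastOffset == START_OFFSET ? currentCommitted : lastOffset + 1;\n    }\n\n    public static void main(String[] args) {\n        // Assumed history: the shared field last held 4, a value belonging to partition 2.\n        recordProcessed(2, 4);\n\n        // Record 0:1 fails again; its manual commit already moved partition 0 to offset 2.\n        long committedForP0 = 2;\n\n        long shared = commitAfterFailure(sharedLastOffset, committedForP0);\n        long perPartition = commitAfterFailure(\n                lastOffsetByPartition.getOrDefault(0, START_OFFSET), committedForP0);\n\n        System.out.println(\"shared field: partition 0 committed offset = \" + shared);\n        System.out.println(\"per partition: partition 0 committed offset = \" + perPartition);\n    }\n}\n```\n\nRunning `main` prints 5 for the shared field and 2 for the per-partition map. These match the committed offsets discussed below for partition 0 (5 observed, 2 expected). The corresponding log entries from the failing run: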
\n\n```\n2023-10-24 | 09:52:22.715 | WARN  | [Camel (camel-1) thread #1 - KafkaConsumer[foobarTopic]] | o.a.c.c.k.c.s.KafkaRecordProcessor (KafkaRecordProcessor.java:132) | Will seek consumer to offset 4 and start polling again.\n2023-10-24 | 09:52:22.720 | INFO  | [Camel (camel-1) thread #1 - KafkaConsumer[foobarTopic]] | o.a.k.c.c.i.ConsumerCoordinator (ConsumerCoordinator.java:311) | [Consumer clientId=consumer-test_group_id-2, groupId=test_group_id] Revoke previously assigned partitions foobarTopic-0\n```\n\nWhen it rejoins, the partition offsets are set again, and this time the position for partition 0 ends up out of range.\n\nPartition 1 is set to offset 5:\n\n```\n2023-10-24 | 09:52:25.238 | INFO  | [Camel (camel-1) thread #2 - KafkaConsumer[foobarTopic]] | o.a.k.c.c.i.ConsumerCoordinator (ConsumerCoordinator.java:851) | [Consumer clientId=consumer-test_group_id-4, groupId=test_group_id] Setting offset for partition foobarTopic-1 to the committed offset FetchPosition{offset=5, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:65036 (id: 0 rack: null)], epoch=0}}\n```\n\nPartition 0 is set to offset 5. 
This should have been set to 2:\n\n```\n2023-10-24 | 09:52:25.238 | INFO  | [Camel (camel-1) thread #3 - KafkaConsumer[foobarTopic]] | o.a.k.c.c.i.ConsumerCoordinator (ConsumerCoordinator.java:851) | [Consumer clientId=consumer-test_group_id-3, groupId=test_group_id] Setting offset for partition foobarTopic-0 to the committed offset FetchPosition{offset=5, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:65036 (id: 0 rack: null)], epoch=0}}\n```\n\nPartition 2 is set to offset 4:\n\n```\n2023-10-24 | 09:52:25.238 | INFO  | [Camel (camel-1) thread #1 - KafkaConsumer[foobarTopic]] | o.a.k.c.c.i.ConsumerCoordinator (ConsumerCoordinator.java:851) | [Consumer clientId=consumer-test_group_id-5, groupId=test_group_id] Setting offset for partition foobarTopic-2 to the committed offset FetchPosition{offset=4, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:65036 (id: 0 rack: null)], epoch=0}}\n```\n\nThis is where things get weird. Since offset 5 is beyond the end of partition 0, the fetch position is out of range:\n\n```\n2023-10-24 | 09:52:25.261 | INFO  | [Camel (camel-1) thread #3 - KafkaConsumer[foobarTopic]] | o.a.k.c.consumer.internals.Fetcher (Fetcher.java:1413) | [Consumer clientId=consumer-test_group_id-3, groupId=test_group_id] Fetch position FetchPosition{offset=5, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:65036 (id: 0 rack: null)], epoch=0}} is out of range for partition foobarTopic-0, resetting offset\n```\n\nBecause `autoOffsetReset` is `earliest`, the consumer then resets partition 0 to offset 0 and starts to replay the messages. 
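\n\nA simplified, hypothetical Java model of this reset is sketched below. It is not the Kafka client's code, and the class, enum, and method names are illustrative. It treats a committed offset as valid when it lies between the partition's log start offset and log end offset (inclusive). Otherwise it falls back according to the reset policy, with `earliest` meaning the log start offset. The log end offset of 4 for partition 0 is an assumption: the logs only show that offset 5 is out of range.\n\n```java\n// Simplified, hypothetical model of auto.offset.reset handling; NOT the Kafka client's code.\npublic class OffsetResetSketch {\n    enum ResetPolicy { EARLIEST, LATEST }\n\n    // Returns the offset the consumer fetches from. Valid positions run from the log start\n    // offset up to and including the log end offset (fetching there waits for new records).\n    static long resolvePosition(long committed, long logStart, long logEnd, ResetPolicy policy) {\n        boolean inRange = committed >= logStart && committed <= logEnd;\n        if (inRange) {\n            return committed;\n        }\n        // Out of range: fall back to the start or the end of the log.\n        return policy == ResetPolicy.EARLIEST ? logStart : logEnd;\n    }\n\n    public static void main(String[] args) {\n        long logStart = 0;\n        long logEnd = 4; // assumed log end offset for partition 0\n\n        // Committed offset expected after the second failure: resume after the poison message.\n        System.out.println(\"committed 2 -> fetch from \"\n                + resolvePosition(2, logStart, logEnd, ResetPolicy.EARLIEST));\n        // Committed offset seen in the failing run: out of range, so replay from the start.\n        System.out.println(\"committed 5 -> fetch from \"\n                + resolvePosition(5, logStart, logEnd, ResetPolicy.EARLIEST));\n    }\n}\n```\n\nWith the expected committed offset of 2, consumption would resume right after the poison message. With the value of 5 seen in the logs, partition 0 is replayed from offset 0, which is consistent with payload 3 being processed twice. The log entries for the reset and the first replayed message: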
\n\n```\n2023-10-24 | 09:52:25.264 | INFO  | [Camel (camel-1) thread #3 - KafkaConsumer[foobarTopic]] | o.a.k.c.c.i.SubscriptionState (SubscriptionState.java:398) | [Consumer clientId=consumer-test_group_id-3, groupId=test_group_id] Resetting offset for partition foobarTopic-0 to position FetchPosition{offset=0, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[localhost:65036 (id: 0 rack: null)], epoch=0}}.\n2023-10-24 | 09:52:25.267 | INFO  | [Camel (camel-1) thread #3 - KafkaConsumer[foobarTopic]] | codesmell.test.CamelKafkaOffsetTest (CamelKafkaOffsetTest.java:147) | Message consumed from Kafka\nMessage consumed from foobarTopic\nThe Partion:Offset is 0:0\nThe Key is null\n3\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodesmell%2Fcamelkafkaoffset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodesmell%2Fcamelkafkaoffset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodesmell%2Fcamelkafkaoffset/lists"}