https://github.com/cclient/spark-streaming-kafka-offset-mysql

Maintains Kafka offsets in MySQL; supports tracking them and rolling back to a given 'problem' point in time to re-consume records.

Host: GitHub
URL: https://github.com/cclient/spark-streaming-kafka-offset-mysql
Owner: cclient
Created: 2019-03-02T06:26:16.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2019-03-13T12:37:50.000Z (over 7 years ago)
Last Synced: 2025-06-06T18:03:08.303Z (about 1 year ago)
Topics: mysql, offset, spark, spark-streaming
Language: Scala
Homepage: https://www.cnblogs.com/zihunqingxin/p/14476970.html
Size: 6.84 KB
Stars: 4
Watchers: 2
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md


README

### Use Spark SQL to load/store offsets in MySQL

Less complicated than a custom implementation of the SQL operations.

#### offset store

```sql
mysql> select * from kfk_offset where datetime>'2019-01-09' and topic='task-response' and `group`='extract';
+----+---------------+---------+------+-----------+---------+---------+-------+---------------------+
| id | topic         | group   | step | partition | from    | until   | count | datetime            |
+----+---------------+---------+------+-----------+---------+---------+-------+---------------------+
|  1 | task-response | extract |    1 |         0 | 1959008 | 1995008 | 36000 | 2019-01-09 00:01:19 |
|  2 | task-response | extract |    1 |         1 | 1897546 | 1933546 | 36000 | 2019-01-09 00:01:19 |
|  0 | task-response | extract |    1 |         2 | 1876072 | 1912072 | 36000 | 2019-01-09 00:01:19 |
|  5 | task-response | extract |    2 |         0 | 1995008 | 2031008 | 36000 | 2019-01-09 00:05:05 |
|  7 | task-response | extract |    2 |         1 | 1933546 | 1969546 | 36000 | 2019-01-09 00:05:05 |
|  6 | task-response | extract |    2 |         2 | 1912072 | 1948072 | 36000 | 2019-01-09 00:05:05 |
+----+---------------+---------+------+-----------+---------+---------+-------+---------------------+
```
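Each row stores one offset range for one partition and one batch (`step`): `count` equals `until - from`, and a batch's `from` equals the previous batch's `until` for the same partition. A minimal pure-Scala sketch of those relationships, with no Spark or JDBC involved. `OffsetRow`, `OffsetStore`, `resumeOffsets` and `isConsistent` are hypothetical names, not the repo's API, and the rows are assumed to be already filtered to one topic and group, as the query does:

```scala
// Hypothetical in-memory mirror of one kfk_offset row; not the repo's actual class.
final case class OffsetRow(
    topic: String,
    group: String,
    step: Int,       // batch sequence number
    partition: Int,  // Kafka partition
    from: Long,      // first offset of the batch (inclusive)
    until: Long,     // end offset of the batch (exclusive)
    count: Long,     // records in the batch
    datetime: String // "yyyy-MM-dd HH:mm:ss"
)

object OffsetStore {
  // Offset to resume each partition from: the `until` of its highest stored step.
  def resumeOffsets(rows: Seq[OffsetRow]): Map[Int, Long] =
    rows.groupBy(_.partition).map { case (p, rs) => p -> rs.maxBy(_.step).until }

  // True when every row has count == until - from and, per partition,
  // each step starts exactly where the previous step ended (no gaps, no overlaps).
  def isConsistent(rows: Seq[OffsetRow]): Boolean =
    rows.forall(r => r.count == r.until - r.from) &&
      rows.groupBy(_.partition).values.forall { rs =>
        val sorted = rs.sortBy(_.step)
        sorted.zip(sorted.drop(1)).forall { case (prev, next) => prev.until == next.from }
      }

  // Sample rows copied from the query output above (id column omitted).
  val sample: Seq[OffsetRow] = Seq(
    OffsetRow("task-response", "extract", 1, 0, 1959008L, 1995008L, 36000L, "2019-01-09 00:01:19"),
    OffsetRow("task-response", "extract", 1, 1, 1897546L, 1933546L, 36000L, "2019-01-09 00:01:19"),
    OffsetRow("task-response", "extract", 1, 2, 1876072L, 1912072L, 36000L, "2019-01-09 00:01:19"),
    OffsetRow("task-response", "extract", 2, 0, 1995008L, 2031008L, 36000L, "2019-01-09 00:05:05"),
    OffsetRow("task-response", "extract", 2, 1, 1933546L, 1969546L, 36000L, "2019-01-09 00:05:05"),
    OffsetRow("task-response", "extract", 2, 2, 1912072L, 1948072L, 36000L, "2019-01-09 00:05:05")
  )

  def main(args: Array[String]): Unit = {
    println(s"consistent: ${isConsistent(sample)}")
    println(s"resume from: ${resumeOffsets(sample).toSeq.sorted}")
  }
}
```

For the sample rows both checks hold: every `count` is 36000, and each step 2 `from` equals the step 1 `until` of the same partition.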

In my scenario (extracting DOM/JSON from crawler responses), I need to roll back to the 'problem datetime' and re-consume the records after it.

### rollback

1. Kill the Spark consumer process.
2. Identify the problem datetime, then delete the offset records stored after it, filtering by `datetime` or by `step`. For example, to discard every batch after step 1:

   ``delete from kfk_offset where `step`>1 and `topic`='task-response' and `group`='extract'``

3. Restart the Spark consumer process.
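The rollback steps above can be sketched in memory: drop every stored batch at or after the problem datetime, and the latest surviving `until` per partition becomes the new starting point. This is a minimal sketch, assuming a batch is suspect when its `datetime` is at or after the problem datetime; `OffsetRow`, `Rollback`, `rollback`, `resumeOffsets` and `deleteAfterStep` are hypothetical names, and the generated SQL only mirrors the `delete` shown in step 2, not the repo's own code:

```scala
// Hypothetical in-memory mirror of one kfk_offset row; not the repo's actual class.
final case class OffsetRow(
    topic: String, group: String, step: Int, partition: Int,
    from: Long, until: Long, count: Long, datetime: String)

object Rollback {
  // Keep only batches recorded strictly before the problem datetime. Both sides use the
  // fixed "yyyy-MM-dd HH:mm:ss" format, so plain string comparison orders them correctly.
  def rollback(rows: Seq[OffsetRow], problemDatetime: String): Seq[OffsetRow] =
    rows.filter(_.datetime.compareTo(problemDatetime) < 0)

  // Offsets a restarted consumer would resume from: the `until` of the latest
  // surviving step per partition. Partitions with no surviving rows are absent.
  def resumeOffsets(rows: Seq[OffsetRow]): Map[Int, Long] =
    rows.groupBy(_.partition).map { case (p, rs) => p -> rs.maxBy(_.step).until }

  // Builds the same kind of DELETE as step 2 (by step). Values are interpolated
  // unescaped, so this is for illustration only; use a PreparedStatement in real code.
  def deleteAfterStep(step: Int, topic: String, group: String): String =
    s"delete from kfk_offset where `step`>$step and `topic`='$topic' and `group`='$group'"

  // Sample rows copied from the offset-store query output (id column omitted).
  val sample: Seq[OffsetRow] = Seq(
    OffsetRow("task-response", "extract", 1, 0, 1959008L, 1995008L, 36000L, "2019-01-09 00:01:19"),
    OffsetRow("task-response", "extract", 1, 1, 1897546L, 1933546L, 36000L, "2019-01-09 00:01:19"),
    OffsetRow("task-response", "extract", 1, 2, 1876072L, 1912072L, 36000L, "2019-01-09 00:01:19"),
    OffsetRow("task-response", "extract", 2, 0, 1995008L, 2031008L, 36000L, "2019-01-09 00:05:05"),
    OffsetRow("task-response", "extract", 2, 1, 1933546L, 1969546L, 36000L, "2019-01-09 00:05:05"),
    OffsetRow("task-response", "extract", 2, 2, 1912072L, 1948072L, 36000L, "2019-01-09 00:05:05")
  )

  def main(args: Array[String]): Unit = {
    val kept = rollback(sample, "2019-01-09 00:03:00") // hypothetical problem datetime
    println(s"kept steps: ${kept.map(_.step).distinct}")
    println(s"resume from: ${resumeOffsets(kept).toSeq.sorted}")
    println(deleteAfterStep(1, "task-response", "extract"))
  }
}
```

With the hypothetical problem datetime `2019-01-09 00:03:00`, only the step 1 rows survive, which is the same result as the example `delete` that removes every step greater than 1.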

### Use 

#### develop 

Copy the source code into your project, or copy `spark-streaming-kafka-offset-mysql_2.11-0.1.jar` into `{project}/lib/`.

#### deploy

Build the jar with `sbt package` (sbt writes it under `target/scala-2.11/`), then copy `spark-streaming-kafka-offset-mysql_2.11-0.1.jar` into `$SPARK_HOME/jars/`.

### other

Upload the jar to a Maven repository so it can be used as a regular dependency.
