https://github.com/cclient/spark-streaming-kafka-offset-mysql
Maintains Kafka offsets in MySQL; supports tracking and rolling back to a given 'abnormal' point in time and re-consuming from there.
- Host: GitHub
- URL: https://github.com/cclient/spark-streaming-kafka-offset-mysql
- Owner: cclient
- Created: 2019-03-02T06:26:16.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-03-13T12:37:50.000Z (over 7 years ago)
- Last Synced: 2025-06-06T18:03:08.303Z (about 1 year ago)
- Topics: mysql, offset, spark, spark-streaming
- Language: Scala
- Homepage: https://www.cnblogs.com/zihunqingxin/p/14476970.html
- Size: 6.84 KB
- Stars: 4
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
### Use Spark SQL to load/store offsets with MySQL
This is less complicated than implementing the SQL operations by hand.
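As a rough illustration of the idea, the sketch below stores and loads offset rows through Spark SQL's JDBC data source. It is not the repository's actual API: `OffsetRow`, `store`, `load`, the JDBC URL, and the credentials are hypothetical names chosen for this example. It also assumes that the `kfk_offset` table shown in this README already exists with an auto-generated `id` column, and that the MySQL JDBC driver is on the classpath.

```scala
import java.sql.Timestamp
import java.util.Properties

import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.{col, max}

object OffsetStoreSketch {
  // Hypothetical connection settings; replace with your own.
  val url = "jdbc:mysql://localhost:3306/test"
  val table = "kfk_offset"
  val props = new Properties()
  props.setProperty("user", "root")
  props.setProperty("password", "secret")

  // One row per (step, partition), mirroring the kfk_offset columns except `id`,
  // which is assumed to be generated by MySQL.
  final case class OffsetRow(
      topic: String, group: String, step: Long, partition: Int,
      from: Long, until: Long, count: Long, datetime: Timestamp)

  // Append the offset ranges of one finished batch.
  def store(spark: SparkSession, rows: Seq[OffsetRow]): Unit = {
    import spark.implicits._
    rows.toDS().write.mode(SaveMode.Append).jdbc(url, table, props)
  }

  // For each partition, return the largest recorded `until`,
  // i.e. the offset the next batch should start from.
  def load(spark: SparkSession, topic: String, group: String): Map[Int, Long] =
    spark.read.jdbc(url, table, props)
      .filter(col("topic") === topic && col("group") === group)
      .groupBy(col("partition"))
      .agg(max(col("until")).as("until"))
      .collect()
      .map { r =>
        // Read via Number so the sketch works whether MySQL maps the column to Int or Long.
        r.getAs[Number]("partition").intValue -> r.getAs[Number]("until").longValue
      }
      .toMap
}
```

The point of going through Spark SQL is that both directions are a single DataFrame call, with no hand-written `INSERT` or `SELECT` statements.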
#### offset store
```sql
mysql> select * from kfk_offset where datetime>'2019-01-09' and topic='task-response' and `group`='extract';
+--------+---------------+---------+------+-----------+-----------+-----------+-------+---------------------+
| id     | topic         | group   | step | partition | from      | until     | count | datetime            |
+--------+---------------+---------+------+-----------+-----------+-----------+-------+---------------------+
|      1 | task-response | extract |    1 |         0 |   1959008 |   1995008 | 36000 | 2019-01-09 00:01:19 |
|      2 | task-response | extract |    1 |         1 |   1897546 |   1933546 | 36000 | 2019-01-09 00:01:19 |
|      0 | task-response | extract |    1 |         2 |   1876072 |   1912072 | 36000 | 2019-01-09 00:01:19 |
|      5 | task-response | extract |    2 |         0 |   1995008 |   2031008 | 36000 | 2019-01-09 00:05:05 |
|      7 | task-response | extract |    2 |         1 |   1933546 |   1969546 | 36000 | 2019-01-09 00:05:05 |
|      6 | task-response | extract |    2 |         2 |   1912072 |   1948072 | 36000 | 2019-01-09 00:05:05 |
```
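Reading the sample output above, each row appears to record the offset range (`from` to `until`) that one batch (`step`) consumed from one partition, and step 2 starts exactly where step 1 ended. Under that reading, the consumer resumes each partition from the largest stored `until`. A minimal sketch of that rule in plain Scala follows; `OffsetRow` and `resumeOffsets` are hypothetical names, not part of the repository.

```scala
// Hypothetical model of one kfk_offset row (only the columns needed here).
final case class OffsetRow(step: Int, partition: Int, from: Long, until: Long)

object ResumeOffsets {
  // Rows copied from the sample query output.
  val rows: Seq[OffsetRow] = Seq(
    OffsetRow(1, 0, 1959008L, 1995008L),
    OffsetRow(1, 1, 1897546L, 1933546L),
    OffsetRow(1, 2, 1876072L, 1912072L),
    OffsetRow(2, 0, 1995008L, 2031008L),
    OffsetRow(2, 1, 1933546L, 1969546L),
    OffsetRow(2, 2, 1912072L, 1948072L)
  )

  // Next starting offset per partition = largest recorded `until`.
  def resumeOffsets(rows: Seq[OffsetRow]): Map[Int, Long] =
    rows.groupBy(_.partition).map { case (p, rs) => p -> rs.map(_.until).max }

  def main(args: Array[String]): Unit =
    resumeOffsets(rows).toSeq.sortBy(_._1).foreach { case (p, offset) =>
      println(s"partition $p resumes at offset $offset")
    }
}
```

With the six sample rows, partitions 0, 1 and 2 resume at 2031008, 1969546 and 1948072 respectively.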
In my use case (extracting DOM/JSON from crawler responses), I need to roll back to the 'problem datetime' and re-consume the records after it.
### Rollback
1. Kill the Spark consumer process.
2. Pinpoint the problem datetime, then delete the offset records after it, either by datetime or by step:
   ``delete from kfk_offset where `step`>1 and `topic`='task-response' and `group`='extract'``
3. Start the Spark consumer process again.
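To see why deleting rows amounts to a rollback, the steps above can be simulated in memory. This is only a sketch: `OffsetRecord`, `deleteAfterStep`, `deleteFrom` and `resumeOffsets` are hypothetical helpers, the data is the sample from the offset table, and treating "delete by datetime" as "delete rows written at or after the problem datetime" is an assumption.

```scala
import java.time.LocalDateTime

// Hypothetical in-memory stand-in for the kfk_offset rows of one topic/group.
final case class OffsetRecord(step: Int, partition: Int, until: Long, datetime: LocalDateTime)

object RollbackSketch {
  val t1: LocalDateTime = LocalDateTime.parse("2019-01-09T00:01:19")
  val t2: LocalDateTime = LocalDateTime.parse("2019-01-09T00:05:05")

  val records: Seq[OffsetRecord] = Seq(
    OffsetRecord(1, 0, 1995008L, t1),
    OffsetRecord(1, 1, 1933546L, t1),
    OffsetRecord(1, 2, 1912072L, t1),
    OffsetRecord(2, 0, 2031008L, t2),
    OffsetRecord(2, 1, 1969546L, t2),
    OffsetRecord(2, 2, 1948072L, t2)
  )

  // Mirrors `delete ... where step > n`: only rows up to step n survive.
  def deleteAfterStep(rs: Seq[OffsetRecord], n: Int): Seq[OffsetRecord] =
    rs.filter(_.step <= n)

  // Datetime variant (assumed semantics): drop rows written at or after `problem`.
  def deleteFrom(rs: Seq[OffsetRecord], problem: LocalDateTime): Seq[OffsetRecord] =
    rs.filter(_.datetime.isBefore(problem))

  // After a restart, each partition resumes from the largest surviving `until`.
  def resumeOffsets(rs: Seq[OffsetRecord]): Map[Int, Long] =
    rs.groupBy(_.partition).map { case (p, xs) => p -> xs.map(_.until).max }

  def main(args: Array[String]): Unit = {
    println(s"before rollback: ${resumeOffsets(records)}")
    println(s"after rollback:  ${resumeOffsets(deleteAfterStep(records, 1))}")
  }
}
```

After the step-2 rows are removed, the restarted consumer picks up from the step-1 `until` values, so everything consumed in step 2 and later is read again.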
### Use
#### Develop
Copy the source code into your project,
or
copy `spark-streaming-kafka-offset-mysql_2.11-0.1.jar` to `{project}/lib/`.
#### Deploy
Run `sbt package`, then
copy `spark-streaming-kafka-offset-mysql_2.11-0.1.jar` to `$SPARK_HOME/jars/`.
### Other
Upload the jar to a Maven repository so that it can be used as a regular dependency.