https://github.com/databendcloud/bend-ingest-kafka
Ingest kafka data into databend
https://github.com/databendcloud/bend-ingest-kafka
Last synced: 5 months ago
JSON representation
Ingest kafka data into databend
- Host: GitHub
- URL: https://github.com/databendcloud/bend-ingest-kafka
- Owner: databendcloud
- License: apache-2.0
- Created: 2023-03-03T07:23:31.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-05-15T07:04:51.000Z (about 2 years ago)
- Last Synced: 2024-05-16T12:52:53.888Z (about 2 years ago)
- Language: Go
- Size: 52.7 KB
- Stars: 3
- Watchers: 6
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# bend-ingest-kafka
Ingest kafka data into databend
# Installation
```shell
go install github.com/databendcloud/bend-ingest-kafka@latest
```
Or download the binary from the [release page](https://github.com/databendcloud/bend-ingest-kafka/releases).
```go
bend-ingest-kafka --version
```
# Usage
## Json transform mode
The json transform mode is the default mode which will transform the kafka data into databend table, you can use it by setting the `--is-json-transform` to `true`.
### Create a table according your kafka data structrue
For example, the kafka data like
```json
{"i64": 10,"u64": 30,"f64": 20,"s": "hao","s2": "hello","a16":[1],"a8":[2],"d": "2011-03-06","t": "2016-04-04 11:30:00"}
```
you should create a table using
``` SQL
CREATE TABLE test_ingest (
i64 Int64,
u64 UInt64,
f64 Float64,
s String,
s2 String,
a16 Array(Int16),
a8 Array(UInt8),
d Date,
t DateTime);
```
### execute bend-ingest-kafka
#### command line mode
```shell
bend-ingest-kafka
--kafka-bootstrap-servers="127.0.0.1:9092,127.0.0.2:9092"\
--kafka-topic="Your Topic"\
--kafka-consumer-group= "Consumer Group"\
--databend-dsn="databend://user:password@localhost:8000/default?sslmode=disable"\
--databend-table="db1.tbl" \
--data-format="json" \
--batch-size=100000 \
--batch-max-interval=300
```
#### config file mode
Config the config file `config/conf.json`
```json
{
"kafkaBootstrapServers": "localhost:9092",
"kafkaTopic": "ingest_test",
"KafkaConsumerGroup": "test",
"isJsonTransform": true,
"databendDSN": "databend://user:password@localhost:8000/default?sslmode=disable",
"databendTable": "default.kfk_test",
"batchSize": 1,
"batchMaxInterval": 5,
"dataFormat": "json",
"workers": 1
}
```
and execute the command
```shell
./bend-ingest-kafka
```
## Raw mode
The raw mode is used to ingest the raw data into databend table, you can use it by setting the `isJsonTransform` to `false`.
In this mode, we will create a table with the name `databendTable` which columns are `(uuid, koffset,kpartition, raw_data, record_metadata, add_time)` and ingest the raw data into this table.
The `record_metadata` is the metadata of the kafka record which contains the `topic`, `partition`, `offset`, `create_time`, `key`, and the `add_time` is the time when the record is added into databend.
### Example
If the kafka json data is:
```json
{"i64": 10,"u64": 30,"f64": 20,"s": "hao","s2": "hello","a16":[1],"a8":[2],"d": "2011-03-06","t": "2016-04-04 11:30:00"}
```
run the command
```shell
./bend-ingest-kafka
```
with `config/conf.json` and the table `default.kfk_test` will be created and the data will be ingested into this table.

## Parameter References
| Parameter | Description | Default | example |
|-----------------------|---------------------------|-------------------|---------------------------------|
| kafkaBootstrapServers | kafka bootstrap servers | "127.0.0.1:64103" | "127.0.0.1:9092,127.0.0.2:9092" |
| kafkaTopic | kafka topic | "test" | "test" |
| KafkaConsumerGroup | kafka consumer group | "kafka-bend-ingest" | "test" |
|isSASL | is sasl | false | true |
|saslUser | sasl user | "" | "user" |
|saslPassword | sasl password | "" | "password" |
| mockData | mock data | "" | "" |
| isJsonTransform | is json transform | true | true |
| databendDSN | databend dsn | no | "databend://user:password@localhost:8000/default?sslmode=disable" |
| databendTable | databend table | no | "db1.tbl" |
| batchSize | batch size | 1000 | 1000 |
| batchMaxInterval | batch max interval (seconds) | 30 | 30 |
| dataFormat | data format | json | "json" |
| workers | workers thread number | 1 | 1 |
| copyPurge | copy purge | false | false |
| copyForce | copy force | false | false |
| DisableVariantCheck | disable variant check | false | false |
| MinBytes | min bytes | 1024 | 1024 |
| MaxBytes | max bytes | 1048576 | 1048576 |
| MaxWait | max wait time (seconds) | 10 | 10 |
| useReplaceMode | use replace mode | false | false |
| userStage | user external stage name | ~ | ~ |
| maxRetryDelay | max retry delay (seconds) | 1800 | 1800 |
**NOTE:**
- The `copyPurge and copyForce` are used to delete the data in the target table before ingesting the data. More details please refer to [copy](https://docs.databend.com/sql/sql-commands/dml/dml-copy-into-table#copy-options).
- The `useReplaceMode` is used to replace the data in the table, if the data already exists in the table, the new data will replace the old data. But the `useReplaceMode` is only supported when `isJsonTransform` false because it needs to add `koffset` and `kpartition` field in the target table.