An open API service indexing awesome lists of open source software.

https://github.com/robinpowered/kafka-connect-kinesis

Kafka Connect connector for Amazon Kinesis
https://github.com/robinpowered/kafka-connect-kinesis

Last synced: 9 months ago
JSON representation

Kafka Connect connector for Amazon Kinesis

Awesome Lists containing this project

README

          

# Introduction

This connector is for reading data from [Amazon Kinesis](https://aws.amazon.com/kinesis/) and writing the data to Apache Kafka.

# Configuration

## KinesisSourceConnector

```properties
name=connector1
tasks.max=1
connector.class=com.github.jcustenborder.kafka.connect.kinesis.KinesisSourceConnector

# Set these required values
aws.secret.key.id=
aws.access.key.id=
kafka.topic=
kinesis.stream=
```

| Name | Description | Type | Default | Valid Values | Importance |
|----------------------------------------|--------------------------------------------------------------------------------------------------------------------------|----------|--------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|
| aws.access.key.id | aws.access.key.id | string | | | high |
| aws.secret.key.id | aws.secret.key.id | password | | | high |
| kafka.topic | The kafka topic to write the data to. | string | | | high |
| kinesis.stream | The Kinesis stream to read from. | string | | | high |
| kinesis.shard.id | The shard of the Kinesis stream to read from. This is a regex which can be used to read all of the shards in the stream. | string | .* | | high |
| kinesis.empty.records.backoff.ms | The number of milliseconds to backoff when the stream is empty. | long | 5000 | [500,...,2147483647] | medium |
| kinesis.position | The position in the stream to reset to if no offsets are stored. | string | TRIM_HORIZON | ValidEnum{enum=ShardIteratorType, allowed=[AT_SEQUENCE_NUMBER, AFTER_SEQUENCE_NUMBER, TRIM_HORIZON, LATEST, AT_TIMESTAMP]} | medium |
| kinesis.record.limit | The number of records to read in each poll of the Kinesis shard. | int | 500 | [1,...,10000] | medium |
| kinesis.region | The AWS region for the Kinesis stream. | string | US_EAST_1 | ValidEnum{enum=Regions, allowed=[GovCloud, US_EAST_1, US_EAST_2, US_WEST_1, US_WEST_2, EU_WEST_1, EU_WEST_2, EU_CENTRAL_1, AP_SOUTH_1, AP_SOUTHEAST_1, AP_SOUTHEAST_2, AP_NORTHEAST_1, AP_NORTHEAST_2, SA_EAST_1, CN_NORTH_1, CA_CENTRAL_1]} | medium |
| kinesis.throughput.exceeded.backoff.ms | The number of milliseconds to backoff when a throughput exceeded exception is thrown. | long | 10000 | [500,...,2147483647] | medium |

# Data

## com.github.jcustenborder.kafka.connect.kinesis.KinesisKey

A partition key is used to group data by shard within a stream.

| Name | Optional | Schema | Default Value | Documentation |
|--------------|----------|-------------------------------------------------------------------------------------------------------|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| partitionKey | true | [String](https://kafka.apache.org/0102/javadoc/org/apache/kafka/connect/data/Schema.Type.html#STRING) | | A partition key is used to group data by shard within a stream. The Streams service segregates the data records belonging to a stream into multiple shards, using the partition key associated with each data record to determine which shard a given data record belongs to. Partition keys are Unicode strings with a maximum length limit of 256 bytes. An MD5 hash function is used to map partition keys to 128-bit integer values and to map associated data records to shards. A partition key is specified by the applications putting the data into a stream. Identifies which shard in the stream the data record is assigned to. See [Record.getPartitionKey()](http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/kinesis/model/Record.html#getPartitionKey--) |

## com.github.jcustenborder.kafka.connect.kinesis.KinesisValue

The unit of data of the Amazon Kinesis stream, which is composed of a sequence number, a partition key, and a data blob. See [Record](http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/kinesis/model/Record.html)

| Name | Optional | Schema | Default Value | Documentation |
|-----------------------------|----------|-------------------------------------------------------------------------------------------------------|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| sequenceNumber | true | [String](https://kafka.apache.org/0102/javadoc/org/apache/kafka/connect/data/Schema.Type.html#STRING) | | The unique identifier of the record in the stream. See [Record.getSequenceNumber()](http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/kinesis/model/Record.html#getSequenceNumber--) |
| approximateArrivalTimestamp | true | [Timestamp](https://kafka.apache.org/0102/javadoc/org/apache/kafka/connect/data/Timestamp.html) | | The approximate time that the record was inserted into the stream. See [Record.getApproximateArrivalTimestamp()](http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/kinesis/model/Record.html#getApproximateArrivalTimestamp--) |
| data | true | [Bytes](https://kafka.apache.org/0102/javadoc/org/apache/kafka/connect/data/Schema.Type.html#BYTES) | | The data blob. See [Record.getData()](http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/kinesis/model/Record.html#getData--) |
| partitionKey | true | [String](https://kafka.apache.org/0102/javadoc/org/apache/kafka/connect/data/Schema.Type.html#STRING) | | A partition key is used to group data by shard within a stream. The Streams service segregates the data records belonging to a stream into multiple shards, using the partition key associated with each data record to determine which shard a given data record belongs to. Partition keys are Unicode strings with a maximum length limit of 256 bytes. An MD5 hash function is used to map partition keys to 128-bit integer values and to map associated data records to shards. A partition key is specified by the applications putting the data into a stream. Identifies which shard in the stream the data record is assigned to. See [Record.getPartitionKey()](http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/kinesis/model/Record.html#getPartitionKey--) |
| shardId | true | [String](https://kafka.apache.org/0102/javadoc/org/apache/kafka/connect/data/Schema.Type.html#STRING) | | A shard is a uniquely identified group of data records in a stream. A stream is composed of one or more shards, each of which provides a fixed unit of capacity. Each shard can support up to 5 transactions per second for reads, up to a maximum total data read rate of 2 MB per second and up to 1,000 records per second for writes, up to a maximum total data write rate of 1 MB per second (including partition keys). The data capacity of your stream is a function of the number of shards that you specify for the stream. The total capacity of the stream is the sum of the capacities of its shards. |
| streamName | true | [String](https://kafka.apache.org/0102/javadoc/org/apache/kafka/connect/data/Schema.Type.html#STRING) | | The name of the Kinesis stream. |