Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/datastax/snowflake-connector
Datastax Snowflake Sink Connector for Apache Pulsar
https://github.com/datastax/snowflake-connector
apache apache-pulsar connectors kafka kafka-connect pulsar pulsar-io snowflake
Last synced: 2 months ago
JSON representation
Datastax Snowflake Sink Connector for Apache Pulsar
- Host: GitHub
- URL: https://github.com/datastax/snowflake-connector
- Owner: datastax
- License: apache-2.0
- Created: 2021-09-21T20:08:45.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2024-06-17T12:38:23.000Z (7 months ago)
- Last Synced: 2024-07-20T11:32:23.168Z (6 months ago)
- Topics: apache, apache-pulsar, connectors, kafka, kafka-connect, pulsar, pulsar-io, snowflake
- Language: Java
- Homepage:
- Size: 166 KB
- Stars: 8
- Watchers: 9
- Forks: 4
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Datastax Snowflake Sink Connector for Apache Pulsar
This connector pushes messages from Apache Pulsar topic into Snowflake DB.
This connector is Open Source Software, Apache 2 licensed and built using [Snowflake's Connector for Kafka](https://github.com/snowflakedb/snowflake-kafka-connector) ([documentation](https://docs.snowflake.com/en/user-guide/kafka-connector.html)).
Please refer to this [blog post](https://medium.com/building-the-open-data-stack/datastax-presents-snowflake-sink-connector-for-apache-pulsar-53629b196064) for a quick walk-through.
## Installation
Please refer to Apache Pulsar IO [documentation](https://pulsar.apache.org/docs/en/io-use/).
Use `pulsar-snowflake-connector/target/pulsar-snowflake-connector-0.1.0-SNAPSHOT.nar` if the project is built from sources.
## Configuration
| Parameter Name | Description | Default value |
|-----------------|-------------|---------------|
|batchSize|Size of messages in bytes the sink will attempt to batch messages together before flush.|16384|
|lingerTimeMs|Time interval in milliseconds the sink will attempt to batch messages together before flush.|2147483647|
|topic|The topic name that passed to kafka sink.|n/a|
|kafkaConnectorSinkClass|A kafka-connector sink class to use.|com.snowflake.kafka.connector.SnowflakeSinkConnector|
|offsetStorageTopic|Pulsar topic to store offsets at.|snowflake-sink-offsets|
|unwrapKeyValueIfAvailable|In case of `Record>` data use key from KeyValue<> instead of one from Record.|true|
|kafkaConnectorConfigProperties|Config properties to pass to the kafka connector.|n/a|List of `kafkaConnectorConfigProperties` can be found at the [documentation](https://docs.snowflake.com/en/user-guide/kafka-connector-install.html#kafka-configuration-properties) for the Snowflake's Connector for Kafka.
### Known Issues
`snowflake.topic2table.map` parameter [is not supported](https://github.com/snowflakedb/snowflake-kafka-connector/issues/337).
Snowflake's Connector expects `topic:table[,topic:table]` format and does not handle Pulsar's topic URLs `persistent://tenant/namespace/topic-name`.
### Example
```yaml
processingGuarantees: "EFFECTIVELY_ONCE"
configs:
topic: "snowflake-demo"
offsetStorageTopic: "snowflake-sink-offsets-demo"
batchSize: "100"
lingerTimeMs: "600000"
kafkaConnectorConfigProperties:
name: "snowflakedemo"
connector.class: "com.snowflake.kafka.connector.SnowflakeSinkConnector"
tasks.max: "1"
topics: "snowflake-demo"
buffer.count.records: "100"
buffer.flush.time: "600"
buffer.size.bytes: "102400"
snowflake.url.name: "tenant.snowflakecomputing.com:443"
snowflake.user.name: "kafka_connector_user"
snowflake.private.key: "very_secret_key"
snowflake.database.name: "kafka_db"
snowflake.schema.name: "kafka_schema"
key.converter: "org.apache.kafka.connect.storage.StringConverter"
value.converter: "com.snowflake.kafka.connector.records.SnowflakeJsonConverter"
```## Building from source
If you want to develop and test this library you need to build it from sources.
```shell
mvn clean install -DskipTests
```The Apache Pulsar distro is regulated by the properties `pulsar.version` and `pulsar.distribution-name`.
### Release
```shell
mvn release:prepare -Prelease -Dresume=false
```The GitHub release is handled by a GitHub action whenever a tag is being pushed