https://github.com/scraly/flume-bigquery-sink




# flume-bigquery-sink
An Apache Flume Sink implementation to publish data to Google BigQuery

## Configuration of Google BigQuery Sink:

>Edit log4j.xml:

...

>Edit your flume.conf:

# list the sources, sinks and channels for the agent
agent.sources = modv6-source
agent.sinks = bigquery-sink
agent.channels = bigquery-channel

# properties of modv6-source
agent.sources.modv6-source.channels = bigquery-channel
agent.sources.modv6-source.type = avro
agent.sources.modv6-source.bind = localhost
agent.sources.modv6-source.port = 8090

# properties of bigquery-channel
agent.channels.bigquery-channel.type = file
agent.channels.bigquery-channel.checkpointDir = /data/flume-bq/checkpoint
agent.channels.bigquery-channel.dataDirs = /data/flume-bq/data
agent.channels.bigquery-channel.minimumRequiredSpace = 0

# properties of bigquery-sink
agent.sinks.bigquery-sink.channel = bigquery-channel
agent.sinks.bigquery-sink.type = BigQuerySink
agent.sinks.bigquery-sink.batchSize = 100
agent.sinks.bigquery-sink.clientId = .apps.googleusercontent.com
agent.sinks.bigquery-sink.clientSecret =
agent.sinks.bigquery-sink.accessToken =
agent.sinks.bigquery-sink.refreshToken =
agent.sinks.bigquery-sink.dataStoreDir = /home//etc/
agent.sinks.bigquery-sink.userId =
agent.sinks.bigquery-sink.datasetId =
agent.sinks.bigquery-sink.projectId =
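
Several sink properties above are left blank as placeholders to fill in. As a rough sanity check before starting the agent, the sketch below loads flume.conf-style text with `java.util.Properties` and reports which `bigquery-sink` keys are missing or empty. The class `SinkConfigCheck` is hypothetical and not part of this project, and treating every listed key as required is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper for illustration; it is not shipped with flume-bigquery-sink.
public class SinkConfigCheck {
    private static final String PREFIX = "agent.sinks.bigquery-sink.";

    // Key names come from the flume.conf above; assuming they are all required.
    private static final String[] REQUIRED = {
        "channel", "type", "clientId", "clientSecret", "refreshToken",
        "dataStoreDir", "userId", "datasetId", "projectId"
    };

    // Parses flume.conf-style text (key = value lines, '#' comments).
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the sink keys that are absent or have an empty value.
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(PREFIX + key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String conf = "agent.sinks.bigquery-sink.type = BigQuerySink\n"
                    + "agent.sinks.bigquery-sink.projectId =\n";
        System.out.println("Missing or empty: " + missingKeys(parse(conf)));
    }
}
```

In practice the text passed to `parse` would be the contents of your own flume.conf; any key the check reports still needs a value before the sink can authenticate.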

>Edit BigQueryManager class:

private static final String PROJECT_ID = "112233445566"; // replace with your Google Cloud project ID
private static final String DATASET = "toto"; // replace with your Google BigQuery dataset

>Change the LogField and CSVUtil classes to tell the BigQuery sink what the BigQuery table schema is
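
The LogField and CSVUtil sources are not reproduced here, so the following is only a guess at the kind of change involved: one enum constant per table column, plus a helper that maps a delimited line onto those columns. Everything in it is hypothetical, including the class name `SchemaSketch`, the column names (`timestamp`, `level`, `message`) and the `toRow` method; the project's real classes may be organised differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch; the real LogField and CSVUtil classes may differ.
public class SchemaSketch {

    // One constant per column of the target table, in the order the values
    // appear in each delimited line. Column names here are made up.
    enum LogField {
        TIMESTAMP("timestamp", "TIMESTAMP"),
        LEVEL("level", "STRING"),
        MESSAGE("message", "STRING");

        final String columnName;
        final String bigQueryType;

        LogField(String columnName, String bigQueryType) {
            this.columnName = columnName;
            this.bigQueryType = bigQueryType;
        }
    }

    // Splits one delimited line and pairs each value with its column name.
    static Map<String, String> toRow(String line, String separator) {
        // Limit -1 keeps trailing empty values instead of dropping them.
        String[] values = line.split(Pattern.quote(separator), -1);
        LogField[] fields = LogField.values();
        if (values.length != fields.length) {
            throw new IllegalArgumentException(
                "expected " + fields.length + " values but got " + values.length);
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < fields.length; i++) {
            row.put(fields[i].columnName, values[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(toRow("2015-01-01 00:00:00;INFO;agent started", ";"));
    }
}
```

Keeping the column list in a single enum means the schema and the parsing order cannot drift apart, which is presumably why the project asks you to change both classes together.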