https://github.com/anpandu/ps2bq
Stream insert GCP PubSub messages into BigQuery table.
- Host: GitHub
- URL: https://github.com/anpandu/ps2bq
- Owner: anpandu
- License: mit
- Created: 2020-02-25T16:50:49.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-03-09T15:28:52.000Z (almost 5 years ago)
- Last Synced: 2023-08-19T04:41:35.340Z (over 1 year ago)
- Topics: bigquery, golang, pubsub
- Language: Go
- Homepage:
- Size: 22.5 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# ps2bq · [![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/anpandu/ps2bq/blob/master/LICENSE)
## Introduction
PS2BQ is a CLI tool for importing messages from GCP PubSub into a BigQuery table.
## Usage
### Run from local
```sh
# clone repo
git clone https://github.com/anpandu/ps2bq
cd ps2bq

# build binary
go mod download
go install

# run it
export GOOGLE_APPLICATION_CREDENTIALS=~/google-key.json # make sure credential file is set
$GOPATH/bin/ps2bq run --help

# EXAMPLE
ps2bq run \
  --project=myproject \
  --dataset=mydataset \
  --table=students \
  --topic=t-students \
  --subscription-id=ps2bq-students-20200101 \
  --worker=4 \
  --message-buffer=1
```

### Run as Docker container
```sh
# TBD
```

### Configs
```
  -D, --dataset string           BigQuery Dataset
  -n, --message-buffer int       Number of message to be inserted (default 1)
  -P, --project string           Google Cloud Platform Project ID
      --schema string            BigQuery JSON table schema file location (default "/tmp/schema.json")
  -s, --subscription-id string   PubSub Subscription ID
  -T, --table string             BigQuery Table
  -t, --topic string             PubSub Topic
  -w, --worker int               Number of workers (default 4)
```

Each PubSub message is inserted as a new row.
Each message received must be a JSON object.
Message example: `{"id":123,"name":"Alice"}`
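
To illustrate the "JSON object" requirement, here is a minimal Go sketch that decodes a message payload into a row and rejects anything that is not an object. `decodeRow` is a hypothetical helper written for this example; it is not taken from the ps2bq source and may differ from how the tool handles payloads.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeRow parses a message payload and returns it as a row.
// It fails when the payload is not valid JSON or is not a JSON object.
// Hypothetical helper for illustration, not part of ps2bq.
func decodeRow(payload []byte) (map[string]interface{}, error) {
	var row map[string]interface{}
	if err := json.Unmarshal(payload, &row); err != nil {
		return nil, fmt.Errorf("payload is not a JSON object: %w", err)
	}
	if row == nil {
		// The JSON literal null unmarshals into a nil map without an error.
		return nil, fmt.Errorf("payload is null, not a JSON object")
	}
	return row, nil
}

func main() {
	// A JSON object decodes into a row keyed by column name.
	row, err := decodeRow([]byte(`{"id":123,"name":"Alice"}`))
	fmt.Println(row, err)

	// A JSON array is valid JSON but not an object, so it is rejected.
	_, err = decodeRow([]byte(`[1,2,3]`))
	fmt.Println(err)
}
```

Decoding into a `map[string]interface{}` lets the standard library reject arrays, strings, and numbers; only `null` needs an explicit check.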
A JSON schema file must be provided (see `--schema`) so that the table can be created.
See: https://cloud.google.com/bigquery/docs/schemas#creating_a_json_schema_file
Messages that contain invalid JSON or do not match the table schema will fail to be inserted.

## Roadmap
| Status | Description |
|:-------:|:----------- |
| ✔ | 1 worker, 1 message inserted |
| ✔ | N workers, 1 message inserted each |
| ✔ | N workers, N messages inserted each (buffered) |
| ✘ | Every t seconds, insert all messages in buffer |
| ✘ | Dockerfile |
| ✘ | Validate message using JSON schema |
| ✘ | Create table with partition |
| ✘ | Auto-generate subscription ID |
| ✘ | go doc |
| ✘ | Kubernetes YAMLs |
| ✘ | Multiple sinks (?) |
| ✘ | Multiple sources (?) |

## License
MIT © Ananta Pandu