Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/fujiwara/rin
Rin is a Redshift data Importer by SQS messaging.
- Host: GitHub
- URL: https://github.com/fujiwara/rin
- Owner: fujiwara
- License: mit
- Created: 2015-04-21T05:55:05.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2023-01-18T05:59:10.000Z (about 2 years ago)
- Last Synced: 2024-10-11T21:12:42.363Z (4 months ago)
- Topics: aws, go, golang, redshift, rin, sqs
- Language: Go
- Homepage:
- Size: 149 KB
- Stars: 26
- Watchers: 3
- Forks: 6
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
# Rin
Rin is a Redshift data Importer by SQS messaging.
## Architecture
1. (Someone) creates an S3 object.
2. [S3 event notifications](https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html) send a message to SQS.
3. Rin fetches messages from SQS and issues a "COPY" query to Redshift.

## Installation
### Binary packages
[Releases](https://github.com/fujiwara/Rin/releases)
### Homebrew
```console
$ brew install fujiwara/tap/rin
```

### Docker
[GitHub Packages](https://github.com/users/fujiwara/packages/container/package/rin)
```console
$ docker pull ghcr.io/fujiwara/rin:v1.1.3
```

## Configuration
See [Configuring Amazon S3 Event Notifications](https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html).
1. Create an SQS queue.
2. Attach an SQS access policy to the queue. See [Example Walkthrough 1](https://docs.aws.amazon.com/AmazonS3/latest/dev/ways-to-add-notification-config-to-bucket.html).
3. [Enable Event Notifications](http://docs.aws.amazon.com/AmazonS3/latest/UG/SettingBucketNotifications.html) on an S3 bucket.
4. Run the `rin` process with a configuration that uses the SQS queue and the S3 bucket.

### config.yaml
```yaml
queue_name: my_queue_name    # SQS queue name

credentials:
  aws_region: ap-northeast-1

redshift:
  host: localhost
  port: 5439
  dbname: test
  user: test_user
  password: '{{ must_env "REDSHIFT_PASSWORD" }}'
  schema: public
  reconnect_on_error: true # disconnect Redshift on error occurred

s3:
  bucket: test.bucket.test
  region: ap-northeast-1

sql_option: "JSON 'auto' GZIP"       # COPY SQL option

# define import target mappings
targets:
  - s3:
      key_prefix: test/foo/ignore
    discard: true  # Do not import and do not try following targets. Matches only.

  - redshift:
      table: foo
    s3:
      key_prefix: test/foo

  - redshift:
      schema: xxx
      table: bar
    s3:
      key_prefix: test/bar
    break: true    # Do not try following targets.

  - redshift:
      schema: $1      # expand by key_regexp captured value.
      table: $2
    s3:
      key_regexp: test/schema-([a-z]+)/table-([a-z]+)/

  - redshift:
      host: redshift.example.com      # override default section in this target
      port: 5439
      dbname: example
      user: example_user
      password: example_pass
      schema: public
      table: example
    s3:
      bucket: redshift.example.com
      region: ap-northeast-1
      key_prefix: logs/example/
    sql_option: "CSV DELIMITER ',' ESCAPE"
```

A configuration file is parsed by [kayac/go-config](https://github.com/kayac/go-config).
go-config expands environment variables written as `{{ env "FOO" }}` or `{{ must_env "FOO" }}` in a configuration file.
When the password for Redshift is empty, Rin will try to call the [GetClusterCredentials API](https://docs.aws.amazon.com/redshift/latest/APIReference/API_GetClusterCredentials.html) to get a temporary password for the cluster.
#### Credentials
Rin requires credentials for SQS and Redshift.
1. `credentials.aws_access_key_id` and `credentials.aws_secret_access_key`
   - used for SQS and Redshift (COPY query and Data API access).
2. `credentials.aws_iam_role`
   - used for the Redshift COPY query only.
   - for SQS and the Redshift Data API, Rin will try to get instance credentials.

## Run
### daemon mode
Rin waits for new SQS messages and processes them continually.
```
$ rin -config config.yaml [-debug]
```
`-config` also accepts an HTTP, S3, or file URL that specifies the location of the configuration file.
For example,
```
$ rin -config s3://rin-config.my-bucket/config.yaml
```

### batch mode
Rin processes new SQS messages and exits.
```
$ rin -config config.yaml -batch [-debug]
```

## Set max execution time
The CLI option `-max-execution-time` sets the maximum execution time for the SQS worker and the batch process.
## SQL Drivers
Rin has two ways to connect to Redshift.
### `postgres` driver
The `postgres` driver is the default. Rin connects to Redshift using the PostgreSQL protocol over TCP in the VPC network.
The `host`, `port`, `user` and `password` fields are required in the `redshift` section.
```yaml
redshift:
  driver: postgres # default
  host: localhost
  port: 5439
  user: test_user
  password: '{{ must_env "REDSHIFT_PASSWORD" }}'
```

### `redshift-data` driver
`redshift-data` driver connects to Redshift via [Redshift Data API](https://docs.aws.amazon.com/redshift/latest/mgmt/data-api.html).
Redshift Data API does not require a VPC network.
With a provisioned cluster, `driver`, `cluster` and `user` are required.
```yaml
redshift:
  driver: redshift-data
  cluster: your-cluster-name
  user: test_user
```
With Redshift Serverless, `driver` and `workgroup` are required.
```yaml
redshift:
  driver: redshift-data
  workgroup: your-workgroup-name
```
See also [github.com/mashiike/redshift-data-sql-driver](https://github.com/mashiike/redshift-data-sql-driver).