https://github.com/mrpowers/redshift_extractor
Using the Redshift UNLOAD/COPY commands to move data from one Redshift cluster/database to another
- Host: GitHub
- URL: https://github.com/mrpowers/redshift_extractor
- Owner: MrPowers
- License: mit
- Created: 2015-10-26T19:44:52.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2016-03-04T19:08:58.000Z (almost 9 years ago)
- Last Synced: 2024-09-23T16:47:39.936Z (3 months ago)
- Language: Ruby
- Size: 16.6 KB
- Stars: 4
- Watchers: 6
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# RedshiftExtractor
redshift_extractor moves data from one Amazon Redshift cluster to another. Here is how it works:
- Source database
  1. [UNLOAD](http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html) - Runs a SELECT query and exports the results to CSV files in S3.
- Destination database
  2. Drop - Drops a database table (the table in the destination database where the data will be stored).
  3. Create - Creates a database table.
  4. [COPY](http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html) - Loads data from S3 into a Redshift database.
One database connection is established with the source database to UNLOAD the data to S3. After the data is UNLOADed, a second database connection is established with the destination database to drop and recreate the table that will store the data. The final step is to COPY the data from the S3 files into the destination table.
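The four steps above can be sketched as plain SQL strings. This is a minimal illustration, not the gem's actual implementation: the helper names (`credentials`, `unload_sql`, `drop_sql`, `copy_sql`), the `IF EXISTS` clause, the `DELIMITER ','` option, and the example table and bucket are all assumptions made for the sketch.

```ruby
# Illustrative only: these helpers are hypothetical and are not part of redshift_extractor.

# Builds the "key=value;key=value" credentials string that UNLOAD and COPY accept.
def credentials(access_key_id, secret_access_key)
  "aws_access_key_id=#{access_key_id};aws_secret_access_key=#{secret_access_key}"
end

# Step 1 (source): export the SELECT results to S3 and write a manifest file.
def unload_sql(select_sql, s3_destination, creds)
  escaped = select_sql.gsub("'", "''") # single quotes inside the quoted query are doubled
  "UNLOAD ('#{escaped}') TO '#{s3_destination}' CREDENTIALS '#{creds}' MANIFEST DELIMITER ',';"
end

# Step 2 (destination): drop the old copy of the table (IF EXISTS is an assumption).
def drop_sql(schema, table)
  "DROP TABLE IF EXISTS #{schema}.#{table};"
end

# Step 4 (destination): load the files listed in the manifest.
def copy_sql(schema, table, manifest_path, creds)
  "COPY #{schema}.#{table} FROM '#{manifest_path}' CREDENTIALS '#{creds}' MANIFEST DELIMITER ',';"
end

creds = credentials("AKIA_EXAMPLE", "example_secret") # placeholder keys
statements = [
  unload_sql("SELECT * FROM events WHERE kind = 'click'", "s3://bucket_name/events/", creds),
  drop_sql("public", "events"),
  # Step 3: the caller-supplied create_sql, shown here with a made-up table definition.
  "CREATE TABLE public.events (id INTEGER, kind VARCHAR(32));",
  copy_sql("public", "events", "s3://bucket_name/events/manifest", creds)
]
puts statements
```

The first statement would run over the source connection and the remaining three over the destination connection.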
## Running the Code
The `RedshiftExtractor::Extractor` class is instantiated with a long hash of arguments ([sorry Sandi Metz!](https://robots.thoughtbot.com/sandi-metz-rules-for-developers)).
```ruby
args = {
  database_config_source: "database_config_source",
  database_config_destination: "database_config_destination",
  unload_s3_destination: "unload_s3_destination",
  unload_select_sql: "unload_select_sql",
  destination_schema: "destination_schema",
  destination_table: "destination_table",
  create_sql: "create_sql",
  copy_data_source: "copy_data_source",
  aws_access_key_id: "aws_access_key_id",
  aws_secret_access_key: "aws_secret_access_key"
}

extractor = RedshiftExtractor::Extractor.new(args)
extractor.run
```

Here is a description of the parameters:
- database_config_source, database_config_destination: Hashes accepted by the [Ruby Postgres gem](https://bitbucket.org/ged/ruby-pg/wiki/Home), one for each database. Here's an example:
```ruby
{
  dbname: "db_name",
  user: "username",
  password: "password",
  host: "host",
  sslmode: "require",
  port: 5439
}
```
- unload_s3_destination: An S3 path, something like `"s3://bucket_name/something_else/"`
- unload_select_sql: A SQL SELECT query that will be run on the source table
- destination_schema, destination_table: The table that will be dropped, recreated, and populated with data from the COPY command
- create_sql: The SQL that creates the destination_schema.destination_table table (this SQL is run to recreate the table in the step above)
- copy_data_source: This is typically `"#{unload_s3_destination}manifest"`. The UNLOAD command automatically creates a manifest file that can be used by the COPY command to load the data.
- aws_access_key_id, aws_secret_access_key: The access key ID and secret access key you get from AWS.
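As a rough sketch of how the parameters fit together, the hash below fills in each argument with example values. The host names, the table definition, the SELECT query, and the environment-variable fallbacks are placeholders invented for illustration. The part taken from the descriptions above is that `copy_data_source` is derived from `unload_s3_destination`.

```ruby
# Placeholder values throughout; substitute your own clusters, bucket, and table.
unload_s3_destination = "s3://bucket_name/something_else/"

# Both connection hashes have the same shape; here only the host differs.
base_config = {
  dbname: "db_name",
  user: "username",
  password: "password",
  sslmode: "require",
  port: 5439
}

args = {
  database_config_source: base_config.merge(host: "source-host"),
  database_config_destination: base_config.merge(host: "destination-host"),
  unload_s3_destination: unload_s3_destination,
  unload_select_sql: "SELECT id, kind FROM events",
  destination_schema: "public",
  destination_table: "events",
  create_sql: "CREATE TABLE public.events (id INTEGER, kind VARCHAR(32));",
  # UNLOAD writes a manifest file next to the data files; COPY reads it.
  copy_data_source: "#{unload_s3_destination}manifest",
  # Reading keys from the environment is a choice made for this sketch only.
  aws_access_key_id: ENV.fetch("AWS_ACCESS_KEY_ID", "placeholder_key_id"),
  aws_secret_access_key: ENV.fetch("AWS_SECRET_ACCESS_KEY", "placeholder_secret")
}

# With the gem installed, the hash would then be passed along as shown earlier:
# RedshiftExtractor::Extractor.new(args).run
puts args[:copy_data_source]
```

Keeping the trailing slash on `unload_s3_destination` matters here, because the manifest path is built by simple string concatenation.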
## Installation
Add this line to your application's Gemfile:
```ruby
gem 'redshift_extractor'
```

And then execute:

    $ bundle
## Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/MrPowers/redshift_extractor.
## License
The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).