{"id":14067138,"url":"https://github.com/guyernest/RedshiftRDBA","last_synced_at":"2025-07-30T00:32:26.123Z","repository":{"id":12221278,"uuid":"14829570","full_name":"guyernest/RedshiftRDBA","owner":"guyernest","description":"DBA library for Amazon Redshift","archived":false,"fork":false,"pushed_at":"2014-05-05T11:12:17.000Z","size":600,"stargazers_count":18,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-12-04T07:36:40.225Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"nightscout/cgm-remote-monitor","license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/guyernest.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2013-11-30T22:36:17.000Z","updated_at":"2024-07-25T06:11:48.000Z","dependencies_parsed_at":"2022-09-24T13:13:39.277Z","dependency_job_id":null,"html_url":"https://github.com/guyernest/RedshiftRDBA","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/guyernest/RedshiftRDBA","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guyernest%2FRedshiftRDBA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guyernest%2FRedshiftRDBA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guyernest%2FRedshiftRDBA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guyernest%2FRedshiftRDBA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/guyernest","download_url":"https://codeload.github.com/guyernest/RedshiftRDBA/tar.gz/refs/heads/master","sbom_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/guyernest%2FRedshiftRDBA/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267785754,"owners_count":24144121,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-13T07:05:27.036Z","updated_at":"2025-07-30T00:32:25.803Z","avatar_url":"https://github.com/guyernest.png","language":"R","funding_links":[],"categories":["R"],"sub_categories":[],"readme":"RedshiftRDBA\n============\n\nDBA library for Amazon Redshift\n\nAmazon Redshift is a powerful data warehouse in the cloud. 
It can process large volumes of data with ease, but it is important to understand what is going on under the hood in terms of table data design and query design.\n\nThis library is designed to give Redshift DBAs a quick way to run queries against the Redshift system tables and visualise the results.\n\nThe library uses R, thanks to its open source nature, speed and data visualisation integration.\n\nThe library is divided into several parts:\n\n* General table view - which tables are defined, their relative size in bytes and rows, and how much of each table is sorted\n* Table view - which columns are defined in a specific table, their relative size and their encoding\n* General Queries view - how many queries are running on the DB and their relative distribution into fast and slow queries\n\n\nGetting Started\n===============\n\n1. Install R (and RStudio)\n--------------------------\n\nThere are many tutorials on how to install R and its popular IDE, RStudio.\n\nThe easiest way is to launch an EC2 instance with everything already preinstalled:\n\n* Using a web browser: \n\nhttp://www.louisaslett.com/RStudio_AMI/\n\n* Using AWS CLI (http://aws.amazon.com/cli/):\n\n  `aws ec2 run-instances --image-id ami-1ffd6d2f`\n    \n    `--instance-type m3.xlarge` \n    \n    `--security-groups RStudioServer`\n    \n    `--region us-west-2`\n\n(Please remember to change the user and password of the remote RStudio)\n\n2. Grant your local or remote R permission to access your Redshift cluster\n--------------------------\n\n* Using a web browser: \n\nhttp://docs.aws.amazon.com/redshift/latest/mgmt/working-with-security-groups.html\n\n* Using AWS CLI:\n\n  `aws redshift authorize-cluster-security-group-ingress --cluster-security-group-name default --ec2-security-group-name RStudioServer`\n\n3. 
Install the RedshiftRDBA library\n-------------------------\n\nUntil the library is packaged, you need to copy the files to your machine and source them:\n\n`source('~/RedshiftRDBA/Summary.R')`\n\nTo make sure that the needed packages are loaded, connect to your Redshift cluster after sourcing this first file:\n\n`con \u003c- clusterConnect(\"cluster-name.xxxxxxxxxx.us-west-2.redshift.amazonaws.com\", \"dev\", \"user\", \"password\")`\n\nThen you can continue sourcing the rest of the files as needed:\n\n`source('~/RedshiftRDBA/TableDesign.R')`\n\n`...`\n\nUsing the DBA library\n====================\n\n### Connecting to your Redshift cluster\n\n`con \u003c- clusterConnect(\"cluster-name.xxxxxxxxxx.us-west-2.redshift.amazonaws.com\", \"dev\", \"user\", \"password\")`\n\nPlease note that the library accesses the system tables and views, so the connecting user requires superuser privileges (superuser explanation: http://docs.aws.amazon.com/redshift/latest/dg/r_superusers.html)\n\n### DB Summary\n\n`DBSummaryTable \u003c- DBSummary(con)`\n\n`plotSummary(DBSummaryTable)`\n\n![Chart Output Example](DBSummaryExample.png \"DB Summary Example\")\n\nPlease note that you might have a very long list of tables with duplicate names in multiple schemas. 
You can limit the plotting to a specific schema with:\n\n`mySchemaName \u003c- 'my_schema_name'`\n\n`mySchema \u003c- DBSummaryTable[which(DBSummaryTable$nspname==mySchemaName \u0026 DBSummaryTable$schemaname==mySchemaName \u0026 DBSummaryTable$btrim==mySchemaName),]`\n\nIf you have too many tables in a single schema and find it hard to plot all the tables together, you can subset the list with:\n\n`plotSummary(mySchema[which(substr(mySchema$table_name,1,3)=='log'),])`\n\n### Table Summary\n\n`dbGetQuery(con, 'set search_path to my_schema_name;')`\n\n`cloudfrontTable \u003c- tableCols(con, \"cloudfront\")`\n\n`plotTable(cloudfrontTable)`\n\n![Chart Output Example](TableSummaryExample.png \"Table Summary Example\")\n\n### Query Summary\n\n`plotQueryDistribution(con)`\n\n![Chart Output Example](FastSlowQueryDistributionExample.png \"Query Distribution by execution time Summary Example\")\n\n`hourHeatMap(con)`\n\n![Chart Output Example](HourlyQueryDistributionExample.png \"Query Distribution by Day/Hour Summary Example\")\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguyernest%2FRedshiftRDBA","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fguyernest%2FRedshiftRDBA","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fguyernest%2FRedshiftRDBA/lists"}