{"id":25740154,"url":"https://github.com/datadotworld/tap-redshift","last_synced_at":"2025-05-08T21:14:41.090Z","repository":{"id":56777289,"uuid":"111031637","full_name":"datadotworld/tap-redshift","owner":"datadotworld","description":"Singer tap for Redshift","archived":false,"fork":false,"pushed_at":"2023-12-05T05:56:13.000Z","size":40,"stargazers_count":10,"open_issues_count":12,"forks_count":32,"subscribers_count":33,"default_branch":"master","last_synced_at":"2025-05-08T21:14:32.229Z","etag":null,"topics":["dwstruct-t50-public-projects"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datadotworld.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-11-16T22:57:28.000Z","updated_at":"2025-04-01T21:35:35.000Z","dependencies_parsed_at":"2023-12-05T06:45:31.547Z","dependency_job_id":null,"html_url":"https://github.com/datadotworld/tap-redshift","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datadotworld%2Ftap-redshift","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datadotworld%2Ftap-redshift/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datadotworld%2Ftap-redshift/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datadotworld%2Ftap-redshift/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datadotworld",
"download_url":"https://codeload.github.com/datadotworld/tap-redshift/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253149618,"owners_count":21861740,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dwstruct-t50-public-projects"],"created_at":"2025-02-26T08:36:47.264Z","updated_at":"2025-05-08T21:14:41.067Z","avatar_url":"https://github.com/datadotworld.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"============\ntap-redshift\n============\n\n\n`Singer \u003chttps://singer.io\u003e`_ tap that extracts data from a `Redshift \u003chttps://aws.amazon.com/documentation/redshift/\u003e`_ database and produces JSON-formatted data following the Singer spec.\n\n\nUsage\n=====\ntap-redshift assumes you have a connection to Redshift and requires Python 3.6+.\n\nStep 1: Create a configuration file\n-----------------------------------\nWhen you install tap-redshift, you need to create a ``config.json`` file for the database connection.\n\nThe json file requires the following attributes;\n\n* ``host``\n* ``port``\n* ``dbname``\n* ``user``\n* ``password``\n* ``start_date`` (Notation: yyyy-mm-ddThh:mm:ssZ)\n\nAnd an optional attribute;\n\n* ``schema``\n\nExample:\n\n.. 
code-block:: json\n\n    {\n        \"host\": \"REDSHIFT_HOST\",\n        \"port\": \"REDSHIFT_PORT\",\n        \"dbname\": \"REDSHIFT_DBNAME\",\n        \"user\": \"REDSHIFT_USER\",\n        \"password\": \"REDSHIFT_PASSWORD\",\n        \"start_date\": \"REDSHIFT_START_DATE\",\n        \"schema\": \"REDSHIFT_SCHEMA\"\n    }\n\n\nStep 2: Discover what can be extracted from Redshift\n----------------------------------------------------\nThe tap can be invoked in discovery mode to get the available tables and columns in the database.\nPass it the config file created in Step 1 so that it can connect to Redshift:\n\n.. code-block:: shell\n\n    $ tap-redshift --config config.json -d\n\nA full catalog is written to stdout, with a JSON-schema description of each table. A source\ntable directly corresponds to a Singer stream.\n\nRedirect the output of discovery mode to a file so that it can be edited before the tap is\ninvoked in sync mode.\n\n.. code-block:: shell\n\n    $ tap-redshift -c config.json -d \u003e catalog.json\n\nThis runs the tap in discovery mode and writes the output to a ``catalog.json`` file.\n\nA catalog contains a list of stream objects, one for each table available in your Redshift schema.\n\nExample:\n\n.. 
code-block:: json\n\n    {\n        \"streams\": [\n            {\n                \"tap_stream_id\": \"sample-dbname.public.sample-name\",\n                \"stream\": \"sample-stream\",\n                \"database_name\": \"sample-dbname\",\n                \"table_name\": \"public.sample-name\",\n                \"schema\": {\n                    \"properties\": {\n                        \"id\": {\n                            \"minimum\": -2147483648,\n                            \"inclusion\": \"automatic\",\n                            \"maximum\": 2147483647,\n                            \"type\": [\n                                \"null\",\n                                \"integer\"\n                            ]\n                        },\n                        \"name\": {\n                            \"maxLength\": 255,\n                            \"inclusion\": \"available\",\n                            \"type\": [\n                                \"null\",\n                                \"string\"\n                            ]\n                        },\n                        \"updated_at\": {\n                            \"inclusion\": \"available\",\n                            \"type\": [\n                                \"string\"\n                            ],\n                            \"format\": \"date-time\"\n                        }\n                    },\n                    \"type\": \"object\"\n                },\n                \"metadata\": [\n                    {\n                        \"metadata\": {\n                            \"selected-by-default\": false,\n                            \"selected\": true,\n                            \"is-view\": false,\n                            \"table-key-properties\": [\"id\"],\n                            \"schema-name\": \"sample-stream\",\n                            \"valid-replication-keys\": [\n                                \"updated_at\"\n                          
  ]\n                        },\n                        \"breadcrumb\": []\n                    },\n                    {\n                        \"metadata\": {\n                            \"selected-by-default\": true,\n                            \"sql-datatype\": \"int2\",\n                            \"inclusion\": \"automatic\"\n                        },\n                        \"breadcrumb\": [\n                            \"properties\",\n                            \"id\"\n                        ]\n                    },\n                    {\n                        \"metadata\": {\n                            \"selected-by-default\": true,\n                            \"sql-datatype\": \"varchar\",\n                            \"inclusion\": \"available\"\n                        },\n                        \"breadcrumb\": [\n                            \"properties\",\n                            \"name\"\n                        ]\n                    },\n                    {\n                        \"metadata\": {\n                            \"selected-by-default\": true,\n                            \"sql-datatype\": \"datetime\",\n                            \"inclusion\": \"available\"\n                        },\n                        \"breadcrumb\": [\n                            \"properties\",\n                            \"updated_at\"\n                        ]\n                    }\n                ]\n            }\n        ]\n    }\n\n\nStep 3: Select the tables you want to sync\n------------------------------------------\nIn sync mode, ``tap-redshift`` requires a catalog file to be supplied, where the user must\nhave selected which streams (tables) should be transferred. Streams are not selected by default.\n\nFor each stream in the catalog, find the ``metadata`` section. 
That is the section you will modify\nto select the stream and, optionally, individual properties too.\n\nThe stream itself is represented by an empty breadcrumb.\n\nExample:\n\n.. code-block:: json\n\n    \"metadata\": [\n        {\n            \"breadcrumb\": [],\n            \"metadata\": {\n                \"selected-by-default\": false,\n                ...\n            }\n        }\n    ]\n\nYou can select it by adding ``\"selected\": true`` to its metadata.\n\nExample:\n\n.. code-block:: json\n\n    \"metadata\": [\n        {\n            \"breadcrumb\": [],\n            \"metadata\": {\n                \"selected\": true,\n                \"selected-by-default\": false,\n                ...\n            }\n        }\n    ]\n\nThe tap can then be invoked in sync mode with the ``--catalog`` argument.\n\nExample (paired with ``target-datadotworld``):\n\n.. code-block:: shell\n\n    tap-redshift -c config.json --catalog catalog.json | target-datadotworld -c config-dw.json\n\n\nStep 4: Sync your data\n----------------------\nThere are two ways to replicate a given table: FULL_TABLE and INCREMENTAL.\nFULL_TABLE replication is used by default.\n\nFull Table\n++++++++++\nFull-table replication extracts all data from the source table each time the tap is invoked without\na state file.\n\nIncremental\n+++++++++++\nIncremental replication works in conjunction with a state file to extract only new records each\ntime the tap is invoked; that is, it continues from the last synced data.\n\nTo use incremental replication, add ``replication-method`` and ``replication-key``\nto the stream's (table's) metadata in the ``catalog.json`` file.\n\nExample:\n\n.. 
code-block:: json\n\n    \"metadata\": [\n        {\n            \"breadcrumb\": [],\n            \"metadata\": {\n                \"selected\": true,\n                \"selected-by-default\": false,\n                \"replication-method\": \"INCREMENTAL\",\n                \"replication-key\": \"updated_at\",\n                ...\n            }\n        }\n    ]\n\nWe can then invoke the tap again in sync mode. This time the output will have ``STATE`` messages\nthat contain a ``replication_key_value`` and ``bookmark`` for the data that was extracted.\n\nRedirect the output to a ``state.json`` file. Normally, the target will echo the last STATE after\nit has finished processing data.\n\nRun the command below to write the state to a ``state.json`` file.\n\nExample:\n\n.. code-block:: shell\n\n    tap-redshift -c config.json --catalog catalog.json | \\\n        target-datadotworld -c config-dw.json \u003e state.json\n\nThe ``state.json`` file should look like this:\n\n.. code-block:: json\n\n    {\n        \"currently_syncing\": null,\n        \"bookmarks\": {\n            \"sample-dbname.public.sample-name\": {\n                \"replication_key\": \"updated_at\",\n                \"version\": 1516304171710,\n                \"replication_key_value\": \"2013-10-29T09:38:41.341Z\"\n            }\n        }\n    }\n\nFor subsequent runs, pass the latest state to the tap so that it extracts only the data modified since the last execution.\n\n.. 
code-block:: shell\n\n    tail -1 state.json \u003e latest-state.json; \\\n    tap-redshift \\\n        -c config-redshift.json \\\n        --catalog catalog.json \\\n        -s latest-state.json | \\\n            target-datadotworld -c config-dw.json \u003e state.json\n\n\nAll steps in one Makefile\n=========================\n\nFor your convenience, all the steps mentioned above are captured in the ``Makefile`` below.\nThis example uses ``target-datadotworld`` but can be modified to use any other Singer target.\n\n.. code-block:: Makefile\n\n    # Requires python 3.6\n    install:\n        pip3 install tap-redshift; \\\n        pip3 install target-datadotworld\n\n    # Catalog discovery\n    discover:\n        tap-redshift \\\n            -c config-redshift.json -d \u003e catalog.json\n\n    # Full sync\n    fullsync:\n        tap-redshift \\\n            -c config-redshift.json \\\n            --catalog catalog.json | \\\n                target-datadotworld -c config-dw.json \u003e state.json\n\n    # Incremental sync\n    sync:\n        tail -1 state.json \u003e latest-state.json; \\\n        tap-redshift \\\n          -c config-redshift.json \\\n          --catalog catalog.json \\\n          -s latest-state.json | \\\n            target-datadotworld -c config-dw.json \u003e state.json\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatadotworld%2Ftap-redshift","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatadotworld%2Ftap-redshift","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatadotworld%2Ftap-redshift/lists"}