{"id":22152121,"url":"https://github.com/findinpath/trino-hive-2-postgres","last_synced_at":"2026-01-30T12:33:54.597Z","repository":{"id":112791078,"uuid":"423251303","full_name":"findinpath/trino-hive-2-postgres","owner":"findinpath","description":"Copy data from Hive to PostgreSQL via Trino","archived":false,"fork":false,"pushed_at":"2021-10-31T20:17:43.000Z","size":39,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-11T09:57:08.206Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/findinpath.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-31T20:17:20.000Z","updated_at":"2021-10-31T20:17:46.000Z","dependencies_parsed_at":null,"dependency_job_id":"a362fdc2-eef5-4412-8b06-97e94161c076","html_url":"https://github.com/findinpath/trino-hive-2-postgres","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/findinpath/trino-hive-2-postgres","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/findinpath%2Ftrino-hive-2-postgres","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/findinpath%2Ftrino-hive-2-postgres/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/findinpath%2Ftrino-hive-2-postgres/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/findinpath%2Ftrino-hive-2-postgres/manifests","owner_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/findinpath","download_url":"https://codeload.github.com/findinpath/trino-hive-2-postgres/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/findinpath%2Ftrino-hive-2-postgres/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28912911,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-30T12:13:43.263Z","status":"ssl_error","status_checked_at":"2026-01-30T12:13:22.389Z","response_time":66,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-02T00:47:13.335Z","updated_at":"2026-01-30T12:33:54.578Z","avatar_url":"https://github.com/findinpath.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"dbt-trino-hive-2-postgres\n=========================\n\nThis proof-of-concept project showcases a rather uncommon use case for the\n[Trino](https://trino.io/) SQL query engine: copying data from one database into\nanother database.\n\nTrino is known in the analytics world for its highly parallel and distributed\nquery engine, which can pull data from a wide range of disparate data sources\nand perform analysis without the need for complex, slow and error-prone processes\nfor copying the data.\n\n\nThis project does not show how to use Trino for OLAP\nstatements, but 
rather a more pragmatic ad-hoc use case of copying data from one place to another.\n\nIt is relatively common to have a table with up to a few million rows in one database\nand to want its contents copied to another database for development or migration\npurposes.\n\nIn such cases, Trino may come to the rescue before you rack your brain over how to copy the data from\none database to another.\n\nThis project showcases copying the content of a [Hive](https://hive.apache.org/) table\ninto a [PostgreSQL](https://www.postgresql.org/) table.\n\n![Copy Hive to Postgres via Trino](img/copy-hive-to-postgres-via-trino.png)\n\nThe input and output databases / object stores can be chosen as needed by using any\nof the available Trino [connectors](https://trino.io/docs/current/connector.html).\n\n\n## Demo\n\nThe following demo is based on [Docker](https://www.docker.com/) and can be easily\nexecuted on any workstation without additional tools.\n\nSpin up the Docker environment, which contains:\n\n- Trino\n- [MinIO](https://min.io/) Hive-compatible object storage\n- PostgreSQL database\n\n\n```\ndocker-compose up -d\n```\n\n\n### Create a bucket in MinIO\n\n\nOpen the [MinIO UI](http://localhost:9000/) and log in with the following credentials:\n\n- access key: `minio`\n- secret key: `minio123`\n\nCreate the bucket `tiny`.\n\n### Trino CLI\n\nOpen the Trino CLI and list the available catalogs:\n\n```\ndocker container exec -it trino-hive-2-postgres_trino-coordinator_1 trino\n```\n\n```\ntrino\u003e show catalogs;\n Catalog  \n----------\n minio    \n postgres \n system   \n(3 rows)\n```\n\n\nCreate the Trino `minio.tiny` schema:\n\n```\nCREATE SCHEMA minio.tiny\nWITH (location = 's3a://tiny/');\n```\n\nCreate the `minio.tiny.customer` Hive table:\n\n\n```sql\nCREATE TABLE minio.tiny.customer (\n    first_name varchar(32),\n    last_name varchar(32),
\n    email varchar(256),\n    id bigint\n )\n WITH (\n    external_location = 's3a://tiny/customer',\n    format = 'ORC',\n    partitioned_by = ARRAY['id']\n );\n```\n\nPopulate the `minio.tiny.customer` Hive table:\n\n```\nINSERT INTO minio.tiny.customer (id, first_name, last_name, email)\n    VALUES\n        (1,'Michael','Perez','mperez0@chronoengine.com'),\n        (2,'Shawn','Mccoy','smccoy1@reddit.com'),\n        (3,'Kathleen','Payne','kpayne2@cargocollective.com');\n\n```\n\n\nCopy the content of the `minio.tiny.customer` Hive table to Postgres:\n\n```\nCREATE TABLE postgres.public.customer AS\nSELECT id, first_name, last_name, email\nFROM minio.tiny.customer;\n```\n\n\nRunning the statement:\n\n```\nSELECT * FROM postgres.public.customer;\n```\n\nverifies that the entries copied from Hive exist in the newly created Postgres table.\n\n\nThis demo also showcases copying incremental changes from the `minio.tiny.customer` Hive table to Postgres.\n\nAdd new content to the Hive table:\n\n```\nINSERT INTO minio.tiny.customer (id, first_name, last_name, email)\n    VALUES\n        (4,'Rosa','McDonald','rmcdonald@bit.com');\n```\n\n\nCopy the new content incrementally from Hive into Postgres:\n\n```\nINSERT INTO postgres.public.customer\nSELECT id, first_name, last_name, email\nFROM minio.tiny.customer\nWHERE id \u003e (SELECT MAX(id) FROM postgres.public.customer);\n```\n\n\nStop and tear down the Docker environment used for the demo:\n\n```\ndocker-compose down\n```\n\n\n## Conclusions\n\n\nAs can be seen from the brief demo above, Trino can also be used\nto copy data from one database/object store to another.\n\nThere are other ways to copy content from one database to another,\nbut if you are familiar with Trino, this may spare you some hours\nin achieving an ad-hoc copy of the 
content of a table from an input database\nto a table in an output database.\n\nThe Trino session property `write_batch_size`, available on the JDBC connectors\n(set by default to `1000`), can be used to speed up the insertion of data into the output database.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffindinpath%2Ftrino-hive-2-postgres","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffindinpath%2Ftrino-hive-2-postgres","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffindinpath%2Ftrino-hive-2-postgres/lists"}