{"id":20826762,"url":"https://github.com/cedrickchee/citus-cluster","last_synced_at":"2025-07-28T19:08:37.494Z","repository":{"id":138118119,"uuid":"415613698","full_name":"cedrickchee/citus-cluster","owner":"cedrickchee","description":"Shard Postgres on a single Citus node and scale-out to a distributed database cluster with multiple worker nodes","archived":false,"fork":false,"pushed_at":"2021-10-11T13:08:51.000Z","size":13,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-05T07:39:45.663Z","etag":null,"topics":["citus-extension","database-cluster","distributed-database","educational-project","high-availability","postgresql","sharding"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cedrickchee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-10T14:45:52.000Z","updated_at":"2025-02-21T15:54:17.000Z","dependencies_parsed_at":"2024-03-25T16:05:39.705Z","dependency_job_id":null,"html_url":"https://github.com/cedrickchee/citus-cluster","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cedrickchee/citus-cluster","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedrickchee%2Fcitus-cluster","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedrickchee%2Fcitus-cluster/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedrickchee%2Fcitus-cluster/releases","ma
nifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedrickchee%2Fcitus-cluster/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cedrickchee","download_url":"https://codeload.github.com/cedrickchee/citus-cluster/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cedrickchee%2Fcitus-cluster/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267569686,"owners_count":24109121,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-28T02:00:09.689Z","response_time":68,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["citus-extension","database-cluster","distributed-database","educational-project","high-availability","postgresql","sharding"],"created_at":"2024-11-17T23:09:57.566Z","updated_at":"2025-07-28T19:08:37.451Z","avatar_url":"https://github.com/cedrickchee.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Citus\n\nCitus is a PostgreSQL-based distributed RDBMS. For more information, see the [Citus Data website][citus data].\n\n## Function\n\nThis image provides a single running Citus instance (atop PostgreSQL 14), using standard configuration values. 
It is based on [the official PostgreSQL image][docker-postgres], so be sure to consult that image’s documentation for advanced configuration options (including non-default settings for e.g. `PGDATA` or `POSTGRES_USER`).\n\nJust like the standard PostgreSQL image, this image exposes port `5432`. In other words, all containers on the same Docker network should be able to connect on this port, and exposing it externally will permit connections from external clients (`psql`, adapters, applications).\n\n## Usage\n\nSince Citus is intended for use within a cluster, there are many ways to deploy it. This repository provides configuration to permit two kinds of deployment: local (standalone) or local (with workers).\n\n### Standalone Use\n\nIf you just want to run a single Citus instance, it’s pretty easy to get started:\n\n```bash\ndocker run -d --name cit_standalone -p 5500:5432 -e POSTGRES_PASSWORD=mypass citusdata/citus\n\n# stop and remove ALL running container, assuming that we only have citus containers\ndocker rm -f $(docker ps -a -q)\n```\n\nYou should now be able to connect to `127.0.0.1` on port `5500` using e.g. `psql` to run a few commands (see the Citus documentation for more information).\n\nAs with the PostgreSQL image, the default `PGDATA` directory will be mounted as a volume, so it will persist between restarts of the container. But while the above _will_ get you a running Citus instance, it won’t have any workers to exercise distributed query planning. For that, you may wish to try the included [`docker-compose.yml`][compose-config] configuration.\n\n### Docker Compose\n\nThe included `docker-compose.yml` file provides an easy way to get started with a Citus cluster, complete with multiple workers. 
Just copy it to your current directory and run:\n\n```bash\ndocker-compose -p cit up\nCreating network \"cit_default\" with the default driver\nCreating volume \"cit_healthcheck-volume\" with default driver\nPulling master (citusdata/citus:10.2.1-pg14)...\n10.2.1-pg14: Pulling from citusdata/citus\nf8416d8bac72: Pull complete\n...\nDigest: sha256:f741b57b7df6d08a3a441dc140e9eb5a72c83da8934aeb2d7a7b75065a807378\nStatus: Downloaded newer image for citusdata/citus:10.2.1-pg14\nPulling manager (citusdata/membership-manager:0.3.0)...\n0.3.0: Pulling from citusdata/membership-manager\ncbdbe7a5bc2a: Pull complete\n...\nDigest: sha256:cb96b6918d93182a5213e9d07c5f5afa748cdf3b2fcfe644b593bf8ffd14ef1b\nStatus: Downloaded newer image for citusdata/membership-manager:0.3.0\nCreating citus_master ... done\nCreating citus_manager ... done\nCreating cit_worker_1  ... done\nAttaching to citus_master, citus_manager, cit_worker_1\ncitus_manager | Could not connect to master, trying again in 1 second\nworker_1   | Manager is not ready - sleeping\ncitus_master | ********************************************************************************\ncitus_master | WARNING: POSTGRES_HOST_AUTH_METHOD has been set to \"trust\". This will allow\ncitus_master |          anyone with access to the Postgres port to access your database without\ncitus_master |          a password, even if POSTGRES_PASSWORD is set. See PostgreSQL\ncitus_master |          documentation about \"trust\":\ncitus_master |          https://www.postgresql.org/docs/current/auth-trust.html\ncitus_master |          In Docker's default configuration, this is effectively any other\ncitus_master |          container on the same system.\ncitus_master | \ncitus_master |          It is not recommended to use POSTGRES_HOST_AUTH_METHOD=trust. 
Replace\ncitus_master |          it with \"-e POSTGRES_PASSWORD=password\" instead to set a password in\ncitus_master |          \"docker run\".\ncitus_master | ********************************************************************************\ncitus_master | The files belonging to this database system will be owned by user \"postgres\".\ncitus_master | This user must also own the server process.\ncitus_master | \ncitus_master | The database cluster will be initialized with locale \"en_US.utf8\".\ncitus_master | The default database encoding has accordingly been set to \"UTF8\".\ncitus_master | The default text search configuration will be set to \"english\".\ncitus_master | \ncitus_master | Data page checksums are disabled.\ncitus_master | \ncitus_master | fixing permissions on existing directory /var/lib/postgresql/data ... ok\ncitus_master | creating subdirectories ... ok\ncitus_master | selecting dynamic shared memory implementation ... posix\ncitus_master | selecting default max_connections ... 100\ncitus_master | selecting default shared_buffers ... 128MB\ncitus_master | selecting default time zone ... Etc/UTC\ncitus_master | creating configuration files ... ok\ncitus_master | running bootstrap script ... ok\ncitus_master | performing post-bootstrap initialization ... ok\ncitus_master | syncing data to disk ... initdb: warning: enabling \"trust\" authentication for local connections\ncitus_master | You can change this by editing pg_hba.conf or using the option -A, or\ncitus_master | --auth-local and --auth-host, the next time you run initdb.\ncitus_master | ok\ncitus_master | \ncitus_master | \ncitus_master | Success. 
You can now start the database server using:\ncitus_master | \ncitus_master |     pg_ctl -D /var/lib/postgresql/data -l logfile start\ncitus_master | \ncitus_master | waiting for server to start....2021-10-10 07:32:00.214 UTC [47] LOG:  number of prepared transactions has not been configured, overriding\ncitus_master | 2021-10-10 07:32:00.214 UTC [47] DETAIL:  max_prepared_transactions is now set to 200\ncitus_master | 2021-10-10 07:32:00.231 UTC [47] LOG:  starting PostgreSQL 14rc1 (Debian 14~rc1-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit\ncitus_master | 2021-10-10 07:32:00.235 UTC [47] LOG:  listening on Unix socket \"/var/run/postgresql/.s.PGSQL.5432\"\ncitus_master | 2021-10-10 07:32:00.243 UTC [48] LOG:  database system was shut down at 2021-10-10 07:32:00 UTC\ncitus_master | 2021-10-10 07:32:00.250 UTC [47] LOG:  database system is ready to accept connections\ncitus_master |  done\ncitus_master | server started\ncitus_master | \ncitus_master | /usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/001-create-citus-extension.sql\ncitus_master | BEGIN\ncitus_master | 2021-10-10 07:32:00.402 UTC [73] LOG:  citus extension created on postgres without ssl enabled, turning it on during creation of the extension\ncitus_master | 2021-10-10 07:32:00.402 UTC [73] CONTEXT:  SQL statement \"SELECT citus_setup_ssl()\"\ncitus_master |  PL/pgSQL function inline_code_block line 5 at PERFORM\ncitus_master | 2021-10-10 07:32:00.402 UTC [73] STATEMENT:  CREATE EXTENSION citus;\ncitus_master | 2021-10-10 07:32:00.414 UTC [73] LOG:  no certificate present, generating self signed certificate\ncitus_master | 2021-10-10 07:32:00.414 UTC [73] CONTEXT:  SQL statement \"SELECT citus_setup_ssl()\"\ncitus_master |  PL/pgSQL function inline_code_block line 5 at PERFORM\ncitus_master | 2021-10-10 07:32:00.414 UTC [73] STATEMENT:  CREATE EXTENSION citus;\ncitus_master | 2021-10-10 07:32:00.472 UTC [47] LOG:  received 
SIGHUP, reloading configuration files\ncitus_master | 2021-10-10 07:32:00.473 UTC [47] LOG:  parameter \"listen_addresses\" cannot be changed without restarting the server\ncitus_master | 2021-10-10 07:32:00.473 UTC [47] LOG:  parameter \"ssl\" changed to \"on\"\ncitus_master | 2021-10-10 07:32:00.473 UTC [47] LOG:  parameter \"ssl_ciphers\" changed to \"ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA384\"\ncitus_master | 2021-10-10 07:32:00.473 UTC [47] LOG:  configuration file \"/var/lib/postgresql/data/postgresql.conf\" contains errors; unaffected changes were applied\ncitus_master | CREATE EXTENSION\ncitus_master | 2021-10-10 07:32:00.898 UTC [74] LOG:  starting maintenance daemon on database 13757 user 10\ncitus_master | 2021-10-10 07:32:00.898 UTC [74] CONTEXT:  Citus maintenance daemon for database 13757 user 10\ncitus_master | UPDATE 1\ncitus_master | COMMIT\ncitus_master | \ncitus_master | \ncitus_master | 2021-10-10 07:32:00.916 UTC [47] LOG:  received fast shutdown request\ncitus_master | waiting for server to shut down....2021-10-10 07:32:00.922 UTC [47] LOG:  aborting any active transactions\ncitus_master | 2021-10-10 07:32:00.925 UTC [47] LOG:  background worker \"logical replication launcher\" (PID 54) exited with exit code 1\ncitus_master | 2021-10-10 07:32:00.932 UTC [49] LOG:  shutting down\ncitus_master | 2021-10-10 07:32:00.965 UTC [47] LOG:  database system is shut down\ncitus_master |  done\ncitus_master | server stopped\ncitus_master | \ncitus_master | PostgreSQL init process complete; ready for start up.\ncitus_master | \ncitus_master | 2021-10-10 07:32:01.050 UTC [1] LOG:  number of prepared transactions has not been configured, overriding\ncitus_master | 2021-10-10 07:32:01.050 UTC [1] DETAIL:  max_prepared_transactions is now set to 200\ncitus_master | 2021-10-10 07:32:01.067 
UTC [1] LOG:  starting PostgreSQL 14rc1 (Debian 14~rc1-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit\ncitus_master | 2021-10-10 07:32:01.068 UTC [1] LOG:  listening on IPv4 address \"0.0.0.0\", port 5432\ncitus_master | 2021-10-10 07:32:01.068 UTC [1] LOG:  listening on IPv6 address \"::\", port 5432\ncitus_master | 2021-10-10 07:32:01.072 UTC [1] LOG:  listening on Unix socket \"/var/run/postgresql/.s.PGSQL.5432\"\ncitus_master | 2021-10-10 07:32:01.078 UTC [76] LOG:  database system was shut down at 2021-10-10 07:32:00 UTC\ncitus_master | 2021-10-10 07:32:01.083 UTC [1] LOG:  database system is ready to accept connections\ncitus_manager | connected to master\ncitus_manager | found compose project: cit\ncitus_manager | listening for events...\nworker_1   | Manager is up - starting worker\nworker_1   | ********************************************************************************\nworker_1   | WARNING: POSTGRES_HOST_AUTH_METHOD has been set to \"trust\". This will allow\nworker_1   |          anyone with access to the Postgres port to access your database without\nworker_1   |          a password, even if POSTGRES_PASSWORD is set. See PostgreSQL\nworker_1   |          documentation about \"trust\":\nworker_1   |          https://www.postgresql.org/docs/current/auth-trust.html\nworker_1   |          In Docker's default configuration, this is effectively any other\nworker_1   |          container on the same system.\nworker_1   | \nworker_1   |          It is not recommended to use POSTGRES_HOST_AUTH_METHOD=trust. 
Replace\nworker_1   |          it with \"-e POSTGRES_PASSWORD=password\" instead to set a password in\nworker_1   |          \"docker run\".\nworker_1   | ********************************************************************************\nworker_1   | The files belonging to this database system will be owned by user \"postgres\".\nworker_1   | This user must also own the server process.\nworker_1   | \nworker_1   | The database cluster will be initialized with locale \"en_US.utf8\".\nworker_1   | The default database encoding has accordingly been set to \"UTF8\".\nworker_1   | The default text search configuration will be set to \"english\".\nworker_1   | \nworker_1   | Data page checksums are disabled.\nworker_1   | \nworker_1   | fixing permissions on existing directory /var/lib/postgresql/data ... ok\nworker_1   | creating subdirectories ... ok\nworker_1   | selecting dynamic shared memory implementation ... posix\nworker_1   | selecting default max_connections ... 100\nworker_1   | selecting default shared_buffers ... 128MB\nworker_1   | selecting default time zone ... Etc/UTC\nworker_1   | creating configuration files ... ok\nworker_1   | running bootstrap script ... ok\nworker_1   | performing post-bootstrap initialization ... ok\nworker_1   | syncing data to disk ... ok\nworker_1   | \nworker_1   | initdb: warning: enabling \"trust\" authentication for local connections\nworker_1   | \nworker_1   | Success. 
You can now start the database server using:\nworker_1   | \nworker_1   |     pg_ctl -D /var/lib/postgresql/data -l logfile start\nworker_1   | \nworker_1   | You can change this by editing pg_hba.conf or using the option -A, or\nworker_1   | --auth-local and --auth-host, the next time you run initdb.\nworker_1   | waiting for server to start....2021-10-10 07:32:02.948 UTC [39] LOG:  number of prepared transactions has not been configured, overriding\nworker_1   | 2021-10-10 07:32:02.948 UTC [39] DETAIL:  max_prepared_transactions is now set to 200\nworker_1   | 2021-10-10 07:32:02.960 UTC [39] LOG:  starting PostgreSQL 14rc1 (Debian 14~rc1-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit\nworker_1   | 2021-10-10 07:32:02.964 UTC [39] LOG:  listening on Unix socket \"/var/run/postgresql/.s.PGSQL.5432\"\nworker_1   | 2021-10-10 07:32:02.971 UTC [40] LOG:  database system was shut down at 2021-10-10 07:32:02 UTC\nworker_1   | 2021-10-10 07:32:02.979 UTC [39] LOG:  database system is ready to accept connections\nworker_1   |  done\nworker_1   | server started\nworker_1   | \nworker_1   | /usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/001-create-citus-extension.sql\nworker_1   | BEGIN\nworker_1   | 2021-10-10 07:32:03.141 UTC [65] LOG:  citus extension created on postgres without ssl enabled, turning it on during creation of the extension\nworker_1   | 2021-10-10 07:32:03.141 UTC [65] CONTEXT:  SQL statement \"SELECT citus_setup_ssl()\"\nworker_1   |    PL/pgSQL function inline_code_block line 5 at PERFORM\nworker_1   | 2021-10-10 07:32:03.141 UTC [65] STATEMENT:  CREATE EXTENSION citus;\nworker_1   | 2021-10-10 07:32:03.151 UTC [65] LOG:  no certificate present, generating self signed certificate\nworker_1   | 2021-10-10 07:32:03.151 UTC [65] CONTEXT:  SQL statement \"SELECT citus_setup_ssl()\"\nworker_1   |    PL/pgSQL function inline_code_block line 5 at PERFORM\nworker_1   | 2021-10-10 
07:32:03.151 UTC [65] STATEMENT:  CREATE EXTENSION citus;\nworker_1   | 2021-10-10 07:32:03.177 UTC [39] LOG:  received SIGHUP, reloading configuration files\nworker_1   | 2021-10-10 07:32:03.178 UTC [39] LOG:  parameter \"listen_addresses\" cannot be changed without restarting the server\nworker_1   | 2021-10-10 07:32:03.178 UTC [39] LOG:  parameter \"ssl\" changed to \"on\"\nworker_1   | 2021-10-10 07:32:03.178 UTC [39] LOG:  parameter \"ssl_ciphers\" changed to \"ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA384\"\nworker_1   | 2021-10-10 07:32:03.178 UTC [39] LOG:  configuration file \"/var/lib/postgresql/data/postgresql.conf\" contains errors; unaffected changes were applied\nworker_1   | CREATE EXTENSION\nworker_1   | 2021-10-10 07:32:03.386 UTC [66] LOG:  starting maintenance daemon on database 13757 user 10\nworker_1   | 2021-10-10 07:32:03.386 UTC [66] CONTEXT:  Citus maintenance daemon for database 13757 user 10\nworker_1   | UPDATE 1\nworker_1   | COMMIT\nworker_1   | \nworker_1   | \nworker_1   | 2021-10-10 07:32:03.397 UTC [39] LOG:  received fast shutdown request\nworker_1   | waiting for server to shut down....2021-10-10 07:32:03.400 UTC [39] LOG:  aborting any active transactions\nworker_1   | 2021-10-10 07:32:03.401 UTC [39] LOG:  background worker \"logical replication launcher\" (PID 46) exited with exit code 1\nworker_1   | 2021-10-10 07:32:03.404 UTC [41] LOG:  shutting down\nworker_1   | 2021-10-10 07:32:03.441 UTC [39] LOG:  database system is shut down\nworker_1   |  done\nworker_1   | server stopped\nworker_1   | \nworker_1   | PostgreSQL init process complete; ready for start up.\nworker_1   | \nworker_1   | 2021-10-10 07:32:03.524 UTC [1] LOG:  number of prepared transactions has not been configured, overriding\nworker_1   | 2021-10-10 07:32:03.524 UTC [1] DETAIL:  
max_prepared_transactions is now set to 200\nworker_1   | 2021-10-10 07:32:03.538 UTC [1] LOG:  starting PostgreSQL 14rc1 (Debian 14~rc1-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit\nworker_1   | 2021-10-10 07:32:03.540 UTC [1] LOG:  listening on IPv4 address \"0.0.0.0\", port 5432\nworker_1   | 2021-10-10 07:32:03.540 UTC [1] LOG:  listening on IPv6 address \"::\", port 5432\nworker_1   | 2021-10-10 07:32:03.545 UTC [1] LOG:  listening on Unix socket \"/var/run/postgresql/.s.PGSQL.5432\"\nworker_1   | 2021-10-10 07:32:03.552 UTC [68] LOG:  database system was shut down at 2021-10-10 07:32:03 UTC\nworker_1   | 2021-10-10 07:32:03.558 UTC [1] LOG:  database system is ready to accept connections\ncitus_manager | adding cit_worker_1\ncitus_master | 2021-10-10 07:32:05.347 UTC [96] LOG:  starting maintenance daemon on database 13757 user 10\ncitus_master | 2021-10-10 07:32:05.347 UTC [96] CONTEXT:  Citus maintenance daemon for database 13757 user 10\nworker_1   | 2021-10-10 07:32:05.361 UTC [87] LOG:  starting maintenance daemon on database 13757 user 10\nworker_1   | 2021-10-10 07:32:05.361 UTC [87] CONTEXT:  Citus maintenance daemon for database 13757 user 10\n```\n\nThat’s it! As with the standalone mode, you’ll want to find your `docker-machine ip` if you’re using that technology, otherwise, just connect locally to `5432`.\n\n```bash\n# connect using psql within the Docker container\ndocker exec -it citus_master psql -U postgres\n```\n\n\nBy default, you’ll only have one worker:\n\n```sql\nSELECT master_get_active_worker_nodes();\n\n--  master_get_active_worker_nodes\n-- --------------------------------\n--  (cit_worker_1,5432)\n-- (1 row)\n```\n\nBut you can add more workers at will using `docker-compose scale` in another tab. For instance, to bring your worker count to five…\n\n```bash\ndocker-compose -p cit scale worker=5\n\n# Creating and starting 2 ... done\n# Creating and starting 3 ... 
done\n# Creating and starting 4 ... done\n# Creating and starting 5 ... done\n```\n\n```sql\nSELECT master_get_active_worker_nodes();\n\n--  master_get_active_worker_nodes\n-- --------------------------------\n--  (cit_worker_5,5432)\n--  (cit_worker_1,5432)\n--  (cit_worker_3,5432)\n--  (cit_worker_2,5432)\n--  (cit_worker_4,5432)\n-- (5 rows)\n```\n\nThe `pg_dist_node` table contains information about the worker nodes in the cluster.\n\n```sql\nSELECT * from pg_dist_node;\n nodeid | groupid | nodename     | nodeport | noderack | hasmetadata | isactive | noderole | nodecluster | metadatasynced | shouldhaveshards \n--------+---------+--------------+----------+----------+-------------+----------+----------+-------------+----------------+------------------\n      1 |       0 | cit_worker_1 |     5432 | default  | t           | t        | primary  | default     | t              | t\n    ...       ...\n```\n\nNow that the shards have been distributed, the database can use the resources on\nthe worker node(s) as well. From your application’s perspective, nothing has\nchanged. After adding 4 new nodes to the Citus database cluster, and after\nrebalancing shards across the cluster, your application is still talking to the\nsame Postgres database. You have seamlessly scaled out your Postgres database \nwith Citus!\n\n**DEPRECATED**\nIf you inspect the configuration file, you’ll find that there is a container that is neither a master nor worker node: `citus_config`. It simply listens for new containers tagged with the worker role, then adds them to the config file in a volume shared with the master node. If new nodes have appeared, it calls `master_initialize_node_metadata` against the master to repopulate the node table. 
See Citus’ [`workerlist-gen`][workerlist-gen] repo for more details.\n\nYou can stop your cluster with `docker-compose -p cit down`.\n\n## Acknowledgement\n\nThis work is based on the Citus [docker][citus-docker] project.\n\n## Tutorials\n\nFrom here on, you can choose to continue by trying the [tutorials][tutorials]\nthat will teach you how to use Citus by using sample data.\n\n### Multi-tenant Applications\n\nIn this tutorial, we will use a sample ad analytics dataset to demonstrate how\nyou can use Citus to power your multi-tenant application.\n\n#### Data model and sample data\n\nWe will demo building the database for an ad-analytics app which companies can\nuse to view, change, analyze and manage their ads and campaigns (see an\n[example app][example-app]). Such an application has the characteristics of a\ntypical multi-tenant system. Data from different tenants is stored in a central\ndatabase, and each tenant has an isolated view of their own data.\n\nWe will use three Postgres tables to represent this data. 
To get started, you\nwill need to download sample data for these tables:\n\n```sh\ncurl https://examples.citusdata.com/tutorial/companies.csv \u003e companies.csv\ncurl https://examples.citusdata.com/tutorial/campaigns.csv \u003e campaigns.csv\ncurl https://examples.citusdata.com/tutorial/ads.csv \u003e ads.csv\n```\n\nIf you are using Docker, you should use the `docker cp` command to copy the\nfiles into the Docker container.\n\n```sh\ndocker cp companies.csv cit_master:.\ndocker cp campaigns.csv cit_master:.\ndocker cp ads.csv cit_master:.\n```\n\n#### Creating tables\n\nTo start, you can first connect to the Citus coordinator using `psql`.\n\nIf you are using native Postgres, as installed in our Single-Node Citus guide,\nthe coordinator node will be running on port 9700.\n\n```sh\n# psql -p 9700\n\n# I'm using a Docker Compose configuration to run my Citus cluster,\n# so this command connects to the master container on port 5432.\npsql -h 0.0.0.0 -U postgres\n```\n\nAlternatively, if you are using Docker, you can connect by running `psql` with\nthe `docker exec` command:\n\n```sh\ndocker exec -it cit_master psql -U postgres\n```\n\nThen, you can create the tables by using standard PostgreSQL `CREATE TABLE`\ncommands.\n\n```sql\nCREATE TABLE companies (\n    id bigint NOT NULL,\n    name text NOT NULL,\n    image_url text,\n    created_at timestamp without time zone NOT NULL,\n    updated_at timestamp without time zone NOT NULL\n);\n\nCREATE TABLE campaigns (\n    id bigint NOT NULL,\n    company_id bigint NOT NULL,\n    name text NOT NULL,\n    cost_model text NOT NULL,\n    state text NOT NULL,\n    monthly_budget bigint,\n    blacklisted_site_urls text[],\n    created_at timestamp without time zone NOT NULL,\n    updated_at timestamp without time zone NOT NULL\n);\n\nCREATE TABLE ads (\n    id bigint NOT NULL,\n    company_id bigint NOT NULL,\n    campaign_id bigint NOT NULL,\n    name text NOT NULL,\n    image_url text,\n    target_url text,\n    impressions_count 
bigint DEFAULT 0,\n    clicks_count bigint DEFAULT 0,\n    created_at timestamp without time zone NOT NULL,\n    updated_at timestamp without time zone NOT NULL\n);\n```\n\nNext, you can create primary key indexes on each of the tables just like you\nwould do in PostgreSQL.\n\n```sql\nALTER TABLE companies ADD PRIMARY KEY (id);\nALTER TABLE campaigns ADD PRIMARY KEY (id, company_id);\nALTER TABLE ads ADD PRIMARY KEY (id, company_id);\n```\n\n#### Distributing tables and loading data\n\nWe will now go ahead and tell Citus to distribute these tables across the\ndifferent nodes we have in the cluster. To do so, you can run\n`create_distributed_table` and specify the table you want to shard and the\ncolumn you want to shard on. In this case, we will shard all the tables on the\ncompany identifier: `id` for `companies` and `company_id` for the other tables.\n\n```sql\nSELECT create_distributed_table('companies', 'id');\nSELECT create_distributed_table('campaigns', 'company_id');\nSELECT create_distributed_table('ads', 'company_id');\n```\n\nSharding all tables on the company identifier allows Citus to\n[colocate][colocate] the tables and enables features like primary\nkeys, foreign keys and complex joins across your cluster. You can learn more\nabout the benefits of this approach [here][design-saas-db-for-high-scalibility].\n\nThen, you can go ahead and load the data we downloaded into the tables using the\nstandard PostgreSQL `\\COPY` command. Please make sure that you specify the\ncorrect file path if you downloaded the file to some other location.\n\n```sql\n\\copy companies from '/home/neo/huge_data/citus/companies.csv' with csv\nCOPY 100\n\n\\copy campaigns from '/home/neo/huge_data/citus/campaigns.csv' with csv\nCOPY 978\n\n\\copy ads from '/home/neo/huge_data/citus/ads.csv' with csv\nCOPY 7364\n```\n\n**Shard information view**\n\n```sql\nselect * from citus_shards;\n```\n\n#### Running queries\n\nNow that we have loaded data into the tables, let’s go ahead and run some\nqueries. 
Citus supports standard `INSERT`, `UPDATE` and `DELETE` commands for\ninserting and modifying rows in a distributed table, which is the typical way of\ninteraction for a user-facing application.\n\nFor example, you can insert a new company by running:\n\n```sql\npostgres=# INSERT INTO companies VALUES (5000, 'New Company', 'https://randomurl/image.png', now(), now());\nINSERT 0 1\n```\n\nIf you want to double the budget for all the campaigns of a company, you can run\nan `UPDATE` command:\n\n```sql\npostgres=# UPDATE campaigns\nSET monthly_budget = monthly_budget*2\nWHERE company_id = 5;\nUPDATE 12\n```\n\nAnother example of such an operation would be to run transactions which span\nmultiple tables. If you want to delete a campaign and all its associated\nads, you can do it atomically by running:\n\n```sql\npostgres=# BEGIN;\nBEGIN\npostgres=*# DELETE FROM campaigns WHERE id = 46 AND company_id = 5;\nDELETE 1\npostgres=*# DELETE FROM ads WHERE campaign_id = 46 AND company_id = 5;\nDELETE 7\npostgres=*# COMMIT;\nCOMMIT\n```\n\nEach statement in a transaction causes round trips between the coordinator and\nworkers in multi-node Citus. For multi-tenant workloads, it’s more efficient to\nrun transactions in distributed functions. The efficiency gains become more\napparent for larger transactions, but we can use the small transaction above as\nan example.\n\nFirst, create a function that does the deletions:\n\n```sql\nCREATE OR REPLACE FUNCTION\n  delete_campaign(company_id int, campaign_id int)\nRETURNS void LANGUAGE plpgsql AS $fn$\nBEGIN\n  DELETE FROM campaigns\n   WHERE id = $2 AND campaigns.company_id = $1;\n  DELETE FROM ads\n   WHERE ads.campaign_id = $2 AND ads.company_id = $1;\nEND;\n$fn$;\n```\n\nNext, use [`create_distributed_function`][create-dist-func] to instruct Citus to\nrun the function directly on workers rather than on the coordinator (except on a\nsingle-node Citus installation, which runs everything on the coordinator). 
It\nwill run the function on whatever worker holds the [Shards][shards] for tables\n`ads` and `campaigns` corresponding to the value `company_id`.\n\n```sql\nSELECT create_distributed_function(\n  'delete_campaign(int, int)', 'company_id',\n  colocate_with := 'campaigns'\n);\n\n-- you can run the function as usual\nSELECT delete_campaign(5, 46);\n```\n\nBesides transactional operations, you can also run analytics queries using\nstandard SQL. One interesting query for a company to run would be to see details\nabout its campaigns with maximum budget.\n\n```sql\nSELECT name, cost_model, state, monthly_budget\nFROM campaigns\nWHERE company_id = 5\nORDER BY monthly_budget DESC\nLIMIT 10;\n          name           |     cost_model      |  state   | monthly_budget \n-------------------------+---------------------+----------+----------------\n Wondra                  | cost_per_impression | running  |          16732\n Quicksilver             | cost_per_click      | paused   |          12664\n Cyborg                  | cost_per_impression | running  |           8198\n ...\n```\n\nWe can also run a join query across multiple tables to see information about\nrunning campaigns which receive the most clicks and impressions.\n\n```sql\nSELECT campaigns.id, campaigns.name, campaigns.monthly_budget,\n       sum(impressions_count) as total_impressions, sum(clicks_count) as total_clicks\nFROM ads, campaigns\nWHERE ads.company_id = campaigns.company_id\nAND campaigns.company_id = 5\nAND campaigns.state = 'running'\nGROUP BY campaigns.id, campaigns.name, campaigns.monthly_budget\nORDER BY total_impressions, total_clicks;\n```\n\nWith this, we come to the end of our tutorial on using Citus to power a simple\nmulti-tenant application. 
As a next step, you can look at the Multi-Tenant Apps\nsection to see how you can model your own data for multi-tenancy.\n\n### Real-time Analytics\n\nIn this tutorial, we will demonstrate how you can use Citus to ingest events\ndata and run analytical queries on that data in human real-time. For that, we\nwill use a sample GitHub events dataset.\n\n#### Data model and sample data\n\nWe will demo building the database for a real-time analytics application. This\napplication will insert large volumes of events data and enable analytical\nqueries on that data with sub-second latencies. In our example, we’re going to\nwork with the GitHub events dataset. This dataset includes all public events on\nGitHub, such as commits, forks, new issues, and comments on these issues.\n\nWe will use two Postgres tables to represent this data. To get started, you will\nneed to download sample data for these tables:\n\n```sh\ncurl https://examples.citusdata.com/tutorial/users.csv \u003e users.csv\ncurl https://examples.citusdata.com/tutorial/events.csv \u003e events.csv\n\nwc -l users.csv\n264308 users.csv\n\nwc -l events.csv\n30000 events.csv\n```\n\nIf you are using Docker, you should use the `docker cp` command to copy the\nfiles into the Docker container.\n\n```sh\ndocker cp users.csv cit_master:.\ndocker cp events.csv cit_master:.\n```\n\n#### Creating tables\n\nTo start, connect to the Citus coordinator using `psql`.\n\nIf you are using native Postgres, as installed in our Single-Node Citus guide,\nthe coordinator node will be running on port 9700.\n\n```sh\npsql -p 9700\n\n# I'm using a Docker Compose configuration to run my Citus cluster,\n# so this command connects to the master container on port 5432.\npsql -h 0.0.0.0 -U postgres\n```\n\nIf you are using Docker, you can connect by running `psql` with the `docker exec`\ncommand:\n\n```sh\ndocker exec -it cit_master psql -U postgres\n```\n\nThen, you can create the tables by using standard PostgreSQL `CREATE 
TABLE`\ncommands.\n\n```sql\nCREATE TABLE github_events\n(\n    event_id bigint,\n    event_type text,\n    event_public boolean,\n    repo_id bigint,\n    payload jsonb,\n    repo jsonb,\n    user_id bigint,\n    org jsonb,\n    created_at timestamp\n);\n\nCREATE TABLE github_users\n(\n    user_id bigint,\n    url text,\n    login text,\n    avatar_url text,\n    gravatar_id text,\n    display_login text\n);\n```\n\nNext, you can create indexes on events data just like you would do in\nPostgreSQL. In this example, we’re also going to create a `GIN` index to make\nquerying on `jsonb` fields faster.\n\n```sql\nCREATE INDEX event_type_index ON github_events (event_type);\nCREATE INDEX payload_index ON github_events USING GIN (payload jsonb_path_ops);\n```\n\n#### Distributing tables and loading data\n\nWe will now go ahead and tell Citus to distribute these tables across the nodes\nin the cluster. To do so, you can run `create_distributed_table` and specify the\ntable you want to shard and the column you want to shard on. In this case, we\nwill shard all the tables on `user_id`.\n\n```sql\nSELECT create_distributed_table('github_users', 'user_id');\nSELECT create_distributed_table('github_events', 'user_id');\n```\n\nSharding all tables on the user identifier allows Citus to colocate these tables,\nwhich makes joins and distributed roll-ups efficient.\n\nThen, you can go ahead and load the data we downloaded into the tables using the\n`psql` `\\copy` meta-command. Please make sure that you specify the\ncorrect file path if you downloaded the file to a different location.\n\n```sql\n\\copy github_users from '/home/neo/huge_data/citus/users.csv' with csv\nCOPY 264308\n\n\\copy github_events from '/home/neo/huge_data/citus/events.csv' with csv\nCOPY 30000\n```\n\n#### Running queries\n\nNow that we have loaded data into the tables, let’s go ahead and run some\nqueries. 
First, let’s check how many users we have in our distributed database.\n\n```sql\nSELECT count(*) FROM github_users;\n count  \n--------\n 264308\n(1 row)\n```\n\nNow, let’s analyze GitHub push events in our data. We will first compute the\nnumber of commits per minute by using the number of distinct commits in each\npush event.\n\n```sql\nSELECT date_trunc('minute', created_at) AS minute,\n       sum((payload-\u003e\u003e'distinct_size')::int) AS num_commits\nFROM github_events\nWHERE event_type = 'PushEvent'\nGROUP BY minute\nORDER BY minute;\n```\n\nWe also have a users table, so we can join users with events and\nfind the ten users who created the most repositories.\n\n```sql\nSELECT login, count(*)\nFROM github_events ge\nJOIN github_users gu\nON ge.user_id = gu.user_id\nWHERE event_type = 'CreateEvent' AND payload @\u003e '{\"ref_type\": \"repository\"}'\nGROUP BY login\nORDER BY count(*) DESC LIMIT 10;\n```\n\n**View query plan:**\n\nFirst, add more workers using `docker-compose scale`. 
For instance, to bring\nyour worker count to five.\n\n```sh\n$ docker-compose -p cit scale worker=5\n```\n\nThen, rebalance shards.\n\n```sql\n-- move shards to new worker node(s)\nSELECT rebalance_table_shards();\n```\n\n```sql\nEXPLAIN SELECT login, count(*)\nFROM github_events ge\nJOIN github_users gu\nON ge.user_id = gu.user_id\nWHERE event_type = 'CreateEvent' AND payload @\u003e '{\"ref_type\": \"repository\"}'\nGROUP BY login\nORDER BY count(*) DESC LIMIT 10;\n                                                        QUERY PLAN\n-------------------------------------------------------------------------------------------------------------------------------------------------\n Limit  (cost=507.82..507.85 rows=10 width=40)\n   -\u003e  Sort  (cost=507.82..508.32 rows=200 width=40)\n         Sort Key: (COALESCE((pg_catalog.sum(remote_scan.count))::bigint, '0'::bigint)) DESC\n         -\u003e  HashAggregate  (cost=500.00..503.50 rows=200 width=40)\n               Group Key: remote_scan.login\n               -\u003e  Custom Scan (Citus Adaptive)  (cost=0.00..0.00 rows=100000 width=40)\n                     Task Count: 32\n                     Tasks Shown: One of 32\n                     -\u003e  Task\n                           Node: host=cit_worker_3 port=5432 dbname=postgres\n                           -\u003e  GroupAggregate  (cost=385.90..386.06 rows=9 width=18)\n                                 Group Key: gu.login\n                                 -\u003e  Sort  (cost=385.90..385.92 rows=9 width=10)\n                                       Sort Key: gu.login\n                                       -\u003e  Hash Join  (cost=358.72..385.76 rows=9 width=10)\n                                             Hash Cond: (ge.user_id = gu.user_id)\n                                             -\u003e  Bitmap Heap Scan on github_events_102040 ge  (cost=17.74..44.65 rows=9 width=8)\n                                                   Recheck Cond: ((event_type = 
'CreateEvent'::text) AND (payload @\u003e '{\"ref_type\": \"repository\"}'::jsonb))\n                                                   -\u003e  BitmapAnd  (cost=17.74..17.74 rows=9 width=0)\n                                                         -\u003e  Bitmap Index Scan on event_type_index_102040  (cost=0.00..5.03 rows=117 width=0)\n                                                               Index Cond: (event_type = 'CreateEvent'::text)\n                                                         -\u003e  Bitmap Index Scan on payload_index_102040  (cost=0.00..12.46 rows=61 width=0)\n                                                               Index Cond: (payload @\u003e '{\"ref_type\": \"repository\"}'::jsonb)\n                                             -\u003e  Hash  (cost=237.10..237.10 rows=8310 width=18)\n                                                   -\u003e  Seq Scan on github_users_102008 gu  (cost=0.00..237.10 rows=8310 width=18)\n```\n\nCitus also supports standard `INSERT`, `UPDATE`, and `DELETE` commands for\ningesting and modifying data. For example, you can update a user’s display login\nby running the following command:\n\n```sql\nUPDATE github_users SET display_login = 'no1youknow' WHERE user_id = 24305673;\n```\n\nWith this, we come to the end of our tutorial. 
As a next step, you can look at\nthe Real-Time Apps section to see how you can model your own data and power\nreal-time analytical applications.\n\n[image size]: https://microbadger.com/images/citusdata/citus\n[release]: https://github.com/citusdata/docker/releases/latest\n[license]: LICENSE\n[citus data]: https://www.citusdata.com\n[docker-postgres]: https://hub.docker.com/_/postgres/\n[compose-config]: docker-compose.yml\n[workerlist-gen]: https://github.com/citusdata/workerlist-gen\n[tutorials]: https://docs.citusdata.com/en/stable/get_started/tutorials.html\n[example-app]: https://github.com/citusdata/citus-example-ad-analytics/\n[colocate]: https://docs.citusdata.com/en/stable/sharding/data_modeling.html#colocation\n[design-saas-db-for-high-scalibility]: https://www.citusdata.com/blog/2016/10/03/designing-your-saas-database-for-high-scalability/\n[shards]: https://docs.citusdata.com/en/stable/get_started/concepts.html#shards\n[create-dist-func]: https://docs.citusdata.com/en/stable/develop/api_udf.html#create-distributed-function\n[citus-docker]: https://github.com/citusdata/docker\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcedrickchee%2Fcitus-cluster","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcedrickchee%2Fcitus-cluster","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcedrickchee%2Fcitus-cluster/lists"}