https://github.com/cloudspannerecosystem/spanner-migration-example

Host: GitHub
URL: https://github.com/cloudspannerecosystem/spanner-migration-example
Owner: cloudspannerecosystem
License: apache-2.0
Created: 2024-01-31T01:42:43.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-02-21T00:22:27.000Z (over 2 years ago)
Last Synced: 2024-04-15T12:16:12.117Z (about 2 years ago)
Language: Java
Size: 58.6 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

# PostgreSQL to Spanner Migration Example

This sample application illustrates a migration from an open-source PostgreSQL database (Cloud SQL) to a Spanner PostgreSQL-dialect database.

It uses the open-source PostgreSQL JDBC driver for both databases.

## Database Schema

### Open-source PostgreSQL (Cloud SQL)

```sql
-- When converting to Spanner:
--   * The table has no primary key; one must be added
--   * The SERIAL type does not exist in Spanner
CREATE TABLE singers (
  singer_id   SERIAL UNIQUE,
  first_name  VARCHAR(1024),
  last_name   VARCHAR(1024)
);

-- When converting to Spanner:
--   * singer_id has type int4, which must be converted to int8 in Spanner
--   * The SERIAL type does not exist in Spanner
--   * The UNIQUE constraint on album_title must be converted to a UNIQUE index in Spanner
--   * The FOREIGN KEY relation can be converted to a Spanner INTERLEAVED table
CREATE TABLE albums (
  singer_id     int,
  album_id      SERIAL,
  album_title   VARCHAR UNIQUE,
  PRIMARY KEY (singer_id, album_id),
  CONSTRAINT fk_singers
    FOREIGN KEY(singer_id) REFERENCES singers(singer_id)
    ON DELETE CASCADE
);

-- When converting to Spanner:
--   * singer_id and album_id have type int4, which must be converted to int8 in Spanner
--   * The SERIAL type does not exist in Spanner
--   * The song_data json type must be converted to jsonb in Spanner
--   * The FOREIGN KEY relation can be converted to a Spanner INTERLEAVED table
CREATE TABLE songs (
  singer_id     int,
  album_id      int,
  song_id       SERIAL,
  song_name     VARCHAR,
  song_data     json,
  PRIMARY KEY (singer_id, album_id, song_id),
  CONSTRAINT fk_albums
    FOREIGN KEY(singer_id, album_id) REFERENCES albums(singer_id, album_id)
    ON DELETE CASCADE
);

-- When converting to Spanner:
--   * Redundant index: the songs primary key (singer_id, album_id, song_id) already covers it
CREATE INDEX singer_album_song ON songs(singer_id, album_id);
```

### Spanner PostgreSQL

This is the migrated schema on Spanner.

```sql
CREATE SEQUENCE singer_id_seq BIT_REVERSED_POSITIVE START COUNTER WITH 1;
CREATE SEQUENCE album_id_seq BIT_REVERSED_POSITIVE START COUNTER WITH 1;
CREATE SEQUENCE song_id_seq BIT_REVERSED_POSITIVE START COUNTER WITH 1;

CREATE TABLE singers (
  singer_id bigint DEFAULT nextval('singer_id_seq'::text) NOT NULL,
  first_name character varying(1024),
  last_name character varying(1024),
  PRIMARY KEY(singer_id)
);

CREATE TABLE albums (
  singer_id bigint NOT NULL,
  album_id bigint DEFAULT nextval('album_id_seq'::text) NOT NULL,
  album_title character varying,
  PRIMARY KEY(singer_id, album_id)
) INTERLEAVE IN PARENT singers;

CREATE UNIQUE INDEX albums_album_title_key ON albums (album_title) WHERE (album_title IS NOT NULL);

CREATE TABLE songs (
  singer_id bigint NOT NULL,
  album_id bigint NOT NULL,
  song_id bigint DEFAULT nextval('song_id_seq'::text) NOT NULL,
  song_name character varying,
  song_data jsonb,
  PRIMARY KEY(singer_id, album_id, song_id)
) INTERLEAVE IN PARENT albums;
```

## Application Updates

The original application, which worked only with the open-source PostgreSQL database, had to be updated to work with the Spanner PostgreSQL dialect as well.

The necessary modifications are described below.

First, we parameterized the application to boot up using either Cloud SQL or Spanner. For this purpose we introduced the [DatabaseChoice](src/main/java/com/google/DatabaseChoice.java) abstraction. In a production scenario we would most likely have used a feature flag here.
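The repository's `DatabaseChoice.java` is not reproduced here. As a rough idea of what such an abstraction can look like, the following is a minimal, hypothetical sketch that assumes the choice arrives as the first command-line argument (`cloudsql` or `spanner`), matching the `command` entries in the docker-compose file; the method name `fromArgs` is illustrative only.

```java
import java.util.Locale;

/** Hypothetical sketch of a database-selection abstraction; names are illustrative only. */
enum DatabaseChoice {
    CLOUDSQL,
    SPANNER;

    /** Parses the first command-line argument ("cloudsql" or "spanner", case-insensitive). */
    static DatabaseChoice fromArgs(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("Expected a database choice: cloudsql or spanner");
        }
        switch (args[0].toLowerCase(Locale.ROOT)) {
            case "cloudsql":
                return CLOUDSQL;
            case "spanner":
                return SPANNER;
            default:
                throw new IllegalArgumentException("Unknown database choice: " + args[0]);
        }
    }
}
```

With this shape, `main` would call `DatabaseChoice.fromArgs(args)` once and branch on the result to pick connection settings, so swapping the argument for a feature flag later would only touch that one call site.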

We modified our [docker-compose.yml](docker-compose.yml) file to pass the database choice as a command-line argument on startup:

```yaml
services:
  app-cloudsql:
    ...
    command: [ "java", "-jar", "app.jar", "cloudsql" ]
  app-spanner:
    ...
    command: [ "java", "-jar", "app.jar", "spanner" ]
```

Second, we set up a Spanner connection. To keep using the PostgreSQL driver in our application (instead of a Spanner-specific driver), we configured [PGAdapter](https://github.com/GoogleCloudPlatform/pgadapter/tree/postgresql-dialect?tab=readme-ov-file#google-cloud-spanner-pgadapter). PGAdapter is a proxy that translates the PostgreSQL wire protocol into the Spanner equivalent for Spanner databases that use the PostgreSQL interface, so our application connects to PGAdapter instead of connecting to Spanner directly. There are [multiple ways to run PGAdapter](https://cloud.google.com/spanner/docs/pgadapter#execution-env). We used the [distroless Docker image](https://github.com/GoogleCloudPlatform/pgadapter/tree/postgresql-dialect?tab=readme-ov-file#distroless-docker-image) because it runs as a separate container, independently of our application's implementation. The changes are reflected in our [docker-compose.yml](docker-compose.yml) file:

```yaml
services:
  ...
  pgadapter:
    image: gcr.io/cloud-spanner-pg-adapter/pgadapter-distroless
    ...

  app-spanner:
    depends_on:
      - pgadapter
    command: [ "java", "-jar", "app.jar", "spanner" ]
    ...
```

With PGAdapter in place, we pointed the JDBC connection at PGAdapter by following this [guide](https://github.com/GoogleCloudPlatform/pgadapter/blob/postgresql-dialect/docs/jdbc.md).
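Because PGAdapter speaks the PostgreSQL wire protocol, the application can keep using an ordinary `jdbc:postgresql://` URL; only the host and database name differ from a direct PostgreSQL connection. The sketch below is hypothetical (class and method names are illustrative) and assumes PGAdapter is reachable under the Compose service name `pgadapter` on its default port 5432, with the port mapping elided in the compose file above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Hypothetical helper: connects through PGAdapter using the standard PostgreSQL JDBC driver. */
final class PgAdapterConnections {
    private PgAdapterConnections() {}

    /** Builds a plain PostgreSQL JDBC URL; PGAdapter forwards the session to Spanner. */
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + database;
    }

    /**
     * Opens a connection (requires the org.postgresql driver on the classpath).
     * Assumption: PGAdapter runs in its default mode and authenticates to Spanner with its
     * own Google credentials, so no user or password is passed here.
     */
    static Connection connect(String host, int port, String database) throws SQLException {
        return DriverManager.getConnection(jdbcUrl(host, port, database));
    }
}
```

For example, `PgAdapterConnections.connect("pgadapter", 5432, "my-database")` would open a session against the Spanner database through the sidecar container; `my-database` is a placeholder.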

Next, we inspected the in-memory representation of our query results, since some column types changed in the Spanner schema. As a reminder, the Spanner schema uses `jsonb` instead of `json` and `int8` (`bigint`) instead of `int4`. We were already handling `json` values as Java `String`s, and `jsonb` values can be read into `String`s as well, so no changes were needed there. However, we were storing `int4` values in the primitive `int` type, whose 32-bit range is smaller than the 64-bit range of `int8`, so we switched to primitive `long`s. This matters in practice: the bit-reversed sequences spread generated keys across the positive 64-bit range, so most new keys do not fit in an `int`.
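A minimal sketch of the change, with a hypothetical record name (it is not the repository's `Song` model): key columns are read with `ResultSet.getLong`, and the helper `narrowUnsafely` shows the bug being avoided, since a narrowing cast keeps only the low 32 bits of a 64-bit key. It uses a Java 16+ `record`.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical in-memory key for a song row; every key column is int8 (bigint), hence long. */
record SongKey(long singerId, long albumId, long songId) {

    /** Reads the key columns as 64-bit values; getInt cannot represent values beyond the int range. */
    static SongKey fromRow(ResultSet rs) throws SQLException {
        return new SongKey(rs.getLong("singer_id"), rs.getLong("album_id"), rs.getLong("song_id"));
    }

    /** Illustrates what the old int-based code would do: silently keep only the low 32 bits. */
    static int narrowUnsafely(long id) {
        return (int) id;
    }
}
```

For a key such as `0x6000000000000000L`, `narrowUnsafely` returns `0`, so two distinct singers could collapse onto the same in-memory id without any error being raised.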

Finally, we reviewed the queries we issue. The only problem we found was a `CAST` from `text` to `json` when inserting the `song_data` column for a [Song](src/main/java/com/google/models/Song.java). Since the column now uses the `jsonb` type, we changed the cast accordingly:

```sql
-- PostgreSQL
INSERT INTO songs (singer_id, album_id, song_id, song_name, song_data)
VALUES (?, ?, DEFAULT, ?, CAST(? AS JSON))
RETURNING song_id;

-- Updated (Spanner)
INSERT INTO songs (singer_id, album_id, song_id, song_name, song_data)
VALUES (?, ?, DEFAULT, ?, CAST(? AS JSONB))
RETURNING song_id;
```
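As a rough illustration of issuing the Spanner variant through JDBC, here is a minimal sketch assuming a `Connection` already opened against PGAdapter; the class and method names are hypothetical. Because the statement ends in `RETURNING song_id`, it is run with `executeQuery`, and the generated id is read as a `long` to match the `bigint` column.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical data-access helper for the songs table on Spanner (via PGAdapter). */
final class SongInserter {
    private SongInserter() {}

    /** Spanner variant: song_data is cast to JSONB to match the migrated column type. */
    static final String INSERT_SONG_SQL =
        "INSERT INTO songs (singer_id, album_id, song_id, song_name, song_data) "
            + "VALUES (?, ?, DEFAULT, ?, CAST(? AS JSONB)) "
            + "RETURNING song_id";

    /** Inserts a song and returns the song_id generated by the column's sequence default. */
    static long insertSong(Connection conn, long singerId, long albumId,
                           String songName, String songDataJson) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SONG_SQL)) {
            ps.setLong(1, singerId);
            ps.setLong(2, albumId);
            ps.setString(3, songName);
            ps.setString(4, songDataJson); // JSON text; the CAST converts it to jsonb on the server
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    throw new SQLException("INSERT ... RETURNING produced no row");
                }
                return rs.getLong(1);
            }
        }
    }
}
```

Keeping the JSON payload as a `String` on the Java side means only the SQL text changes between the two databases, which is consistent with the observation above that `String` already covers both `json` and `jsonb`.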

## How to Run

### Open-source PostgreSQL (Cloud SQL)

First, set up a Cloud SQL PostgreSQL [instance](https://cloud.google.com/sql/docs/postgres/create-instance), [database](https://cloud.google.com/sql/docs/postgres/create-manage-databases) and [user](https://cloud.google.com/sql/docs/postgres/create-manage-users).

Next, copy `env.sample` to a `.env` file, which `docker-compose` reads. In the copied file, set the following variables:

```shell
CLOUDSQL_INSTANCE_CONNECTION_NAME=
CLOUDSQL_DATABASE=
CLOUDSQL_USERNAME=
CLOUDSQL_PASSWORD=
```
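This README does not show how the application turns these variables into a connection. One common option, assumed here rather than taken from the repository, is the Cloud SQL Java connector (`com.google.cloud.sql:postgres-socket-factory`), whose `com.google.cloud.sql.postgres.SocketFactory` lets the standard PostgreSQL JDBC driver connect by instance connection name. The class and helper names below are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper that builds a Cloud SQL connection from the .env variables. */
final class CloudSqlConnections {
    private CloudSqlConnections() {}

    /** JDBC URL in the form used with the Cloud SQL Java connector's socket factory (assumed dependency). */
    static String jdbcUrl(Map<String, String> env) {
        return "jdbc:postgresql:///" + require(env, "CLOUDSQL_DATABASE")
            + "?cloudSqlInstance=" + require(env, "CLOUDSQL_INSTANCE_CONNECTION_NAME")
            + "&socketFactory=com.google.cloud.sql.postgres.SocketFactory";
    }

    /** Opens a connection; credentials go in Properties so special characters need no URL encoding. */
    static Connection connect(Map<String, String> env) throws SQLException {
        Properties props = new Properties();
        props.setProperty("user", require(env, "CLOUDSQL_USERNAME"));
        props.setProperty("password", require(env, "CLOUDSQL_PASSWORD"));
        return DriverManager.getConnection(jdbcUrl(env), props);
    }

    /** Fails fast with a clear message when a variable from .env is missing or empty. */
    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing environment variable " + name);
        }
        return value;
    }
}
```

In the application this would be called as `CloudSqlConnections.connect(System.getenv())`, since `docker-compose` exposes the `.env` values to the container as environment variables when the compose file passes them through.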

You can then start the application as follows:

```shell
docker-compose up app-cloudsql
```

You can stop the application like so:

```shell
docker-compose down app-cloudsql
```

### Spanner PostgreSQL

First, set up a Spanner [instance](https://cloud.google.com/spanner/docs/create-query-database-console#create-instance) and [PostgreSQL database](https://cloud.google.com/spanner/docs/create-query-database-console#create-database) (make sure to select the PostgreSQL database dialect).

Next, copy `env.sample` to a `.env` file, which `docker-compose` reads. In the copied file, set the following variables:

```shell
SPANNER_PROJECT=
SPANNER_INSTANCE=
SPANNER_DATABASE=
```
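A forgotten value in `.env` typically surfaces later as a confusing connection error. A hypothetical startup check (not part of the repository; names are illustrative) can report every missing Spanner variable at once before any connection is attempted:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical startup check for the Spanner settings read from .env. */
final class SpannerSettings {
    static final List<String> REQUIRED =
        List.of("SPANNER_PROJECT", "SPANNER_INSTANCE", "SPANNER_DATABASE");

    private SpannerSettings() {}

    /** Returns the names of required variables that are unset or blank, in declaration order. */
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                result.add(name);
            }
        }
        return result;
    }
}
```

Calling `SpannerSettings.missing(System.getenv())` at startup and exiting with the returned names when the list is non-empty keeps the failure close to its cause.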

You can then start the application as follows:

```shell
docker-compose up app-spanner
```

You can stop the application like so:

```shell
docker-compose down pgadapter app-spanner
```