https://github.com/cloudspannerecosystem/spanner-table-copy-pipeline

Last synced: 5 months ago
JSON representation

Host: GitHub
URL: https://github.com/cloudspannerecosystem/spanner-table-copy-pipeline
Owner: cloudspannerecosystem
License: apache-2.0
Created: 2022-01-21T09:40:33.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2024-10-21T08:14:06.000Z (over 1 year ago)
Last Synced: 2024-10-21T11:03:17.228Z (over 1 year ago)
Language: Java
Size: 47.9 KB
Stars: 2
Watchers: 34
Forks: 3
Open Issues: 12
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

Awesome Lists containing this project

README

          # Spanner Table Copy

Beam pipeline to run a query on a Spanner database and write the results to a

spanner table.

This pipeline can be used for various functions to transform/update a database,

without needing to be concerned about transaction mutation limitations

* Transforming read data using SQL and writing the results back, e.g. the

  equivalent of the following pseudo-sql:

```sql

  INSERT OR UPDATE INTO  (key, value)

  (

     SELECT key, value*100 FROM  WHERE 

  )

```

* Point-in-time recovery by reading the database state at a point in the past

  (within the Spanner database's

  version retention

  period, and then writing it back. e.g. the equivalent of the following

  pseudo-sql:

```sql

  INSERT OR UPDATE INTO  (key, value)

  (

     SELECT key, value

     FROM  FOR SYSTEM_TIME AS OF 

     WHERE 

  )

```

* Copying data from one database to another, e.g. the equivalent of the

  following pseudo-sql:

```sql

 INSERT OR UPDATE INTO  (key, value)

 (

    SELECT key, value FROM 

 )

```

## Usage:

### Show help text:

```

mvn compile exec:java -Dexec.mainClass=com.google.cloud.solutions.SpannerTableCopy

    -Dexec.args='--help=com.google.cloud.solutions.SpannerTableCopy$SpannerTableCopyOptions'

```

### Execute the pipeline:

For example copying a table at a timestamp in the past to a different database:

```

mvn compile exec:java

    -Dexec.mainClass=com.google.cloud.solutions.SpannerTableCopy \

    -Pdataflow-runner

    -Dexec.args="

         --runner=DataflowRunner

         --sourceProjectId=SOURCE_PROJECT

         --sourceInstanceId=SOURCE_INSTANCE

         --sourceDatabaseId=SOURCE_DATABASE

         --sqlQuery='select * from SOURCE_TABLE'

         --readTimestamp=2022-01-01T12:00:00Z

         --destinationProjectId=DEST_PROJECT

         --destinationInstanceId=DEST_INSTANCE

         --destinationDatabaseId=DEST_DATABASE

         --destinationTable=DEST_TABLE

         --writeMode=WRITE_MODE

         --mutationReportFile=gs://BUCKET/PATH/report

         --failureLogFile=gs://BUCKET/PATH/failures

    "

```

More examples can be found in the `SpannerTableCopyIntegrationTest`

Note: take care of Bash quoting with `-Dexec.args` and the `--sqlQuery` value.

The `--dryRun` parameter can be used to create the mutation report file without

actually writing to the database.

`--writeMode` values correspond to the values of the

[Mutation.Op enum](https://googleapis.dev/java/google-cloud-spanner/latest/com/google/cloud/spanner/Mutation.Op.html)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/cloudspannerecosystem/spanner-table-copy-pipeline

Awesome Lists containing this project

README