https://github.com/cloudspannerecosystem/spanner-table-copy-pipeline
- Host: GitHub
- URL: https://github.com/cloudspannerecosystem/spanner-table-copy-pipeline
- Owner: cloudspannerecosystem
- License: apache-2.0
- Created: 2022-01-21T09:40:33.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2024-10-21T08:14:06.000Z (over 1 year ago)
- Last Synced: 2024-10-21T11:03:17.228Z (over 1 year ago)
- Language: Java
- Size: 47.9 KB
- Stars: 2
- Watchers: 34
- Forks: 3
- Open Issues: 12
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# Spanner Table Copy
Beam pipeline to run a query on a Spanner database and write the results to a
Spanner table.

This pipeline can be used for various tasks that transform or update a database
without being constrained by per-transaction mutation limits:
* Transforming read data using SQL and writing the results back, e.g. the
equivalent of the following pseudo-sql:
```sql
INSERT OR UPDATE INTO TABLE_NAME (key, value)
(
  SELECT key, value*100 FROM TABLE_NAME WHERE CONDITION
)
```
* Point-in-time recovery by reading the database state at a point in the past
(within the Spanner database's version retention period), and then writing it
back, e.g. the equivalent of the following pseudo-sql:
```sql
INSERT OR UPDATE INTO TABLE_NAME (key, value)
(
  SELECT key, value
  FROM TABLE_NAME FOR SYSTEM_TIME AS OF TIMESTAMP
  WHERE CONDITION
)
```
* Copying data from one database to another, e.g. the equivalent of the
following pseudo-sql:
```sql
INSERT OR UPDATE INTO DEST_DATABASE.DEST_TABLE (key, value)
(
  SELECT key, value FROM SOURCE_DATABASE.SOURCE_TABLE
)
```
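
The use cases above all depend on splitting a large result set into many small, independently committed writes, so that no single transaction exceeds Spanner's mutation limit. The following is a minimal sketch of that idea only; it is not taken from this repository. The class `MutationBatcher`, the method `batch`, and the parameters `columnsPerRow` and `maxCellsPerBatch` are hypothetical, and the sketch assumes the cost of a write is roughly the number of rows times the number of columns written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the pipeline: splits rows into batches
// whose estimated cell count stays within a given budget.
class MutationBatcher {

    // Assumption: each row costs `columnsPerRow` cells against the budget.
    static <T> List<List<T>> batch(List<T> rows, int columnsPerRow, int maxCellsPerBatch) {
        if (columnsPerRow <= 0 || maxCellsPerBatch < columnsPerRow) {
            throw new IllegalArgumentException("budget must fit at least one row");
        }
        int rowsPerBatch = maxCellsPerBatch / columnsPerRow;
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += rowsPerBatch) {
            int end = Math.min(start + rowsPerBatch, rows.size());
            // Copy the slice so each batch is independent of the input list.
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(i);
        }
        // 2 columns per row and a budget of 8 cells allow 4 rows per batch.
        List<List<Integer>> batches = batch(rows, 2, 8);
        System.out.println(batches.size()); // prints 3
        System.out.println(batches.get(2)); // prints [8, 9]
    }
}
```

In a real Beam pipeline this grouping is normally handled by the runner and the Spanner connector rather than by hand; the sketch only shows why many small commits avoid the single-transaction limit.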
## Usage:
### Show help text:
```
mvn compile exec:java -Dexec.mainClass=com.google.cloud.solutions.SpannerTableCopy \
  -Dexec.args='--help=com.google.cloud.solutions.SpannerTableCopy$SpannerTableCopyOptions'
```
### Execute the pipeline:
For example, to copy a table as it was at a timestamp in the past into a different database:
```
mvn compile exec:java \
  -Dexec.mainClass=com.google.cloud.solutions.SpannerTableCopy \
  -Pdataflow-runner \
  -Dexec.args="
--runner=DataflowRunner
--sourceProjectId=SOURCE_PROJECT
--sourceInstanceId=SOURCE_INSTANCE
--sourceDatabaseId=SOURCE_DATABASE
--sqlQuery='select * from SOURCE_TABLE'
--readTimestamp=2022-01-01T12:00:00Z
--destinationProjectId=DEST_PROJECT
--destinationInstanceId=DEST_INSTANCE
--destinationDatabaseId=DEST_DATABASE
--destinationTable=DEST_TABLE
--writeMode=WRITE_MODE
--mutationReportFile=gs://BUCKET/PATH/report
--failureLogFile=gs://BUCKET/PATH/failures
"
```
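More examples can be found in `SpannerTableCopyIntegrationTest`.

Note: take care with Bash quoting of `-Dexec.args` and the `--sqlQuery` value.
In the example above the whole argument list is wrapped in double quotes, so the
query inside it is wrapped in single quotes.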
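
The `--readTimestamp` value in the command above must fall within the source database's version retention period. The following is a minimal sketch, not code from this repository, of checking a timestamp before launching the pipeline. The class `ReadTimestampCheck`, the method `isReadable`, and the one-hour retention value are assumptions for illustration; substitute the retention period actually configured on your database.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical pre-flight check, not part of the pipeline.
class ReadTimestampCheck {

    // True if readTimestamp is not in the future and is no older than
    // `versionRetention` before `now`.
    static boolean isReadable(Instant readTimestamp, Instant now, Duration versionRetention) {
        if (readTimestamp.isAfter(now)) {
            return false;
        }
        Instant earliestReadable = now.minus(versionRetention);
        return !readTimestamp.isBefore(earliestReadable);
    }

    public static void main(String[] args) {
        // Fixed "now" so the example is reproducible; use Instant.now() in practice.
        Instant now = Instant.parse("2022-01-01T12:30:00Z");
        // Assumed retention period for this example only.
        Duration retention = Duration.ofHours(1);

        // 30 minutes old: inside the one-hour window.
        System.out.println(isReadable(Instant.parse("2022-01-01T12:00:00Z"), now, retention)); // prints true
        // 90 minutes old: outside the one-hour window.
        System.out.println(isReadable(Instant.parse("2022-01-01T11:00:00Z"), now, retention)); // prints false
    }
}
```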
The `--dryRun` parameter can be used to create the mutation report file without
actually writing to the database.
`--writeMode` values correspond to the constants of the
[Mutation.Op enum](https://googleapis.dev/java/google-cloud-spanner/latest/com/google/cloud/spanner/Mutation.Op.html),
for example `INSERT_OR_UPDATE`.
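
As a rough illustration of how a `--writeMode` string can be mapped to an enum constant, here is a minimal sketch. It is not the pipeline's own parsing code, which this README does not show. The class `WriteModeParser` is hypothetical, and the nested enum `Op` is a local stand-in that mirrors the constant names of `Mutation.Op` so the example runs without the Spanner client library on the classpath.

```java
import java.util.Arrays;

// Hypothetical parser, not part of the pipeline.
class WriteModeParser {

    // Local stand-in mirroring the constant names of
    // com.google.cloud.spanner.Mutation.Op.
    enum Op { INSERT, UPDATE, INSERT_OR_UPDATE, REPLACE, DELETE }

    // Strict, case-sensitive mapping from the flag value to an Op constant.
    static Op parse(String flagValue) {
        try {
            return Op.valueOf(flagValue);
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "Unknown --writeMode value '" + flagValue
                    + "'; expected one of " + Arrays.toString(Op.values()), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("INSERT_OR_UPDATE")); // prints INSERT_OR_UPDATE
        try {
            parse("UPSERT");
        } catch (IllegalArgumentException e) {
            // The message lists the accepted constant names.
            System.out.println(e.getMessage());
        }
    }
}
```

Whether every constant is a sensible choice for a given copy job is a separate question; consult the pipeline's help text for the options it actually accepts.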