https://github.com/cloudspannerecosystem/spanner-cassandra-schema-tool
CLI tool to streamline Cassandra to Spanner schema migration
https://github.com/cloudspannerecosystem/spanner-cassandra-schema-tool
Last synced: 5 months ago
JSON representation
CLI tool to streamline Cassandra to Spanner schema migration
- Host: GitHub
- URL: https://github.com/cloudspannerecosystem/spanner-cassandra-schema-tool
- Owner: cloudspannerecosystem
- License: apache-2.0
- Created: 2025-04-03T18:53:49.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-07T22:50:13.000Z (about 1 year ago)
- Last Synced: 2025-05-07T23:29:25.301Z (about 1 year ago)
- Language: Go
- Homepage:
- Size: 155 KB
- Stars: 1
- Watchers: 3
- Forks: 3
- Open Issues: 5
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
Awesome Lists containing this project
README
# Spanner Cassandra Schema Tool
This CLI tool streamlines Cassandra to Spanner schema migration by automating the following key steps:
1. **CQL Parsing:** Read Cassandra DDL from a specified CQL file.
2. **CQL Translation:** Translate Cassandra DDL into equivalent Spanner DDL.
3. **Schema Export:** Dump the generated Spanner schema to the `schema.txt` file for review and manual application if needed.
4. **Apply schema to Spanner Database (Optional):** Directly connects to your target Spanner database and applies the generated schema.
[Core conecpts](https://cloud.google.com/spanner/docs/non-relational/spanner-for-cassandra-users#core_concepts) you need to know before using this tool:
* The concept of a Cassandra keyspace is directly equivalent to a Spanner database. Therefore, when converting the DDL, the target Spanner database serves as the implicit keyspace. Imagine that selecting a Spanner database is akin to having created a keyspace with the same name and then executing a USE statement for it. Any keyspace specified within a CQL statement being translated must be identical to the name of the Spanner database you are actively using.
## Note on the DDL Translation
* Only `CREATE TABLE` statements are processed. Other statements are ignored.
* Table options in the `CREATE TABLE` statements are silently ignored.
* Static columns are not supported (syntax error).
* The following data types are not supported (syntax error): `Duration`, `Tuple`, and `Frozen`.
* Mapping of [Cassandra native type](https://cassandra.apache.org/doc/stable/cassandra/cql/types.html#native-types) to [Spanner Type](https://cloud.google.com/spanner/docs/reference/standard-sql/data-types#data_type_list):
| Cassandra | Spanner |
| --------- | ---------- |
| BIGINT | INT64 |
| INT | INT64 |
| SMALLINT | INT64 |
| TINYINT | INT64 |
| TIME | INT64 |
| COUNTER | INT64 |
| ASCII | STRING(MAX)|
| TEXT | STRING(MAX)|
| VARCHAR | STRING(MAX)|
| INET | STRING(MAX)|
| UUID | STRING(MAX)|
| TIMEUUID | STRING(MAX)|
| BOOLEAN | BOOL |
| TIMESTAMP | TIMESTAMP |
| DATE | DATE |
| FLOAT | FLOAT32 |
| DOUBLE | FLOAT64 |
| DECIMAL | NUMERIC |
| VARINT | NUMERIC |
| BLOB | BYTES(MAX) |
* Mapping of [Cassandra collection type](https://cassandra.apache.org/doc/stable/cassandra/cql/types.html#collections) to [Spanner Type](https://cloud.google.com/spanner/docs/reference/standard-sql/data-types#data_type_list):
| Cassandra | Spanner |
| ---------- | ------------ |
| LIST\ | ARRAY\ |
| SET\ | ARRAY\ |
| MAP | JSON |
* The original Cassandra type will be added to the options in each column definition in the translated `CREATE TABLE` statement. See the example section below for details.
* Cassandra (composite) partition key and clustering columns are combined to form the composite primary key in Spanner. See the example section below for details.
* The translation process primarily focuses on **data type** and **primary key clause** conversion. It **does not** gurantee comprehensive semantic validation of the DDL.
## Requirements
- **Go**: Ensure that Go is installed on your system. The required go version is 1.23+.
- **Google Cloud SDK**: Ensure `gcloud` is installed and authenticated with proper permissions.
- **Spanner Database**: Ensure the target Spanner database is created.
## Setup
### Clone the Repository
```bash
git clone https://github.com/cloudspannerecosystem/spanner-cassandra-schema-tool.git
cd spanner-cassandra-schema-tool
```
### Install Dependencies
Ensure that all necessary Go modules are installed:
```bash
go mod download
```
### Set Up Google Cloud Credentials
This tool uses [Application Default Credentials](https://cloud.google.com/docs/authentication/production?hl=en#providing_credentials_to_your_application) as the credential source for connecting to Spanner databases. Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of your service account key file.
```bash
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/service-account-file.json"
```
## Usage
### SYNOPSIS
```bash
go run schema_converter.go \
--project \
--instance \
--database \
--cql \
[--dry-run]
```
- `--project`: Google Cloud project ID.
- `--instance`: Spanner instance ID.
- `--database`: Spanner database ID.
- `--cql`: Path to the CQL file containing DDL statements.
- `[--dry-run]`: Output the converted DDL statements without applying to the Spanner database. This flag is optional.
### EXAMPLE
#### Command
```bash
go run schema_converter.go \
--project cassandra-to-spanner \
--instance spanner-instance-dev \
--database testdb \
--cql /path/to/cql-file.cql \
--dry-run
```
#### Cassandra DDL in the input CQL file
```sql
CREATE TABLE IF NOT EXISTS testdb.example_table (
id UUID PRIMARY KEY,
ascii_value ASCII,
bigint_value BIGINT,
blob_value BLOB,
boolean_value BOOLEAN,
decimal_value DECIMAL,
double_value DOUBLE,
float_value FLOAT,
inet_value INET,
int_value INT,
smallint_value SMALLINT,
text_value TEXT,
timestamp_value TIMESTAMP,
timeuuid_value TIMEUUID,
tinyint_value TINYINT,
varchar_value VARCHAR,
varint_value VARINT,
list_value LIST,
map_value MAP,
set_value SET,
date_value DATE,
time_value TIME
);
CREATE TABLE IF NOT EXISTS sensor_data (
sensor_id UUID,
reading_time TIMESTAMP,
location TEXT,
temperature DECIMAL,
humidity DECIMAL,
pressure DECIMAL,
PRIMARY KEY ((sensor_id, location), reading_time)
);
```
#### Spanner DDL in the output `schema.txt` file
```sql
CREATE TABLE IF NOT EXISTS example_table (
id STRING(MAX) NOT NULL OPTIONS (cassandra_type = 'uuid'),
ascii_value STRING(MAX) OPTIONS (cassandra_type = 'ascii'),
bigint_value INT64 OPTIONS (cassandra_type = 'bigint'),
blob_value BYTES(MAX) OPTIONS (cassandra_type = 'blob'),
boolean_value BOOL OPTIONS (cassandra_type = 'boolean'),
decimal_value NUMERIC OPTIONS (cassandra_type = 'decimal'),
double_value FLOAT64 OPTIONS (cassandra_type = 'double'),
float_value FLOAT32 OPTIONS (cassandra_type = 'float'),
inet_value STRING(MAX) OPTIONS (cassandra_type = 'inet'),
int_value INT64 OPTIONS (cassandra_type = 'int'),
smallint_value INT64 OPTIONS (cassandra_type = 'smallint'),
text_value STRING(MAX) OPTIONS (cassandra_type = 'text'),
timestamp_value TIMESTAMP OPTIONS (cassandra_type = 'timestamp'),
timeuuid_value STRING(MAX) OPTIONS (cassandra_type = 'timeuuid'),
tinyint_value INT64 OPTIONS (cassandra_type = 'tinyint'),
varchar_value STRING(MAX) OPTIONS (cassandra_type = 'varchar'),
varint_value NUMERIC OPTIONS (cassandra_type = 'varint'),
list_value ARRAY OPTIONS (cassandra_type = 'list'),
map_value JSON OPTIONS (cassandra_type = 'map'),
set_value ARRAY OPTIONS (cassandra_type = 'set'),
date_value DATE OPTIONS (cassandra_type = 'date'),
time_value INT64 OPTIONS (cassandra_type = 'time'),
) PRIMARY KEY (id);
CREATE TABLE IF NOT EXISTS sensor_data (
sensor_id STRING(MAX) NOT NULL OPTIONS (cassandra_type = 'uuid'),
reading_time TIMESTAMP NOT NULL OPTIONS (cassandra_type = 'timestamp'),
location STRING(MAX) NOT NULL OPTIONS (cassandra_type = 'text'),
temperature NUMERIC OPTIONS (cassandra_type = 'decimal'),
humidity NUMERIC OPTIONS (cassandra_type = 'decimal'),
pressure NUMERIC OPTIONS (cassandra_type = 'decimal'),
) PRIMARY KEY (sensor_id, location, reading_time);
```