Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/squidmin/bigquery-labs
GCP BigQuery CLI
https://github.com/squidmin/bigquery-labs
bigquery gcp java
Last synced: about 2 months ago
JSON representation
GCP BigQuery CLI
- Host: GitHub
- URL: https://github.com/squidmin/bigquery-labs
- Owner: squidmin
- Created: 2023-04-02T16:59:31.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-06-04T01:53:44.000Z (over 1 year ago)
- Last Synced: 2024-10-28T08:14:06.971Z (3 months ago)
- Topics: bigquery, gcp, java
- Language: Java
- Homepage:
- Size: 172 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# bigquery-labs
GCP BigQuery labs using the Java SDK.
About
Made with:
- **Adoptium Temurin OpenJDK 11.0.17**
- **Spring Boot v2.4.3**
- **Apache Maven 3.8.7**
- **IntelliJ IDEA 2023.1 (Ultimate Edition)**
- **GoLand 2023.1 (Ultimate Edition)**
- **PyCharm 2023.1 (Ultimate Edition)**---
## Install & build
Download, install, and initialize the gcloud SDK on your local machine
Refer to the `gcloud` CLI documentation to complete this step.
Install the `gcloud` SDK to the user's home directory (e.g., `/Users/USERNAME/google-cloud-sdk`).
When it's finished installing, add the `gcloud` executable to your system's `$PATH` and run the command:
```shell
gcloud init
```gcloud CLI: Generate an Application Default Credentials (ADC) access token
If you're running the application locally, you can use the following command to generate an access token:
```shell
export GCP_ADC_ACCESS_TOKEN="$(gcloud auth application-default print-access-token)"
```gcloud CLI: Generate a GCP service account access token
Run this command to generate an access token for a specific GCP service account:
```shell
export GCP_SA_ACCESS_TOKEN=$(gcloud auth print-access-token --impersonate-service-account='GCP_SA_EMAIL_ADDRESS')
```**Replace the following**:
- `GCP_SA_EMAIL_ADDRESS`: the email address of the service account to impersonate.Example:
```shell
export GCP_SA_ACCESS_TOKEN=$(gcloud auth print-access-token --impersonate-service-account='[email protected]')
```Create and store a service account key
This section refers to usage of a GCP service account key (.json) file stored on your local file system.
To map a local `gcloud` installation to a volume on a container instance running the application, include the `-v` parameter in the `docker run` command used to start a container instance, as described below.
### macOS
Assuming the user's service account key file is stored in the same directory as their local `gcloud` installation:
`/Users/USERNAME/.config/gcloud`
```shell
export LOCAL_GCLOUD_AUTH_DIRECTORY=$HOME/.config/gcloud
```and the target volume on the container instance is:
`/root/.config/gcloud`
```shell
export CONTAINER_GCLOUD_AUTH_DIRECTORY=/root/.config/gcloud
```the command to run the container instance would be:
```shell
docker run --rm -it \
-e GCP_SA_KEY_PATH=$GCP_SA_KEY_PATH \
-e GCP_ADC_ACCESS_TOKEN=$GCP_ADC_ACCESS_TOKEN \
-e GCP_SA_ACCESS_TOKEN=$GCP_SA_ACCESS_TOKEN \
-e GCP_DEFAULT_USER_PROJECT_ID=$GCP_DEFAULT_USER_PROJECT_ID \
-e GCP_DEFAULT_USER_DATASET=$GCP_DEFAULT_USER_DATASET \
-e GCP_DEFAULT_USER_TABLE=$GCP_DEFAULT_USER_TABLE \
-e GCP_SA_PROJECT_ID=$GCP_SA_PROJECT_ID \
-e GCP_SA_DATASET=$GCP_SA_DATASET \
-e GCP_SA_TABLE=$GCP_SA_TABLE \
-v ${LOCAL_GCLOUD_AUTH_DIRECTORY}:${CONTAINER_GCLOUD_AUTH_DIRECTORY} \
-v ${LOCAL_MAVEN_REPOSITORY}:${CONTAINER_MAVEN_REPOSITORY} \
bigquery-labs
```**Replace the following** in the path to the `gcloud` directory:
- `USERNAME`: the current OS user's username
so that the path to the service account key file is correct, e.g.:
`/Users/squidmin/.config/gcloud/sa-private-key.json`
Read here for more information about creating service account keys.
Read here for more information about run config CLI arguments.
Add the Maven wrapper
Ensure that Maven is already installed on the machine that will run the container.
In the root of this project, run the command:
```shell
mvn wrapper:wrapper
```Read here for more information about installing Maven.
Build a JAR
```shell
./mvnw clean package -P integration \
-DGCP_SA_KEY_PATH=$GCP_SA_KEY_PATH \
-DGCP_ADC_ACCESS_TOKEN=$GCP_ADC_ACCESS_TOKEN \
-DGCP_SA_ACCESS_TOKEN=$GCP_SA_ACCESS_TOKEN \
-DGCP_DEFAULT_USER_PROJECT_ID=$GCP_DEFAULT_USER_PROJECT_ID \
-DGCP_DEFAULT_USER_DATASET=$GCP_DEFAULT_USER_DATASET \
-DGCP_DEFAULT_USER_TABLE=$GCP_DEFAULT_USER_TABLE \
-DGCP_SA_PROJECT_ID=$GCP_SA_PROJECT_ID \
-DGCP_SA_DATASET=$GCP_SA_DATASET \
-DGCP_SA_TABLE=$GCP_SA_TABLE
```Or use `mvn clean install`:
```shell
./mvnw clean install -P integration \
-DGCP_SA_KEY_PATH=$HOME/.config/gcloud \
-DGCP_ADC_ACCESS_TOKEN="$(gcloud auth application-default print-access-token)" \
-DGCP_SA_ACCESS_TOKEN=placeholder \
-DGCP_DEFAULT_USER_PROJECT_ID=lofty-root-378503 \
-DGCP_DEFAULT_USER_DATASET=test_dataset_integration \
-DGCP_DEFAULT_USER_TABLE=test_table_integration \
-DGCP_SA_PROJECT_ID=placeholder \
-DGCP_SA_DATASET=placeholder \
-DGCP_SA_TABLE=placeholder
```Read here for more information about run config CLI arguments.
Build a container image
```shell
docker build \
--build-arg GCP_SA_KEY_PATH=${GCP_SA_KEY_PATH} \
--build-arg GCP_DEFAULT_USER_PROJECT_ID=${GCP_DEFAULT_USER_PROJECT_ID} \
--build-arg GCP_DEFAULT_USER_DATASET=${GCP_DEFAULT_USER_DATASET} \
--build-arg GCP_DEFAULT_USER_TABLE=${GCP_DEFAULT_USER_TABLE} \
--build-arg GCP_SA_PROJECT_ID=${GCP_SA_PROJECT_ID} \
--build-arg GCP_SA_DATASET=${GCP_SA_DATASET} \
--build-arg GCP_SA_TABLE=${GCP_SA_TABLE} \
-t bigquery-labs .
```Read here for more information about run config CLI arguments.
---
## Run the application
### 1a. Prepare the host machine environment
Expand
Until the main application entrypoint is developed, run the application's functionality via the `mvn test` interface.
Pass required environment variables on your local system to the VM options, as shown in the `createTableWithCustomSchema` example below:
```shell
./mvnw \
-Dtest=BigQueryAdminClientIntegrationTest#listDatasets \
test -P integration \
-DGCP_SA_KEY_PATH=$GCP_SA_KEY_PATH \
-DGCP_ADC_ACCESS_TOKEN=$GCP_ADC_ACCESS_TOKEN \
-DGCP_SA_ACCESS_TOKEN=$GCP_SA_ACCESS_TOKEN \
-DGCP_DEFAULT_USER_PROJECT_ID=$GCP_DEFAULT_USER_PROJECT_ID \
-DGCP_DEFAULT_USER_DATASET=$GCP_DEFAULT_USER_DATASET \
-DGCP_DEFAULT_USER_TABLE=$GCP_DEFAULT_USER_TABLE
```### 1b. Prepare the container environment
Expand
### Environment variables
The `GCP_SA_KEY_PATH` environment variable is used to store the path to the user's GCP service account key file.
The user's service account key file is mapped to the `/root/.config/gcloud` directory on the container instance.```shell
export GCP_SA_KEY_PATH=/root/.config/gcloud/sa-private-key.json
```The `GCP_ADC_ACCESS_TOKEN` environment variable is used to store an OAuth2 access token for reaching BigQuery RESTful services _using Application Default Credentials_ (ADC).
```shell
export GCP_ADC_ACCESS_TOKEN=$(gcloud auth application-default print-access-token)
```The `GCP_SA_ACCESS_TOKEN` environment variable is used to store an OAuth2 access token for reaching BigQuery RESTful services _as a specific service account_.
```shell
export GCP_SA_ACCESS_TOKEN=$(gcloud auth print-access-token --impersonate-service-account="[email protected]")
```### Utility script (WIP)
The `run.sh` script at the root level of the project will set required environment variables automatically.
It accepts short and long arguments for each environment variable.
### 2. Run an interactive container instance
Expand
### `docker run`
```shell
docker run --rm -it \
-e GCP_SA_KEY_PATH=$GCP_SA_KEY_PATH \
-e GCP_ADC_ACCESS_TOKEN=$GCP_ADC_ACCESS_TOKEN \
-e GCP_SA_ACCESS_TOKEN=$GCP_SA_ACCESS_TOKEN \
-e GCP_DEFAULT_USER_PROJECT_ID=$GCP_DEFAULT_USER_PROJECT_ID \
-e GCP_DEFAULT_USER_DATASET=$GCP_DEFAULT_USER_DATASET \
-e GCP_DEFAULT_USER_TABLE=$GCP_DEFAULT_USER_TABLE \
-e GCP_SA_PROJECT_ID=$GCP_SA_PROJECT_ID \
-e GCP_SA_DATASET=$GCP_SA_DATASET \
-e GCP_SA_TABLE=$GCP_SA_TABLE \
-v ${LOCAL_GCLOUD_AUTH_DIRECTORY}:${CONTAINER_GCLOUD_AUTH_DIRECTORY} \
-v ${LOCAL_MAVEN_REPOSITORY}:${CONTAINER_MAVEN_REPOSITORY} \
bigquery-labs
```Read here for more information about run config CLI arguments.
---
### `./run.sh` utility script
```shell
./run.sh \
-dpid $GCP_DEFAULT_USER_PROJECT_ID \
-sakp $HOME/.config/gcloud/sa-private-key.json \
-saat access_token_placeholder \
-adcat $(gcloud auth application-default print-access-token)
```Or use long arguments:
```shell
./run.sh \
--GCP_DEFAULT_USER_PROJECT_ID $GCP_DEFAULT_USER_PROJECT_ID \
--GCP_SA_KEY_PATH $HOME/.config/gcloud/sa-private-key.json \
--GCP_SA_ACCESS_TOKEN access_token_placeholder \
--GCP_ADC_ACCESS_TOKEN $(gcloud auth application-default print-access-token)
```### `run.sh` options
Expand
`run.sh` implements the following options:
- `--default`: Start the application on the user's host system with default run environment settings.
Example```shell
./run.sh --default
```
- `-ci`, `--container-instance`: Run a container instance pointing to the root directory of the application.
Example```shell
./run.sh --container-instance
```
- `-nci`, `--no-container-instance`: Build and run the application without starting a container instance.
Example```shell
./run.sh -nci
```
- `-i GCP_SA_EMAIL_ADDRESS`, `--impersonate GCP_SA_EMAIL_ADDRESS`: Impersonate a GCP service account.
Replace `GCP_SA_EMAIL_ADDRESS` with the email address of the service account to impersonate.
Example```shell
./run.sh --default -ci --impersonate-service-account GCP_SA_EMAIL_ADDRESS
```**Replace the following**:
- `GCP_SA_EMAIL_ADDRESS`: the impersonated service account's email address.
### Other examples
Build JAR, build image, & start an interactive container instance with default run environment settings
```shell
./run.sh --default --container-instance
``````shell
./run.sh --default -ci
```Build and run JAR with default run environment settings
```shell
./run.sh --default
``````shell
./run.sh --default --no-container-instance
``````shell
./run.sh --default -nci
```### 3. Run the JAR
> Note: This section currently is only used for testing the main application entrypoint.
>
> This project currently invokes the BigQuery Java SDK (and later the BigQuery RESTful services) via the `mvn test` interface. Use that for now.
>
> _**Refer to**_: `/src/test` README.md.Using "exec java" command. Specify a profile.
```shell
exec java -jar \
-Dspring.profiles.active=$PROFILE \
-DGCP_SA_KEY_PATH=$GCP_SA_KEY_PATH \
-DGCP_ADC_ACCESS_TOKEN=$GCP_ADC_ACCESS_TOKEN \
-DGCP_SA_ACCESS_TOKEN=$GCP_SA_ACCESS_TOKEN \
-DGCP_DEFAULT_USER_PROJECT_ID=$GCP_DEFAULT_USER_PROJECT_ID \
-DGCP_DEFAULT_USER_DATASET="test_dataset_integration" \
-DGCP_DEFAULT_USER_TABLE="test_table_integration_custom" \
./target/bigquery-labs-0.0.1-SNAPSHOT.jar
```With Maven. Specify a profile.
```shell
mvn spring-boot:run \
-Dspring-boot.run.profiles=$PROFILE \
-DGCP_SA_KEY_PATH=$GCP_SA_KEY_PATH \
-DGCP_ADC_ACCESS_TOKEN=$GCP_ADC_ACCESS_TOKEN \
-DGCP_SA_ACCESS_TOKEN=$GCP_SA_ACCESS_TOKEN \
-DGCP_DEFAULT_USER_PROJECT_ID=$GCP_DEFAULT_USER_PROJECT_ID \
-DGCP_DEFAULT_USER_DATASET="test_dataset_integration" \
-DGCP_DEFAULT_USER_TABLE="test_table_integration_custom"
```---
## Table admin
### `bq` CLI
List datasets
```shell
bq ls --filter labels.key:value \
--max_results INTEGER \
--format=prettyjson \
--project_id PROJECT_ID
```**Replace the following**:
- `key:value`: a label key and value, if applicable.
- `INTEGER`: an integer representing the number of datasets to list.
- `PROJECT_ID`: the name of the GCP project containing the datasets to list.**Examples**:
```shell
bq ls --format=pretty
```Create a dataset
Refer to the GCP documentation for creating datasets.
**Examples**:
```shell
bq --location=us mk \
--dataset \
--default_partition_expiration=3600 \
--default_table_expiration=3600 \
--description="An example." \
--label=test_label_1:test_value_1 \
--label=test_label_2:test_value_2 \
--max_time_travel_hours=168 \
--storage_billing_model=LOGICAL \
${GCP_DEFAULT_USER_PROJECT_ID}:${GCP_DEFAULT_USER_DATASET}
```The Cloud Key Management Service (KMS) key parameter (`KMS_KEY_NAME`) can be specified.
This parameter is used to pass the name of the default Cloud Key Management Service key used to protect newly created tables in this dataset.
You cannot create a Google-encrypted table in a dataset with this parameter set.```shell
bq --location=us mk \
--dataset \
--default_kms_key=KMS_KEY_NAME \
... \
${GCP_DEFAULT_USER_PROJECT_ID}:${GCP_DEFAULT_USER_DATASET}
```Delete a dataset
Refer to the GCP documentation for deleting a dataset.
#### Examples:
Remove all tables in the dataset (`-r` flag):
```shell
bq rm -r -f -d ${GCP_DEFAULT_USER_PROJECT_ID}:${GCP_DEFAULT_USER_DATASET}
```Create a table with a configured schema
**Create an empty table with an inline schema definition**
```shell
bq mk --table PROJECT_ID:DATASET.TABLE SCHEMA
```**Replace the following**:
- `PROJECT_ID`: the name of the GCP project to target.
- `DATASET`: the name of the BigQuery dataset to target.
- `TABLE`: the name of the BigQuery table to target.
- `SCHEMA`: an inline schema definition.Example:
```shell
bq mk --table \
${GCP_DEFAULT_USER_PROJECT_ID}:${GCP_DEFAULT_USER_DATASET}.test_table_name_lofty \
id:STRING,creation_timestamp:DATETIME,last_update_timestamp:DATETIME,column_a:STRING,column_b:BOOL
```### Specify the schema in a JSON schema file
For an example JSON schema file, refer to: `/schema/example.json`.
**Create an empty table**
```shell
bq mk --table \
PROJECT_ID:DATASET.TABLE \
./path/to/schema/file.json
```Example:
```shell
bq mk --table \
${GCP_DEFAULT_USER_PROJECT_ID}:${GCP_DEFAULT_USER_DATASET}.test_table_name_lofty \
./schema/example.json
```**Create a table with CSV data**
```shell
bq --location=location load \
--source_format=format \
PROJECT_ID:DATASET.TABLE \
./path/to/data/file.csv \
./path/to/schema/file.json
```Example:
```shell
bq --location=us load \
--source_format=CSV \
${GCP_DEFAULT_USER_PROJECT_ID}:${GCP_DEFAULT_USER_DATASET}.test_table_name_lofty \
./csv/example.csv \
./schema/example.json
```Refer to the BigQuery documentation: Details of loading CSV data.
Delete a table
```shell
bq rm --table ${GCP_DEFAULT_USER_PROJECT_ID}:${GCP_DEFAULT_USER_DATASET}.test_table_name_lofty
```Show table schema
Example:
```shell
bq show \
--schema \
--format=prettyjson \
${GCP_DEFAULT_USER_PROJECT_ID}:${GCP_DEFAULT_USER_DATASET}.test_table_name_lofty
```The table schema can be written to a file:
```shell
bq show \
--schema \
--format=prettyjson \
${GCP_DEFAULT_USER_PROJECT_ID}:${GCP_DEFAULT_USER_DATASET}.test_table_name_lofty \ > ./schema/example_show-write.json
```Modify table schemas
```shell
bq update \
${GCP_DEFAULT_USER_PROJECT_ID}:${GCP_DEFAULT_USER_DATASET}.test_table_name_lofty \
./schema/example_update.json
```Refer to the GCP documentation on modifying table schemas..
Insert data into a table
**Examples**:
Insert for known values:
```shell
bq insert ${GCP_DEFAULT_USER_PROJECT_ID}:${GCP_DEFAULT_USER_DATASET}.test_table_name_lofty ./json/example.json
```Specify a template suffix (`--template_suffix` or `-x`):
```shell
bq insert --ignore_unknown_values \
--template_suffix=_insert \
${GCP_DEFAULT_USER_PROJECT_ID}:${GCP_DEFAULT_USER_DATASET}.test_table_name_lofty \
./json/example.json
```Refer to the `bq insert` documentation.
Run an interactive query
```shell
bq query \
--use_legacy_sql=false \
'query_string'
```Example:
```shell
bq query \
--use_legacy_sql=false \
'SELECT
id, column_b
FROM
`lofty-root-378503.test_dataset_name_lofty.test_table_name_lofty`
LIMIT
3;'
```