Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sangnandar/load-csvs-from-gcs-to-bigquery
Google Apps Script to streamline loading CSV data from Google Cloud Storage (GCS) into BigQuery.
bigquery csv-import google-apps-script google-cloud-storage
JSON representation
- Host: GitHub
- URL: https://github.com/sangnandar/load-csvs-from-gcs-to-bigquery
- Owner: sangnandar
- License: mit
- Created: 2025-01-09T19:29:03.000Z (24 days ago)
- Default Branch: main
- Last Pushed: 2025-01-09T19:46:47.000Z (24 days ago)
- Last Synced: 2025-01-09T20:28:16.524Z (23 days ago)
- Topics: bigquery, csv-import, google-apps-script, google-cloud-storage
- Language: JavaScript
- Homepage: https://script.google.com
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Load CSVs from GCS to BigQuery
# Overview
This project is a **Google Apps Script solution** designed to streamline loading CSV data from **Google Cloud Storage (GCS)** into **BigQuery**. The solution ensures data integrity by validating records before they are inserted into the database.
### Key Features:
- **Data Validation:** Records are loaded into BigQuery only after passing the defined validation criteria.
- **Google Sheets UI:** The project uses Google Sheets with a container-bound Apps Script to provide a user-friendly interface for BigQuery operations.
- **GCS-BigQuery Pipeline:** The project uses GCS `gsUri` paths for seamless integration with **BigQuery**.

This solution is ideal for handling large datasets, ensuring data quality, and leveraging BigQuery's robust analytics capabilities.
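The `gsUri` the pipeline hands to BigQuery is simply the `gs://bucket/object` form of a GCS path. A minimal sketch (bucket and object names below are illustrative, not taken from the project):

```javascript
// Build the gs:// URI that BigQuery load statements accept, from a GCS
// bucket name and an object path (names here are illustrative only).
function toGsUri(bucketName, objectName) {
  return 'gs://' + bucketName + '/' + objectName;
}
```

For example, `toGsUri('my-bucket', 'csv/orders.csv')` yields `'gs://my-bucket/csv/orders.csv'`.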
# How it works
- CSVs are stored in a GCS bucket under the `csv/` folder.
- Apps Script reads the contents of that folder to list all CSV files.
- Apps Script calls a BigQuery stored procedure to load each file into a BigQuery table.
- Successful files are moved to the `success/` subfolder.
- Unsuccessful files are moved to the `fail/` subfolder and logged into Google Sheets.

# Installation
### GCP Project configuration
These APIs should be enabled:
- Cloud Storage API
- BigQuery API

### BigQuery configuration
- In this example, only 2 date columns and 1 integer column are validated. Additional validations can be incorporated into the `loadCsvToBQ` stored procedure.
- In this example, the date columns arrive in MM/DD/YY format and the integer column arrives as a string with thousand separators.
- The BigQuery temporary table must be qualified with `_SESSION`, because when `LOAD DATA INTO` fails the temporary table is never created and the subsequent `DROP TABLE` would throw an error.
- Target table structure:

![image](https://github.com/user-attachments/assets/8b8682b4-55ad-42eb-bf77-6eb64715bd19)
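The actual checks live in the `loadCsvToBQ` stored procedure; purely as an illustration, the two normalizations described above can be sketched in plain JavaScript (function names and exact rules are assumptions, not the repository's code):

```javascript
// Illustrative mirror of the validation done in the loadCsvToBQ stored
// procedure: dates arrive as MM/DD/YY, integers as strings with thousand
// separators (e.g. "1,234").

// Parse an MM/DD/YY string; returns null when the value is not a valid date.
function parseShortDate(s) {
  const m = /^(\d{1,2})\/(\d{1,2})\/(\d{2})$/.exec(s);
  if (!m) return null;
  const [month, day, yy] = m.slice(1).map(Number);
  const year = 2000 + yy; // two-digit years assumed to mean 20xx
  const d = new Date(year, month - 1, day);
  // Reject rolled-over dates such as 02/30/25 (which JS turns into March 2).
  return (d.getFullYear() === year && d.getMonth() === month - 1 && d.getDate() === day)
    ? d
    : null;
}

// Strip thousand separators and convert to an integer; null when invalid.
function parseSeparatedInt(s) {
  if (!/^\d{1,3}(,\d{3})*$/.test(s)) return null;
  return Number(s.replace(/,/g, ''));
}
```

Records whose columns fail these checks would be the ones routed to the `fail/` subfolder.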
### Apps Script configuration
- Deploy the project as a web app.
- Set up Script Properties in **Apps Script -> Project Settings -> Script Properties**:
```
{
PROJECT_ID: ,
WEBAPP_URL: ,
BUCKET_NAME: ,
DATASET:
}
```
- Configure the `appsscript.json` file:
```json
{
"dependencies": {
"enabledAdvancedServices": [
{
"userSymbol": "BigQuery",
"version": "v2",
"serviceId": "bigquery"
}
]
},
"webapp": {
"executeAs": "USER_DEPLOYING",
"access": "ANYONE_ANONYMOUS"
},
"oauthScopes": [
"https://www.googleapis.com/auth/spreadsheets",
"https://www.googleapis.com/auth/script.external_request",
"https://www.googleapis.com/auth/bigquery",
"https://www.googleapis.com/auth/devstorage.read_only"
]
}
```

### Sheets configuration
**DO NOT** change sheet names, delete columns, or re-arrange columns within the following ranges:
- Write
```
'Files'!A2:A
```

Sheets layout:
![image](https://github.com/user-attachments/assets/02372627-d54b-4516-82be-46f48d17fab0)
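Failed file names are logged into the write range `'Files'!A2:A`. As a rough sketch, a helper like the following (names assumed, not the repository's code) produces the N x 1 array shape that `Range.setValues()` expects for a single column:

```javascript
// Convert a list of failed file names into the N x 1 array that
// Range.setValues() expects when writing a single column such as 'Files'!A2:A.
function toColumnValues(fileNames) {
  return fileNames.map(name => [name]);
}

// In Apps Script the write would then look roughly like:
// const sheet = SpreadsheetApp.getActive().getSheetByName('Files');
// sheet.getRange(2, 1, fileNames.length, 1).setValues(toColumnValues(fileNames));
```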
# Usage
- Attach the Apps Script to a Google Sheets file.
- Create the related tables, procedures, and functions in BigQuery.
- Call the script from the menu toolbar: **Custom Menu -> Load CSV to BigQuery**.

# Caveats
Apps Script [current limitations](https://developers.google.com/apps-script/guides/services/quotas#current_limitations) specify a "Simultaneous executions per user" limit of 30. This restriction caps the number of concurrent requests that can be made with `UrlFetchApp.fetchAll()`, so in this project the requests are split into chunks of 25 to stay within the limit.

# TODO
- To overcome this limitation, consider creating a stored procedure to act as a broker. This procedure would accept an array of `gsUri` values and invoke the `loadCsvToBQ` procedure for each individual `gsUri`.
- Create a separate repository for the same use case, but use Google Drive as the storage solution instead of GCS.

# Related project
- [CSV Fixer for GCS](https://github.com/sangnandar/CSV-Fixer-for-GCS) - fixes failed files and uploads them back to GCS.