https://github.com/memeddev/mysql-to-google-bigquery
MySQL to Google BigQuery Sync Tool
https://github.com/memeddev/mysql-to-google-bigquery
Last synced: 12 months ago
JSON representation
MySQL to Google BigQuery Sync Tool
- Host: GitHub
- URL: https://github.com/memeddev/mysql-to-google-bigquery
- Owner: MemedDev
- License: mit
- Created: 2016-10-25T17:35:44.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2023-04-19T18:31:33.000Z (about 3 years ago)
- Last Synced: 2025-05-15T20:11:32.730Z (about 1 year ago)
- Language: PHP
- Size: 66.4 KB
- Stars: 62
- Watchers: 7
- Forks: 26
- Open Issues: 6
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README

MySQL to Google BigQuery Sync Tool
## Table of Contents
+ [How it works](#how-it-works)
+ [Requirements](#requirements)
+ [Usage](#usage)
+ [Credits](#credits)
+ [License](#license)
## How it works
Steps when no order column has been supplied:
+ Count MySQL table rows
+ Count BigQuery table rows
+ MySQL rows > BigQuery rows?
+ Get the rows diff, split in batches of XXXXX rows/batch
Steps when order column has been supplied:
+ Get max value for order column from MySQL table
+ Get max value for order column from BigQuery table
+ Max value MySQL > Max value BigQuery?
+ Delete all rows with order column value = max value BigQuery
to make sure no duplicate records are being created in BigQuery
+ Get max value for order column from BigQuery table
+ Get the rows diff based on new max value BigQuery,
split in batches of XXXXX rows/batch
Final three steps:
+ Dump MySQL rows to a JSON
+ Send JSON to BigQuery
+ Repeat until all batches are sent
Tip: Create a cron job for keep syncing the tables using an interval like 15 minutes (respect the Load Jobs [quota policy](https://cloud.google.com/bigquery/quota-policy))
## Requirements
The following PHP versions are supported:
+ PHP 7
+ HHVM
+ PDO Extension with MySQL driver
## Usage
Download the library using [composer](https://packagist.org/packages/memeddev/mysql-to-google-bigquery):
```bash
$ composer require memeddev/mysql-to-google-bigquery
```
Now, define some environment variables or create a `.env` file on the root of the project, replacing the values:
```text
BQ_PROJECT_ID=bigquery-project-id
BQ_KEY_FILE=google-service-account-json-key-file.json
BQ_DATASET=bigquery-dataset-name
DB_DATABASE_NAME=mysql-database-name
DB_USERNAME=mysql_username
DB_PASSWORD=mysql_password
DB_HOST=mysql-host
DB_PORT=3306
IGNORE_COLUMNS=password,hidden_column,another_column
```
PS: To create the `Google Service Account JSON Key File`, access [https://console.cloud.google.com/apis/credentials/serviceaccountkey](https://console.cloud.google.com/apis/credentials/serviceaccountkey)
Run:
```bash
vendor/bin/console sync table-name
```
If you want to auto create the table on BigQuery:
```bash
vendor/bin/console sync table-name --create-table
```
If you want to delete (and create) the table on BigQuery for a full dump:
```bash
vendor/bin/console sync table-name --delete-table
```
## Credits
:heart: Memed SA ([memed.com.br](https://memed.com.br))
## License
MIT license, see [LICENSE](LICENSE)