https://github.com/larisanti/transaction-ml
This project demonstrates a sequence of BigQuery ML queries to build and evaluate a logistic regression model that predicts customer transactions based on website traffic data from Google Analytics.
- Host: GitHub
- URL: https://github.com/larisanti/transaction-ml
- Owner: larisanti
- Created: 2025-05-07T14:53:31.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-05-07T19:46:23.000Z (about 1 year ago)
- Last Synced: 2025-05-11T21:55:27.760Z (about 1 year ago)
- Topics: bigquery, machine-learning
- Size: 245 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# BigQuery ML Transaction Forecasting Lab
This project is a lab exercise completed during the Machine Learning Engineer Learning Path course. It demonstrates a sequence of BigQuery ML queries to build and evaluate a logistic regression model that predicts customer transactions based on website traffic data from Google Analytics.
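The project uses the public `google_analytics_sample` dataset to train and evaluate the model. Each row of the `ga_sessions_*` tables is one website session. The model uses four features of a session (operating system, whether the device is mobile, country, and number of pageviews) to predict whether that session includes at least one transaction (label `1`) or none (label `0`).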
## Workflow
First, create a BigQuery dataset to hold the model (the queries below use `bqml_lab`), then run the steps below in order.
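A minimal sketch of the dataset-creation step in SQL, assuming the dataset is named `bqml_lab` (the name the model queries reference) and is created in the session's default project. The `US` location is an assumption chosen to match the public `google_analytics_sample` dataset, since a query generally has to run in the same location as the data it reads:

```sql
-- Create the dataset that stores the model.
-- Assumptions: default project of the current session; US multi-region,
-- assumed to match the location of bigquery-public-data.google_analytics_sample.
CREATE SCHEMA IF NOT EXISTS bqml_lab
OPTIONS (location = 'US');
```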
## 1. **Create a BigQuery ML model:**
```sql
CREATE OR REPLACE MODEL `bqml_lab.sample_model`
OPTIONS(model_type='logistic_reg') AS
SELECT
IF(totals.transactions IS NULL, 0, 1) AS label,
IFNULL(device.operatingSystem, "") AS os,
device.isMobile AS is_mobile,
IFNULL(geoNetwork.country, "") AS country,
IFNULL(totals.pageviews, 0) AS pageviews
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*` -- dataset: Google Analytics sample data
WHERE
_TABLE_SUFFIX BETWEEN '20160801' AND '20170630' -- training window: 2016-08-01 to 2017-06-30
LIMIT 100000; -- limit to 100,000 rows to speed up training
```
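Once training finishes, the per-iteration loss can be inspected to check that optimization converged. This is a sketch using `ML.TRAINING_INFO`; it assumes the model keeps BigQuery ML's default data split (the `CREATE MODEL` statement above does not set `data_split_method`), so `eval_loss` refers to an automatically held-out portion of the training rows:

```sql
-- One row per training iteration: training loss, loss on the held-out
-- evaluation split, learning rate, and time spent on the iteration.
SELECT
  iteration,
  loss,
  eval_loss,
  learning_rate,
  duration_ms
FROM
  ML.TRAINING_INFO(MODEL `bqml_lab.sample_model`)
ORDER BY
  iteration;
```

If `loss` and `eval_loss` are still falling noticeably at the last iteration, the model may benefit from more iterations; if `eval_loss` rises while `loss` falls, that points toward overfitting.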


---
## 2. **Evaluate the model:**
```sql
SELECT
*
FROM
ml.EVALUATE(MODEL `bqml_lab.sample_model`, (
SELECT
IF(totals.transactions IS NULL, 0, 1) AS label, -- label: 1 if the session had a transaction; the columns below are the features
IFNULL(device.operatingSystem, "") AS os,
device.isMobile AS is_mobile,
IFNULL(geoNetwork.country, "") AS country,
IFNULL(totals.pageviews, 0) AS pageviews
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170801'));
```
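For a binary logistic regression model, `ML.EVALUATE` reports `precision`, `recall`, `accuracy` and `f1_score` at a default threshold of 0.5, alongside `log_loss` and `roc_auc`. Sessions with a transaction are typically a small minority, so a confusion matrix shows more directly how the positive class is handled. A sketch over the same evaluation window; the explicit `STRUCT(0.5 AS threshold)` only restates the default and can be changed to compare cut-offs:

```sql
-- Rows: actual label; columns: predicted label, at the given threshold.
SELECT
  *
FROM
  ML.CONFUSION_MATRIX(MODEL `bqml_lab.sample_model`, (
    SELECT
      IF(totals.transactions IS NULL, 0, 1) AS label, -- actual outcome
      IFNULL(device.operatingSystem, "") AS os,
      device.isMobile AS is_mobile,
      IFNULL(geoNetwork.country, "") AS country,
      IFNULL(totals.pageviews, 0) AS pageviews
    FROM
      `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    WHERE
      _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'),
  STRUCT(0.5 AS threshold)); -- probability cut-off for predicting label 1
```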

---
## 3. **Predict purchases per country:**
```sql
SELECT
country,
SUM(predicted_label) as total_predicted_purchases -- total predicted purchases for the country
FROM
ml.PREDICT(MODEL `bqml_lab.sample_model`, (
SELECT
IFNULL(device.operatingSystem, "") AS os, -- features for prediction
device.isMobile AS is_mobile,
IFNULL(totals.pageviews, 0) AS pageviews,
IFNULL(geoNetwork.country, "") AS country
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
GROUP BY country
ORDER BY total_predicted_purchases DESC
LIMIT 10;
```
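`SUM(predicted_label)` counts only sessions whose purchase probability crosses the 0.5 threshold, so many low-probability sessions contribute nothing to a country's total. A sketch of an alternative that also sums the predicted probability of label `1`, giving an expected number of purchasing sessions per country. It relies on the `predicted_label_probs` array (label and probability pairs) that `ML.PREDICT` returns for logistic regression models:

```sql
WITH predictions AS (
  SELECT
    country,
    predicted_label,
    -- probability the model assigns to label 1 (session has a transaction)
    (SELECT p.prob
     FROM UNNEST(predicted_label_probs) AS p
     WHERE p.label = 1) AS purchase_prob
  FROM
    ML.PREDICT(MODEL `bqml_lab.sample_model`, (
      SELECT
        IFNULL(device.operatingSystem, "") AS os,
        device.isMobile AS is_mobile,
        IFNULL(totals.pageviews, 0) AS pageviews,
        IFNULL(geoNetwork.country, "") AS country
      FROM
        `bigquery-public-data.google_analytics_sample.ga_sessions_*`
      WHERE
        _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
)
SELECT
  country,
  SUM(predicted_label) AS total_predicted_purchases, -- thresholded count
  SUM(purchase_prob) AS expected_purchases           -- probability-weighted count
FROM
  predictions
GROUP BY country
ORDER BY expected_purchases DESC
LIMIT 10;
```

Computing the probability in a CTE first keeps the aggregation step a plain `SUM` over a column.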

---
## 4. **Predict purchases per user:**
```sql
SELECT
fullVisitorId,
SUM(predicted_label) as total_predicted_purchases -- total predicted purchases for each user
FROM
ml.PREDICT(MODEL `bqml_lab.sample_model`, ( -- apply the trained model for prediction
SELECT
IFNULL(device.operatingSystem, "") AS os,
device.isMobile AS is_mobile,
IFNULL(totals.pageviews, 0) AS pageviews,
IFNULL(geoNetwork.country, "") AS country,
fullVisitorId
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
GROUP BY fullVisitorId
ORDER BY total_predicted_purchases DESC
LIMIT 10;
```
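To see which inputs drive these predictions, the learned coefficients can be read back with `ML.WEIGHTS`. A minimal sketch: numeric inputs such as `pageviews` carry a single value in `weight`, while string inputs such as `os` and `country` are one-hot encoded and report one weight per category in `category_weights`:

```sql
-- Learned coefficients of the logistic regression model.
-- Numeric features: `weight`; STRING features: per-category `category_weights`.
SELECT
  processed_input,
  weight,
  category_weights
FROM
  ML.WEIGHTS(MODEL `bqml_lab.sample_model`)
ORDER BY
  processed_input;
```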

## Prerequisites
* A Google Cloud Project
* Access to BigQuery
## Dataset
This project uses the following public dataset:
* `bigquery-public-data.google_analytics_sample`
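Before training, it can help to check how imbalanced the label is. A sketch that counts sessions with and without a transaction over the same training window used in step 1. It reuses the label expression from the model query and ignores the `LIMIT 100000` applied there:

```sql
-- Sessions without (label = 0) and with (label = 1) a transaction
-- in the training window.
SELECT
  IF(totals.transactions IS NULL, 0, 1) AS label,
  COUNT(*) AS sessions
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630'
GROUP BY
  label
ORDER BY
  label;
```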