https://github.com/intake/intake-google-analytics
Intake driver for Google Analytics queries
- Host: GitHub
- URL: https://github.com/intake/intake-google-analytics
- Owner: intake
- License: other
- Created: 2021-09-20T17:41:52.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2021-10-28T16:15:51.000Z (over 4 years ago)
- Last Synced: 2024-05-08T20:04:07.170Z (about 2 years ago)
- Language: Python
- Size: 55.7 KB
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Intake Google Analytics driver
[Intake](https://intake.readthedocs.io) driver for [Google Analytics Queries](https://developers.google.com/analytics/devguides/reporting/core/v4/quickstart/service-py)
## Install
```
conda install -c defusco -c conda-forge intake-google-analytics
```
## Sample query
All queries require:
* the `view_id` for your Google Analytics account
* a start date and an end date
* at least one [metric](https://ga-dev-tools.web.app/dimensions-metrics-explorer/) to compute
* a service account credentials JSON file
The easiest way to get the `view_id` is to use the
[Account Explorer](https://ga-dev-tools.web.app/account-explorer/).
To authenticate to the Core Reporting API v4 you will need a service account and a JSON credentials file.
Follow the steps [here](https://developers.google.com/analytics/devguides/reporting/core/v4/quickstart/service-py#1_enable_the_api) to prepare your account and download the credentials. Typically, this
file is called `client_secrets.json`.
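Before running a query it can help to confirm that the downloaded file really is a service account key. The sketch below is not part of this driver; `check_service_account_file` is a hypothetical helper that only relies on the `type`, `client_email`, and `private_key` fields found in Google service account key files.

```python
import json
from pathlib import Path

# Fields that a Google service account key file contains.
REQUIRED_KEYS = {"type", "client_email", "private_key"}


def check_service_account_file(path):
    """Hypothetical helper: return the service account email if the file looks valid."""
    data = json.loads(Path(path).read_text())
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"{path} is missing keys: {sorted(missing)}")
    if data["type"] != "service_account":
        raise ValueError(f"{path} has type {data['type']!r}, expected 'service_account'")
    return data["client_email"]
```

The returned email address is the account that needs read access to the view you want to query.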
The following query computes the total number of users between 5 days ago and
yesterday.
```python
import intake

ds = intake.open_google_analytics_query(
    view_id='',
    start_date='5DaysAgo',
    end_date='yesterday',
    metrics=['ga:users'],
    credentials_path='client_secrets.json'
)
df = ds.read()
print(df)
```
The output is
```
   ga:users
0    301012
```
## Date ranges
This driver accepts several formats for both `start_date` and `end_date`:
* Specific string values: `'yesterday'`, `'today'`, `'NDaysAgo'` (for example `'5DaysAgo'`)
* Datetime-parsable strings in the format `'YYYY-MM-DD'`
* `pd.Timestamp`, `datetime.date`, or `datetime.datetime` objects
These data types can be mixed. Here are a few examples.
GA strings only:
```python
ds = intake.open_google_analytics_query(
    view_id='',
    start_date='30DaysAgo',
    end_date='yesterday',
    metrics=['ga:users'],
    credentials_path='client_secrets.json'
)
```
Datetime objects and GA strings:
```python
import pandas as pd
ds = intake.open_google_analytics_query(
    view_id='',
    start_date=pd.to_datetime('March 19, 2020'),
    end_date='30DaysAgo',
    metrics=['ga:users'],
    credentials_path='client_secrets.json'
)
```
Datetime objects with differencing:
```python
import pandas as pd
import datetime as dt
ds = intake.open_google_analytics_query(
    view_id='',
    start_date=dt.datetime.now() - pd.DateOffset(months=6),
    end_date=pd.to_datetime('today') - dt.timedelta(days=1),
    metrics=['ga:users'],
    credentials_path='client_secrets.json'
)
```
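To make the accepted date formats concrete, here is a minimal sketch of how such inputs could be reduced to plain strings. It is not the driver's actual code: `normalize_date` is a hypothetical helper, and matching the relative strings case-insensitively is an assumption.

```python
import datetime as dt
import re

# Relative values such as 'today', 'yesterday', '5DaysAgo'.
# Case-insensitive matching is an assumption of this sketch.
_RELATIVE = re.compile(r"^(today|yesterday|\d+daysago)$", re.IGNORECASE)


def normalize_date(value):
    """Hypothetical helper: turn a supported date input into a string."""
    if isinstance(value, str):
        if _RELATIVE.match(value):
            return value  # pass relative strings through unchanged
        # Raises ValueError if the string is not a YYYY-MM-DD date.
        return dt.datetime.strptime(value, "%Y-%m-%d").date().isoformat()
    # datetime must be checked before date because it is a subclass of date.
    # This branch also covers pd.Timestamp, which subclasses datetime.datetime.
    if isinstance(value, dt.datetime):
        return value.date().isoformat()
    if isinstance(value, dt.date):
        return value.isoformat()
    raise TypeError(f"Unsupported date value: {value!r}")
```

Because every input ends up as a string, a relative string for one bound and a datetime object for the other can be combined freely.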
## Metrics, Dimensions, and Filters
The easiest way to learn how to build queries and gather the required information is to use the
[Query Explorer](https://ga-dev-tools.web.app/query-explorer/). Refer to the
[UA Dimensions and Metrics explorer](https://ga-dev-tools.web.app/dimensions-metrics-explorer/)
for documentation on all available fields.
### Metric expressions
Metrics are specified as a list of strings or dictionaries and are required for all queries.
The output DataFrame will have columns that match the metric expressions.
When using the dictionary format, the `'expression'` key is required and the optional `'alias'` key
changes the column name in the DataFrame.
GA [metric expressions](https://developers.google.com/analytics/devguides/reporting/core/v4/basics#expressions)
are supported for both string and dictionary inputs. Here is an example
of the supported types of `metrics`.
```python
ds = intake.open_google_analytics_query(
    view_id='',
    start_date='5DaysAgo',
    end_date='yesterday',
    metrics=[
        'ga:users',  # simple string value
        'ga:goal1completions/ga:users',  # metric expression as string
        {'expression': 'ga:sessions'},  # the expression key is required for dicts
        {
            'expression': 'ga:sessionDuration/ga:sessions',  # expression
            'alias': 'time-per-session'  # rename the column
        }
    ],
    credentials_path='client_secrets.json'
)
```
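The rules above for strings, dictionaries, and aliases can be sketched in plain Python. This is an illustration only, not the driver's implementation; `normalize_metrics` and `column_names` are hypothetical names.

```python
def normalize_metrics(metrics):
    """Hypothetical helper: convert every metric to a dict with an 'expression' key."""
    normalized = []
    for metric in metrics:
        if isinstance(metric, str):
            normalized.append({"expression": metric})
        elif isinstance(metric, dict):
            if "expression" not in metric:
                raise ValueError(f"Metric {metric!r} needs an 'expression' key")
            normalized.append(dict(metric))  # copy so the input is not mutated
        else:
            raise TypeError(f"Unsupported metric: {metric!r}")
    return normalized


def column_names(metrics):
    """Expected column names: the alias when given, otherwise the expression."""
    return [m.get("alias", m["expression"]) for m in normalize_metrics(metrics)]


print(column_names([
    "ga:users",
    {"expression": "ga:sessionDuration/ga:sessions", "alias": "time-per-session"},
]))
# ['ga:users', 'time-per-session']
```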
### Dimensions
In addition to metrics, an optional list of fields can be provided as dimensions.
See the [UA Dimensions and Metrics Explorer](https://ga-dev-tools.web.app/dimensions-metrics-explorer/)
for the full list of available dimensions. Dimensions are provided as a list of strings.
Unlike metrics, dimensions do not support aliasing.
```python
ds = intake.open_google_analytics_query(
    view_id='',
    start_date='5DaysAgo',
    end_date='yesterday',
    metrics=[
        'ga:users',
        {
            'expression': 'ga:sessionDuration/ga:sessions',
            'alias': 'time-per-session'
        }
    ],
    dimensions=['ga:userType', 'ga:browser'],
    credentials_path='client_secrets.json'
)
```
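For orientation, the sketch below shows how a view ID, date range, metrics, and dimensions fit together in a Reporting API v4 `reports.batchGet` request body. Whether the driver builds its request exactly this way is an assumption; `build_report_body` is a hypothetical helper and the view ID is a placeholder.

```python
def build_report_body(view_id, start_date, end_date, metrics, dimensions=None):
    """Hypothetical helper: assemble a Reporting API v4 batchGet request body."""
    request = {
        "viewId": view_id,
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        # Plain strings become metric objects; dicts are passed through.
        "metrics": [m if isinstance(m, dict) else {"expression": m} for m in metrics],
    }
    if dimensions:
        # Dimension objects only carry a name, which is why they cannot be aliased.
        request["dimensions"] = [{"name": name} for name in dimensions]
    return {"reportRequests": [request]}


body = build_report_body(
    view_id="12345678",  # placeholder view ID
    start_date="2021-10-01",
    end_date="2021-10-07",
    metrics=["ga:users"],
    dimensions=["ga:userType", "ga:browser"],
)
```

With dimensions present, the report contains one row per combination of dimension values instead of a single total.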
### Filtering
Filters can be applied on metrics or dimensions. See the
[filters documentation](https://developers.google.com/analytics/devguides/reporting/core/v3/reference#filters)
for more details.
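The linked documentation describes filters as strings of the form `field operator value`, joined with `,` for OR and `;` for AND. The sketch below only builds such strings. How they are passed to this driver is not shown in this README, so the helpers `condition`, `any_of`, and `all_of` are hypothetical and stop short of calling the driver.

```python
def escape_value(value):
    """Backslash-escape commas and semicolons, which act as separators in filters."""
    return str(value).replace(",", "\\,").replace(";", "\\;")


def condition(field, operator, value):
    """Build one condition, e.g. condition('ga:browser', '==', 'Chrome')."""
    return f"{field}{operator}{escape_value(value)}"


def any_of(*conditions):
    """Combine conditions with OR (comma)."""
    return ",".join(conditions)


def all_of(*conditions):
    """Combine conditions with AND (semicolon)."""
    return ";".join(conditions)


# (browser is Chrome OR browser is Firefox) AND more than 10 sessions
expr = all_of(
    any_of(
        condition("ga:browser", "==", "Chrome"),
        condition("ga:browser", "==", "Firefox"),
    ),
    condition("ga:sessions", ">", 10),
)
print(expr)
# ga:browser==Chrome,ga:browser==Firefox;ga:sessions>10
```

OR binds more tightly than AND in this syntax, so the OR group needs no parentheses.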