Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
EcommerceTools is a Python data science toolkit for ecommerce, marketing science, and technical SEO analysis and modelling, created by Matt Clarke.
- Host: GitHub
- URL: https://github.com/practical-data-science/ecommercetools
- Owner: practical-data-science
- License: MIT
- Created: 2021-03-06T17:03:05.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2024-01-29T09:34:58.000Z (9 months ago)
- Last Synced: 2024-07-21T04:43:03.391Z (4 months ago)
- Topics: customer, customers, ecommerce, marketing, marketing-analytics, marketing-tools, retail, seo, seo-optimization, seotools
- Language: Python
- Size: 149 KB
- Stars: 231
- Watchers: 6
- Forks: 47
- Open Issues: 9

Metadata Files:
- Readme: README.md
- License: LICENSE.txt
Awesome Lists containing this project
- awesome-marketing-machine-learning: ecommercetools (Personalisation / Segmentation)
README
# EcommerceTools
![EcommerceTools](https://github.com/practical-data-science/ecommercetools/blob/master/banner.png?raw=true)
EcommerceTools is a data science toolkit for those working in technical ecommerce, marketing science, and technical SEO, and includes a wide range of features to aid analysis and model building. The package is written in Python, is designed to be used with Pandas, and works within a Jupyter notebook environment or in standalone Python projects.
#### Installation
You can install EcommerceTools and its dependencies from PyPI by entering `pip3 install ecommercetools` in your terminal, or `!pip3 install ecommercetools` within a Jupyter notebook cell.
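A quick way to confirm the installation worked is to import some of the modules used throughout this guide. This is just a minimal sanity check:

```python
# A minimal check that the installation worked: imports only, no data needed.
from ecommercetools import utilities, transactions, products, customers
```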
---
### Modules
- [Transactions](#Transactions)
- [Products](#Products)
- [Customers](#Customers)
- [Advertising](#Advertising)
- [Operations](#Operations)
- [Marketing](#Marketing)
- [NLP](#NLP)
- [SEO](#SEO)
- [Reports](#Reports)
---

### Transactions
#### 1. Load sample transaction items data
If you want to get started with the transactions, products, and customers features, you can use the `load_sample_data()` function to load a set of real-world data. This imports the transaction items from the widely-used Online Retail dataset and reformats it ready for use by EcommerceTools.
```python
from ecommercetools import utilities

transaction_items = utilities.load_sample_data()
transaction_items.head()
```

| | order_id | sku | description | quantity | order_date | unit_price | customer_id | country | line_price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom | 15.30 |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 20.34 |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom | 22.00 |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 20.34 |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 20.34 |
#### 2. Create a transaction items dataframe
The `utilities` module includes a range of tools that allow you to format data so it can be used within other EcommerceTools functions. The `load_transaction_items()` function creates a Pandas dataframe of formatted transactional item data. When loading your transaction items data, all you need to do is define the column mappings and the function will reformat the dataframe accordingly.
```python
import pandas as pd
from ecommercetools import utilities

transaction_items = utilities.load_transaction_items('transaction_items_non_standard_names.csv',
date_column='InvoiceDate',
order_id_column='InvoiceNo',
customer_id_column='CustomerID',
sku_column='StockCode',
quantity_column='Quantity',
unit_price_column='UnitPrice'
)
transaction_items.to_csv('transaction_items.csv', index=False)
print(transaction_items.head())
```

| | order_id | sku | description | quantity | order_date | unit_price | customer_id | country | line_price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom | 15.30 |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 20.34 |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom | 22.00 |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 20.34 |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 20.34 |
#### 3. Create a transactions dataframe
The `get_transactions()` function takes the formatted Pandas dataframe of transaction items and returns a Pandas dataframe of aggregated transaction data, including derived features such as each customer's sequential order number and a replacement-order flag.
```python
import pandas as pd
from ecommercetools import transactions

transaction_items = pd.read_csv('transaction_items.csv')
transactions_df = transactions.get_transactions(transaction_items)
transactions_df.to_csv('transactions.csv', index=False)
print(transactions_df.head())
```

| | order_id | order_date | customer_id | skus | items | revenue | replacement | order_number |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 2010-12-01 08:26:00 | 17850.0 | 7 | 40 | 139.12 | 0 | 1 |
| 1 | 536366 | 2010-12-01 08:28:00 | 17850.0 | 2 | 12 | 22.20 | 0 | 2 |
| 2 | 536367 | 2010-12-01 08:34:00 | 13047.0 | 12 | 83 | 278.73 | 0 | 1 |
| 3 | 536368 | 2010-12-01 08:34:00 | 13047.0 | 4 | 15 | 70.05 | 0 | 2 |
| 4 | 536369 | 2010-12-01 08:35:00 | 13047.0 | 1 | 3 | 17.85 | 0 | 3 |
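Since each row carries the customer's sequential `order_number`, the transactions dataframe makes it easy to separate first orders from repeat orders. As an illustrative sketch (not an EcommerceTools function), the repeat-order share can be estimated like this:

```python
# Proportion of orders that were repeat purchases (order_number > 1).
repeat_share = (transactions_df['order_number'] > 1).mean()
print(f"Repeat orders: {repeat_share:.1%}")
```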
---
### Products
#### 1. Get product data from transaction items
```python
from ecommercetools import products

products_df = products.get_products(transaction_items)
products_df.head()
```

| | sku | first_order_date | last_order_date | customers | orders | items | revenue | avg_unit_price | avg_quantity | avg_revenue | avg_orders | product_tenure | product_recency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10002 | 2010-12-01 08:45:00 | 2011-04-28 15:05:00 | 40 | 73 | 1037 | 759.89 | 1.056849 | 14.205479 | 10.409452 | 1.82 | 3749 | 3600 |
| 1 | 10080 | 2011-02-27 13:47:00 | 2011-11-21 17:04:00 | 19 | 24 | 495 | 119.09 | 0.376667 | 20.625000 | 4.962083 | 1.26 | 3660 | 3393 |
| 2 | 10120 | 2010-12-03 11:19:00 | 2011-12-04 13:15:00 | 25 | 29 | 193 | 40.53 | 0.210000 | 6.433333 | 1.351000 | 1.16 | 3746 | 3380 |
| 3 | 10123C | 2010-12-03 11:19:00 | 2011-07-15 15:05:00 | 3 | 4 | -13 | 3.25 | 0.487500 | -3.250000 | 0.812500 | 1.33 | 3746 | 3522 |
| 4 | 10123G | 2011-04-08 11:13:00 | 2011-04-08 11:13:00 | 0 | 1 | -38 | 0.00 | 0.000000 | -38.000000 | 0.000000 | inf | 3620 | 3620 |
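The negative `items` values above most likely reflect returns or cancellations in the Online Retail data, so those SKUs can be worth inspecting separately. A short illustrative snippet using the columns shown above:

```python
# Top five products by revenue.
print(products_df.sort_values('revenue', ascending=False).head())

# SKUs where returned units outweigh sales (net negative items).
print(products_df[products_df['items'] < 0][['sku', 'items', 'revenue']])
```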
#### 2. Calculate product consumption and repurchase rate
```python
from ecommercetools import products

repurchase_rates = products.get_repurchase_rates(transaction_items)
repurchase_rates.head(3).T
```

| | 0 | 1 | 2 |
|---|---|---|---|
| sku | 10002 | 10080 | 10120 |
| revenue | 759.89 | 119.09 | 40.53 |
| items | 1037 | 495 | 193 |
| orders | 73 | 24 | 29 |
| customers | 40 | 19 | 25 |
| avg_unit_price | 1.05685 | 0.376667 | 0.21 |
| avg_line_price | 10.4095 | 4.96208 | 1.351 |
| avg_items_per_order | 14.2055 | 20.625 | 6.65517 |
| avg_items_per_customer | 25.925 | 26.0526 | 7.72 |
| purchased_individually | 0 | 0 | 9 |
| purchased_once | 34 | 17 | 22 |
| bulk_purchases | 73 | 24 | 20 |
| bulk_purchase_rate | 1 | 1 | 0.689655 |
| repurchases | 39 | 7 | 7 |
| repurchase_rate | 0.534247 | 0.291667 | 0.241379 |
| repurchase_rate_label | Moderate repurchase | Low repurchase | Low repurchase |
| bulk_purchase_rate_label | Very high bulk | Very high bulk | High bulk |
| bulk_and_repurchase_label | Moderate repurchase_Very high bulk | Low repurchase_Very high bulk | Low repurchase_High bulk |
---
### Customers
#### 1. Create a customers dataset
```python
from ecommercetools import customers

customers_df = customers.get_customers(transaction_items)
customers_df.head()
```

| | customer_id | revenue | orders | skus | items | first_order_date | last_order_date | avg_items | avg_order_value | tenure | recency | cohort |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12346.0 | 0.00 | 2 | 1 | 0 | 2011-01-18 10:01:00 | 2011-01-18 10:17:00 | 0.00 | 0.00 | 3701 | 3700 | 20111 |
| 1 | 12347.0 | 4310.00 | 7 | 7 | 2458 | 2010-12-07 14:57:00 | 2011-12-07 15:52:00 | 351.14 | 615.71 | 3742 | 3377 | 20104 |
| 2 | 12348.0 | 1797.24 | 4 | 4 | 2341 | 2010-12-16 19:09:00 | 2011-09-25 13:13:00 | 585.25 | 449.31 | 3733 | 3450 | 20104 |
| 3 | 12349.0 | 1757.55 | 1 | 1 | 631 | 2011-11-21 09:51:00 | 2011-11-21 09:51:00 | 631.00 | 1757.55 | 3394 | 3394 | 20114 |
| 4 | 12350.0 | 334.40 | 1 | 1 | 197 | 2011-02-02 16:01:00 | 2011-02-02 16:01:00 | 197.00 | 334.40 | 3685 | 3685 | 20111 |
#### 2. Create a customer cohort analysis dataset
```python
from ecommercetools import customers

cohorts_df = customers.get_cohorts(transaction_items, period='M')
cohorts_df.head()
```

| | customer_id | order_id | order_date | acquisition_cohort | order_cohort |
|---|---|---|---|---|---|
| 0 | 17850.0 | 536365 | 2010-12-01 08:26:00 | 2010-12 | 2010-12 |
| 7 | 17850.0 | 536366 | 2010-12-01 08:28:00 | 2010-12 | 2010-12 |
| 9 | 13047.0 | 536367 | 2010-12-01 08:34:00 | 2010-12 | 2010-12 |
| 21 | 13047.0 | 536368 | 2010-12-01 08:34:00 | 2010-12 | 2010-12 |
| 25 | 13047.0 | 536369 | 2010-12-01 08:35:00 | 2010-12 | 2010-12 |
#### 3. Create a customer cohort analysis matrix
```python
from ecommercetools import customers

cohort_matrix_df = customers.get_cohort_matrix(transaction_items, period='M', percentage=True)
cohort_matrix_df.head()
```

| acquisition_cohort / periods | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010-12 | 1.0 | 0.381857 | 0.334388 | 0.387131 | 0.359705 | 0.396624 | 0.379747 | 0.354430 | 0.354430 | 0.394515 | 0.373418 | 0.500000 | 0.274262 |
| 2011-01 | 1.0 | 0.239905 | 0.282660 | 0.242280 | 0.327791 | 0.299287 | 0.261283 | 0.256532 | 0.311164 | 0.346793 | 0.368171 | 0.149644 | NaN |
| 2011-02 | 1.0 | 0.247368 | 0.192105 | 0.278947 | 0.268421 | 0.247368 | 0.255263 | 0.281579 | 0.257895 | 0.313158 | 0.092105 | NaN | NaN |
| 2011-03 | 1.0 | 0.190909 | 0.254545 | 0.218182 | 0.231818 | 0.177273 | 0.263636 | 0.238636 | 0.288636 | 0.088636 | NaN | NaN | NaN |
| 2011-04 | 1.0 | 0.227425 | 0.220736 | 0.210702 | 0.207358 | 0.237458 | 0.230769 | 0.260870 | 0.083612 | NaN | NaN | NaN | NaN |
```python
from ecommercetools import customers

cohort_matrix_df = customers.get_cohort_matrix(transaction_items, period='M', percentage=False)
cohort_matrix_df.head()
```

| acquisition_cohort / periods | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010-12 | 948.0 | 362.0 | 317.0 | 367.0 | 341.0 | 376.0 | 360.0 | 336.0 | 336.0 | 374.0 | 354.0 | 474.0 | 260.0 |
| 2011-01 | 421.0 | 101.0 | 119.0 | 102.0 | 138.0 | 126.0 | 110.0 | 108.0 | 131.0 | 146.0 | 155.0 | 63.0 | NaN |
| 2011-02 | 380.0 | 94.0 | 73.0 | 106.0 | 102.0 | 94.0 | 97.0 | 107.0 | 98.0 | 119.0 | 35.0 | NaN | NaN |
| 2011-03 | 440.0 | 84.0 | 112.0 | 96.0 | 102.0 | 78.0 | 116.0 | 105.0 | 127.0 | 39.0 | NaN | NaN | NaN |
| 2011-04 | 299.0 | 68.0 | 66.0 | 63.0 | 62.0 | 71.0 | 69.0 | 78.0 | 25.0 | NaN | NaN | NaN | NaN |
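The cohort matrix is usually easiest to read as a heatmap. This is not an EcommerceTools feature, but a minimal sketch using seaborn (assuming it is installed), applied to the `percentage=True` matrix from step 3, might look like this:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Plot the percentage-based cohort matrix as a retention heatmap.
fig, ax = plt.subplots(figsize=(12, 6))
sns.heatmap(cohort_matrix_df, annot=True, fmt='.0%', cmap='Blues', ax=ax)
ax.set_xlabel('Periods since acquisition')
ax.set_ylabel('Acquisition cohort')
plt.show()
```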
#### 4. Create a customer "retention" dataset
```python
from ecommercetools import customers

retention_df = customers.get_retention(transactions_df)
retention_df.head()
```

| | acquisition_cohort | order_cohort | customers | periods |
|---|---|---|---|---|
| 0 | 2010-12 | 2010-12 | 948 | 0 |
| 1 | 2010-12 | 2011-01 | 362 | 1 |
| 2 | 2010-12 | 2011-02 | 317 | 2 |
| 3 | 2010-12 | 2011-03 | 367 | 3 |
| 4 | 2010-12 | 2011-04 | 341 | 4 |
#### 5. Create an RFM (H) dataset
This is an extension of the regular Recency, Frequency, Monetary value (RFM) model that includes an additional parameter, "H", for heterogeneity, which shows the number of unique SKUs purchased by each customer. While not typically used for targeting, this value can be very useful in identifying which customers should probably be buying a broader mix of products than they currently are, as well as spotting those who may have stopped buying certain items.
```python
from ecommercetools import customers

rfm_df = customers.get_rfm_segments(customers_df)
rfm_df.head()
```

| | customer_id | acquisition_date | recency_date | recency | frequency | monetary | heterogeneity | tenure | r | f | m | h | rfm | rfm_score | rfm_segment_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12346.0 | 2011-01-18 10:01:00 | 2011-01-18 10:17:00 | 3700 | 2 | 0.00 | 1 | 3701 | 1 | 1 | 1 | 1 | 111 | 3 | Risky |
| 1 | 12350.0 | 2011-02-02 16:01:00 | 2011-02-02 16:01:00 | 3685 | 1 | 334.40 | 1 | 3685 | 1 | 1 | 1 | 1 | 111 | 3 | Risky |
| 2 | 12365.0 | 2011-02-21 13:51:00 | 2011-02-21 14:04:00 | 3666 | 3 | 320.69 | 2 | 3666 | 1 | 1 | 1 | 1 | 111 | 3 | Risky |
| 3 | 12373.0 | 2011-02-01 13:10:00 | 2011-02-01 13:10:00 | 3686 | 1 | 364.60 | 1 | 3686 | 1 | 1 | 1 | 1 | 111 | 3 | Risky |
| 4 | 12377.0 | 2010-12-20 09:37:00 | 2011-01-28 15:45:00 | 3690 | 2 | 1628.12 | 2 | 3730 | 1 | 1 | 1 | 1 | 111 | 3 | Risky |
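As noted above, the `heterogeneity` column helps spot customers who buy often but from a narrow range of SKUs. A simple illustrative filter (the thresholds here are arbitrary) might be:

```python
# Frequent buyers with a narrow product mix: possible cross-sell candidates.
narrow_mix = rfm_df[(rfm_df['frequency'] >= 5) & (rfm_df['heterogeneity'] <= 2)]
print(narrow_mix[['customer_id', 'frequency', 'heterogeneity', 'monetary']].head())
```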
#### 6. Create a purchase latency dataset
```python
from ecommercetools import customers

latency_df = customers.get_latency(transactions_df)
latency_df.head()
```

| | customer_id | frequency | recency_date | recency | avg_latency | min_latency | max_latency | std_latency | cv | days_to_next_order | label |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12680.0 | 4 | 2011-12-09 12:50:00 | 3388 | 28 | 16 | 73 | 30.859898 | 1.102139 | -3329.0 | Order overdue |
| 1 | 13113.0 | 24 | 2011-12-09 12:49:00 | 3388 | 15 | 0 | 52 | 12.060126 | 0.804008 | -3361.0 | Order overdue |
| 2 | 15804.0 | 13 | 2011-12-09 12:31:00 | 3388 | 15 | 1 | 39 | 11.008261 | 0.733884 | -3362.0 | Order overdue |
| 3 | 13777.0 | 33 | 2011-12-09 12:25:00 | 3388 | 11 | 0 | 48 | 12.055274 | 1.095934 | -3365.0 | Order overdue |
| 4 | 17581.0 | 25 | 2011-12-09 12:21:00 | 3388 | 14 | 0 | 67 | 21.974293 | 1.569592 | -3352.0 | Order overdue |
#### 7. Customer ABC segmentation
```python
from ecommercetools import customers

abc_df = customers.get_abc_segments(customers_df, months=12, abc_class_name='abc_class_12m', abc_rank_name='abc_rank_12m')
abc_df.head()
```

| | customer_id | abc_class_12m | abc_rank_12m |
|---|---|---|---|
| 0 | 12346.0 | D | 1.0 |
| 1 | 12347.0 | D | 1.0 |
| 2 | 12348.0 | D | 1.0 |
| 3 | 12349.0 | D | 1.0 |
| 4 | 12350.0 | D | 1.0 |
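The head of the dataframe only shows class D customers, so it is worth checking how the classes are distributed across the whole customer base:

```python
# Count the number of customers assigned to each ABC class.
print(abc_df['abc_class_12m'].value_counts())
```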
#### 8. Predict customer AOV, CLV, and orders
EcommerceTools allows you to predict each customer's average order value (AOV), Customer Lifetime Value (CLV), and expected number of orders via the Gamma-Gamma and BG/NBD models from the excellent Lifetimes package. By passing the dataframe of transactions from `get_transactions()` to the `get_customer_predictions()` function, EcommerceTools will fit the BG/NBD and Gamma-Gamma models and predict the AOV, order quantity, and CLV for each customer over the defined number of future days after the end of the observation period.
```python
customer_predictions = customers.get_customer_predictions(transactions_df,
observation_period_end='2011-12-09',
days=90)
customer_predictions.head(10)
```

| | customer_id | predicted_purchases | aov | clv |
|---|---|---|---|---|
| 0 | 12346.0 | 0.188830 | NaN | NaN |
| 1 | 12347.0 | 1.408736 | 569.978836 | 836.846896 |
| 2 | 12348.0 | 0.805907 | 333.784235 | 308.247354 |
| 3 | 12349.0 | 0.855607 | NaN | NaN |
| 4 | 12350.0 | 0.196304 | NaN | NaN |
| 5 | 12352.0 | 1.682277 | 376.175359 | 647.826169 |
| 6 | 12353.0 | 0.272541 | NaN | NaN |
| 7 | 12354.0 | 0.247183 | NaN | NaN |
| 8 | 12355.0 | 0.262909 | NaN | NaN |
| 9 | 12356.0 | 0.645368 | 324.039419 | 256.855226 |
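The `NaN` values in `aov` and `clv` appear where a customer lacks the repeat-purchase history the Gamma-Gamma model needs to estimate monetary value. To rank the customers that do have predictions, a simple illustrative sort:

```python
# Rank customers by predicted 90-day CLV, skipping those without a prediction.
top_clv = customer_predictions.dropna(subset=['clv']).sort_values('clv', ascending=False)
print(top_clv.head())
```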
---
### Advertising
#### 1. Create paid search keywords
```python
from ecommercetools import advertising

product_names = ['fly rods', 'fly reels']
keywords_prepend = ['buy', 'best', 'cheap', 'reduced']
keywords_append = ['for sale', 'price', 'promotion', 'promo', 'coupon', 'voucher', 'shop', 'suppliers']
campaign_name = 'fly_fishing'

keywords = advertising.generate_ad_keywords(product_names, keywords_prepend, keywords_append, campaign_name)
keywords.head()
```

| | product | keywords | match_type | campaign_name |
|---|---|---|---|---|
| 0 | fly rods | [fly rods] | Exact | fly_fishing |
| 1 | fly rods | [buy fly rods] | Exact | fly_fishing |
| 2 | fly rods | [best fly rods] | Exact | fly_fishing |
| 3 | fly rods | [cheap fly rods] | Exact | fly_fishing |
| 4 | fly rods | [reduced fly rods] | Exact | fly_fishing |
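The output is a flat dataframe, so it can be written to CSV and uploaded through your ad platform's bulk tools in the usual way (the filename here is just an example):

```python
# Save the generated keywords for bulk upload.
keywords.to_csv('fly_fishing_keywords.csv', index=False)
```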
#### 2. Create paid search ad copy using Spintax
```python
from ecommercetools import advertising

text = "Fly Reels from {Orvis|Loop|Sage|Airflo|Nautilus} for {trout|salmon|grayling|pike}"
spin = advertising.generate_spintax(text, single=False)
spin
```
```
['Fly Reels from Orvis for trout',
'Fly Reels from Orvis for salmon',
'Fly Reels from Orvis for grayling',
'Fly Reels from Orvis for pike',
'Fly Reels from Loop for trout',
'Fly Reels from Loop for salmon',
'Fly Reels from Loop for grayling',
'Fly Reels from Loop for pike',
'Fly Reels from Sage for trout',
'Fly Reels from Sage for salmon',
'Fly Reels from Sage for grayling',
'Fly Reels from Sage for pike',
'Fly Reels from Airflo for trout',
'Fly Reels from Airflo for salmon',
'Fly Reels from Airflo for grayling',
'Fly Reels from Airflo for pike',
'Fly Reels from Nautilus for trout',
'Fly Reels from Nautilus for salmon',
'Fly Reels from Nautilus for grayling',
'Fly Reels from Nautilus for pike']
```

---
### Operations
#### 1. Create an ABC inventory classification
```python
from ecommercetools import operations

inventory_classification = operations.get_inventory_classification(transaction_items)
inventory_classification.head()
```

| | sku | abc_class | abc_rank |
|---|---|---|---|
| 0 | 10002 | A | 1 |
| 1 | 10080 | A | 2 |
| 2 | 10120 | A | 3 |
| 3 | 10123C | A | 4 |
| 4 | 10123G | A | 4 |
---
### Marketing
#### 1. Get ecommerce trading calendar
```python
from ecommercetools import marketing

trading_calendar_df = marketing.get_trading_calendar('2021-01-01', days=365)
trading_calendar_df.head()
```

| | date | event |
|---|---|---|
| 0 | 2021-01-01 | January sale |
| 1 | 2021-01-02 | |
| 2 | 2021-01-03 | |
| 3 | 2021-01-04 | |
| 4 | 2021-01-05 | |
#### 2. Get ecommerce trading events
```python
from ecommercetools import marketing

trading_events_df = marketing.get_trading_events('2021-01-01', days=365)
trading_events_df.head()
```

| | date | event |
|---|---|---|
| 0 | 2021-01-01 | January sale |
| 1 | 2021-01-29 | January Pay Day |
| 2 | 2021-02-11 | Valentine's Day [last order date] |
| 3 | 2021-02-14 | Valentine's Day |
| 4 | 2021-02-26 | February Pay Day |
---
### NLP
#### 1. Generate text summaries
The `get_summaries()` function of the `nlp` module takes a Pandas dataframe containing text and returns a machine-generated summary of the content using a Hugging Face Transformers pipeline via PyTorch. To use this feature, first load your Pandas dataframe and import the `nlp` module from `ecommercetools`.

```python
import pandas as pd
from ecommercetools import nlp

pd.set_option('max_colwidth', 1000)
df = pd.read_csv('text.csv')
df.head()
```

Specify the name of the Pandas dataframe, the column containing the text you wish to summarise (i.e. `product_description`), and a column name in which to store the machine-generated summary. The `min_length` and `max_length` arguments control the number of words generated, while the `do_sample` argument controls whether the generated text is completely unique (`do_sample=False`) or extracted from the text (`do_sample=True`).
```python
df = nlp.get_summaries(df, 'product_description', 'sampled_summary', min_length=50, max_length=100, do_sample=True)
df = nlp.get_summaries(df, 'product_description', 'unsampled_summary', min_length=50, max_length=100, do_sample=False)
df = nlp.get_summaries(df, 'product_description', 'unsampled_summary_20_to_30', min_length=20, max_length=30, do_sample=False)
```

Since the model used for text summarisation is very large (1.2 GB plus), this function will take some time to run the first time it is used. Once the model is loaded, summaries are generated within a second or two per piece of text, so it is advisable to try smaller volumes of data initially.
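One way to follow that advice is to summarise a small slice of the dataframe first and only scale up once the output looks right; for example:

```python
# Trial run on the first few rows before summarising the full dataframe.
sample = df.head(5).copy()
sample = nlp.get_summaries(sample, 'product_description', 'summary',
                           min_length=20, max_length=50, do_sample=False)
print(sample[['product_description', 'summary']])
```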
### SEO
#### 1. Discover XML sitemap locations
The `get_sitemaps()` function takes the location of a `robots.txt` file (always stored at the root of a domain) and returns the URLs of any XML sitemaps listed within it.

```python
from ecommercetools import seo

sitemaps = seo.get_sitemaps("http://www.flyandlure.org/robots.txt")
print(sitemaps)
```
#### 2. Get an XML sitemap
The `get_sitemap()` function allows you to download the URLs in an XML sitemap to a Pandas dataframe. If the sitemap contains child sitemaps, each of these will be retrieved. You can save the Pandas dataframe to CSV in the usual way, as shown after the output below.

```python
from ecommercetools import seo

df = seo.get_sitemap("http://flyandlure.org/sitemap.xml")
print(df.head())
```

| | loc | changefreq | priority | domain | sitemap_name |
|---|---|---|---|---|---|
| 0 | http://flyandlure.org/ | hourly | 1.0 | flyandlure.org | http://www.flyandlure.org/sitemap.xml |
| 1 | http://flyandlure.org/about | monthly | 1.0 | flyandlure.org | http://www.flyandlure.org/sitemap.xml |
| 2 | http://flyandlure.org/terms | monthly | 1.0 | flyandlure.org | http://www.flyandlure.org/sitemap.xml |
| 3 | http://flyandlure.org/privacy | monthly | 1.0 | flyandlure.org | http://www.flyandlure.org/sitemap.xml |
| 4 | http://flyandlure.org/copyright | monthly | 1.0 | flyandlure.org | http://www.flyandlure.org/sitemap.xml |
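As mentioned above, the resulting dataframe can be saved to CSV like any other:

```python
# Save the sitemap URLs to CSV.
df.to_csv('sitemap_urls.csv', index=False)
```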
#### 3. Get Core Web Vitals from PageSpeed Insights
The `get_core_web_vitals()` function retrieves the Core Web Vitals metrics for a list of sites from the Google PageSpeed Insights API and returns the results in a Pandas dataframe. The function requires a Google PageSpeed Insights API key.

```python
from ecommercetools import seo

pagespeed_insights_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
urls = ['https://www.bbc.co.uk', 'https://www.bbc.co.uk/iplayer']
df = seo.get_core_web_vitals(pagespeed_insights_key, urls)
print(df.head())
```

#### 4. Get Google Knowledge Graph data
The `get_knowledge_graph()` function returns the Google Knowledge Graph data for a given search term. This requires a Google Knowledge Graph API key. By default, the function returns output in a Pandas dataframe, but you can pass the `output="json"` argument if you wish to receive the JSON data back.

```python
from ecommercetools import seo

knowledge_graph_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
knowledge_graph = seo.get_knowledge_graph(knowledge_graph_key, "tesla", output="dataframe")
print(knowledge_graph)
```

#### 5. Get Google Search Console API data
The `query_google_search_console()` function runs a search query on the Google Search Console API and returns the data in a Pandas dataframe. This function requires a JSON client secrets key with access to the Google Search Console API.

```python
from ecommercetools import seo

key = "google-search-console.json"
site_url = "http://flyandlure.org"
payload = {
'startDate': "2019-01-01",
'endDate': "2019-12-31",
'dimensions': ["page", "device", "query"],
'rowLimit': 100,
'startRow': 0
}

df = seo.query_google_search_console(key, site_url, payload)
print(df.head())
```

| | page | device | query | clicks | impressions | ctr | position |
|---|---|---|---|---|---|---|---|
| 0 | http://flyandlure.org/articles/fly_fishing_gea... | MOBILE | simms freestone waders review | 56 | 217 | 25.81 | 3.12 |
| 1 | http://flyandlure.org/ | MOBILE | fly and lure | 37 | 159 | 23.27 | 3.81 |
| 2 | http://flyandlure.org/articles/fly_fishing_gea... | DESKTOP | orvis encounter waders review | 35 | 134 | 26.12 | 4.04 |
| 3 | http://flyandlure.org/articles/fly_fishing_gea... | DESKTOP | simms freestone waders review | 35 | 200 | 17.50 | 3.50 |
| 4 | http://flyandlure.org/ | DESKTOP | fly and lure | 32 | 170 | 18.82 | 3.09 |
##### Fetching all results from Google Search Console
To fetch all results, set `fetch_all` to `True`. This will automatically paginate through your Google Search Console data and return all results. Be aware that if you do this you may hit Google's quota limit if you run a query over an extended period, or have a busy site with lots of `page` or `query` dimensions.
```python
from ecommercetools import seo

key = "google-search-console.json"
site_url = "http://flyandlure.org"
payload = {
'startDate': "2019-01-01",
'endDate': "2019-12-31",
'dimensions': ["page", "device", "query"],
'rowLimit': 25000,
'startRow': 0
}

df = seo.query_google_search_console(key, site_url, payload, fetch_all=True)
print(df.head())
```
##### Comparing two time periods in Google Search Console
```python
payload_before = {
'startDate': "2021-08-11",
'endDate': "2021-08-31",
'dimensions': ["page","query"],
}

payload_after = {
'startDate': "2021-07-21",
'endDate': "2021-08-10",
'dimensions': ["page","query"],
}

df = seo.query_google_search_console_compare(key, site_url, payload_before, payload_after, fetch_all=False)
df.sort_values(by='clicks_change', ascending=False).head()
```

#### 6. Get the number of "indexed" pages
The `get_indexed_pages()` function uses the "site:" search prefix to ask Google for the number of pages "indexed" for a site. This is very approximate and may not be a perfect representation, but it's usually a good guide to site "size" in the absence of other data.

```python
from ecommercetools import seo

urls = ['https://www.bbc.co.uk', 'https://www.bbc.co.uk/iplayer', 'http://flyandlure.org']
df = seo.get_indexed_pages(urls)
print(df.head())
```

| | url | indexed_pages |
|---|---|---|
| 2 | http://flyandlure.org | 2090 |
| 1 | https://www.bbc.co.uk/iplayer | 215000 |
| 0 | https://www.bbc.co.uk | 12700000 |
#### 7. Get keyword suggestions from Google Autocomplete
The `google_autocomplete()` function returns a set of keyword suggestions from Google Autocomplete. The `include_expanded=True` argument allows you to expand the number of suggestions shown by appending prefixes and suffixes to the search terms.

```python
from ecommercetools import seo

suggestions = seo.google_autocomplete("data science", include_expanded=False)
print(suggestions)

suggestions = seo.google_autocomplete("data science", include_expanded=True)
print(suggestions)
```

| | term | relevance |
|---|---|---|
| 0 | data science jobs | 650 |
| 1 | data science jobs chester | 601 |
| 2 | data science course | 600 |
| 3 | data science masters | 554 |
| 4 | data science salary | 553 |
| 5 | data science internship | 552 |
| 6 | data science jobs london | 551 |
| 7 | data science graduate scheme | 550 |
#### 8. Retrieve robots.txt content
The `get_robots()` function returns the contents of a robots.txt file in a Pandas dataframe so it can be parsed and analysed.

```python
from ecommercetools import seo

robots = seo.get_robots("http://www.flyandlure.org/robots.txt")
print(robots)
```

| | directive | parameter |
|---|---|---|
| 0 | User-agent | * |
| 1 | Disallow | /signin |
| 2 | Disallow | /signup |
| 3 | Disallow | /users |
| 4 | Disallow | /contact |
| 5 | Disallow | /activate |
| 6 | Disallow | /*/page |
| 7 | Disallow | /articles/search |
| 8 | Disallow | /search.php |
| 9 | Disallow | *q=* |
| 10 | Disallow | *category_slug=* |
| 11 | Disallow | *country_slug=* |
| 12 | Disallow | *county_slug=* |
| 13 | Disallow | *features=* |
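Because the directives come back as a regular dataframe, they are easy to filter; for instance, to view only the Disallow rules:

```python
# Filter for Disallow directives only.
disallowed = robots[robots['directive'] == 'Disallow']
print(disallowed)
```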
#### 9. Get Google SERPs
The `get_serps()` function returns a Pandas dataframe containing the Google search engine results for a given search term. Note that this function is not suitable for large-scale scraping and currently includes no features to prevent it from being blocked.

```python
from ecommercetools import seo

serps = seo.get_serps("data science blog")
print(serps)
```

| | title | link | text |
|---|---|---|---|
| 0 | 10 of the best data science blogs to follow - ... | https://www.tableau.com/learn/articles/data-sc... | 10 of the best data science blogs to follow. T... |
| 1 | Best Data Science Blogs to Follow in 2020 \| by... | https://towardsdatascience.com/best-data-scien... | 14 Jul 2020 — 1. Towards Data Science · Joined... |
| 2 | Top 20 Data Science Blogs And Websites For Dat... | https://medium.com/@exastax/top-20-data-scienc... | Top 20 Data Science Blogs And Websites For Dat... |
| 3 | Data Science Blog – Dataquest | https://www.dataquest.io/blog/ | Browse our data science blog to get helpful ti... |
| 4 | 51 Awesome Data Science Blogs You Need To Chec... | https://365datascience.com/trending/51-data-sc... | Blog name: DataKind · datakind data science bl... |
| 5 | Blogs on AI, Analytics, Data Science, Machine ... | https://www.kdnuggets.com/websites/blogs.html | Individual/small group blogs · Ai4 blog, featu... |
| 6 | Data Science Blog – Applied Data Science | https://data-science-blog.com/ | ... an Bedeutung – DevOps for Data Science. De... |
| 7 | Top 10 Data Science and AI Blogs in 2020 - Liv... | https://livecodestream.dev/post/top-data-scien... | Some of the best data science and AI blogs for... |
| 8 | Data Science Blogs: 17 Must-Read Blogs for Dat... | https://www.thinkful.com/blog/data-science-blogs/ | Data scientists could be considered the magici... |
| 9 | rushter/data-science-blogs: A curated list of ... | https://github.com/rushter/data-science-blogs | A curated list of data science blogs. Contribu... |
#### 10. Create an ABCD classification of Google Search Console data
The `classify_pages()` function returns an ABCD classification of Google Search Console data. It calculates the cumulative sum of clicks and then categorises pages using the ABC algorithm: pages generating the first 80% of clicks are classed A, those in the next 10% are classed B, and those in the final 10% are classed C, with zero-click pages classed D.

```python
from ecommercetools import seo

key = "client_secrets.json"
site_url = "example-domain.co.uk"
start_date = '2022-10-01'
end_date = '2022-10-31'

df_classes = seo.classify_pages(key, site_url, start_date, end_date, output='classes')
print(df_classes.head())

df_summary = seo.classify_pages(key, site_url, start_date, end_date, output='summary')
print(df_summary)
```
```
   page                                                clicks  impressions  ctr    position  clicks_cumsum  clicks_running_pc  pc_share  class  class_rank
0  https://practicaldatascience.co.uk/machine-lea...  3890    36577        10.64  12.64     3890           8.382898           8.382898  A      1
1  https://practicaldatascience.co.uk/data-scienc...  2414    16618        14.53  14.30     6304           13.585036          5.202138  A      2
2  https://practicaldatascience.co.uk/data-scienc...  2378    71496        3.33   16.39     8682           18.709594          5.124558  A      3
3  https://practicaldatascience.co.uk/data-scienc...  1942    14274        13.61  15.02     10624          22.894578          4.184984  A      4
4  https://practicaldatascience.co.uk/data-scienc...  1738    23979        7.25   11.80     12362          26.639945          3.745367  A      5
```

```
   class  pages  impressions  clicks  avg_ctr   avg_position  share_of_clicks  share_of_impressions
0  A      63     747643       36980   5.126349  22.706825     79.7             43.7
1  B      46     639329       4726    3.228043  31.897826     10.2             37.4
2  C      190    323385       4698    2.393632  38.259368     10.1             18.9
3  D      36     1327         0       0.000000  25.804722     0.0              0.1
```

---
### Reports
The Reports module creates weekly, monthly, quarterly, or yearly reports for customers and orders and calculates a range of common ecommerce metrics to show business performance.

#### 1. Customers report
The `customers_report()` function takes a formatted dataframe of transaction items (see above) and a desired frequency (D for daily, W for weekly, M for monthly, Q for quarterly) and calculates aggregate metrics for each period. The function returns the number of orders, the number of customers, the number of new customers, the number of returning customers, and the acquisition rate (or proportion of new customers). For monthly reporting, I would recommend a 13-month period so you can compare the last month with the same month the previous year.
```python
from ecommercetools import reports

df_customers_report = reports.customers_report(transaction_items, frequency='M')
print(df_customers_report.head(13))
```

#### 2. Transactions report
The `transactions_report()` function takes a formatted dataframe of transaction items (see above) and a desired frequency (D for daily, W for weekly, M for monthly, Q for quarterly) and calculates aggregate metrics for each period. The metrics returned are: customers, orders, revenue, SKUs, units, average order value, average SKUs per order, average units per order, and average revenue per customer.
```python
from ecommercetools import reports

df_orders_report = reports.transactions_report(transaction_items, frequency='M')
print(df_orders_report.head(13))
```