https://github.com/oxylabs/automate-competitors-benchmark-analysis

A tutorial for automating competitors’ & benchmark analysis using Python

analysis automation github-python python web-scraping

Last synced: 2 months ago

Host: GitHub
URL: https://github.com/oxylabs/automate-competitors-benchmark-analysis
Owner: oxylabs
Created: 2022-02-28T11:48:28.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2025-02-11T12:44:44.000Z (9 months ago)
Last Synced: 2025-03-29T22:21:58.237Z (8 months ago)
Topics: analysis, automation, github-python, python, web-scraping
Language: Python
Homepage:
Size: 16.6 KB
Stars: 4
Watchers: 0
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# How to Automate Competitors’ & Benchmark Analysis With Python

- [Using Oxylabs’ solution to retrieve the SERPs results](#using-oxylabs-solution-to-retrieve-the-serps-results)

- [Scraping URLs of the top results](#scraping-urls-of-the-top-results)

- [Obtaining the off-page metrics](#obtaining-the-off-page-metrics)

- [Obtaining the Page Speed metrics](#obtaining-the-page-speed-metrics)

- [Converting Python list into a dataframe and exporting it as an Excel file](#converting-python-list-into-a-dataframe-and-exporting-it-as-an-excel-file)

Competitor or benchmark analysis for SEO can be burdensome because it depends on many factors that usually have to be extracted from different data sources.

This article aims to help you automate the data extraction as much as possible, so you can dedicate your time to what matters: the analysis itself and the actionable insights that shape your strategy.

For a detailed explanation, see our [blog post](https://oxy.yt/erEh).

## Using Oxylabs’ solution to retrieve the SERPs results

```python
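import requests

keyword = ""  # The search term to analyze.

# Job parameters for the Oxylabs Realtime API.
payload = {
    "source": "SEARCH_ENGINE_search",
    "domain": "com",
    "query": keyword,
    "parse": "true",  # Ask for parsed (structured) results.
}

response = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    auth=("", ""),  # Your Oxylabs API username and password.
    json=payload,
)

# Keep the URL and title of every organic result.
list_comparison = [
    [x["url"], x["title"]]
    for x in response.json()["results"][0]["content"]["results"]["organic"]
]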
```

Viewing the results:

```python
>>> print(list_comparison)
[
    ["https://example.com/result/example-link", "Example Link - Example"],
    ["https://more-examples.net", "Homepage - More Examples"],
    ["https://you-searched-for.com/query=your_keyword", "You Searched for 'your_keyword'. Analyze your search now!"],
]
```
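
The list comprehension above assumes the response always contains `results[0]["content"]["results"]["organic"]`. As a minimal sketch, a small helper can walk the same path defensively and return an empty list when a key is missing. The name `extract_organic` and the sample dictionary are hypothetical; the sample only mimics the response shape used in this tutorial.

```python
def extract_organic(response_json):
    """Return [url, title] pairs from a parsed SERP response, or [] if the shape differs."""
    try:
        organic = response_json["results"][0]["content"]["results"]["organic"]
    except (KeyError, IndexError, TypeError):
        return []
    # Skip entries that lack either field instead of raising.
    return [
        [item["url"], item["title"]]
        for item in organic
        if "url" in item and "title" in item
    ]


# Hypothetical sample that mimics the response shape used above.
sample = {
    "results": [
        {"content": {"results": {"organic": [
            {"url": "https://example.com", "title": "Example"},
            {"title": "Entry without a URL"},
        ]}}}
    ]
}

print(extract_organic(sample))
print(extract_organic({"results": []}))
```

With a helper like this, `list_comparison = extract_organic(response.json())` would leave the later loops with nothing to iterate over instead of stopping on a `KeyError`.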

## Scraping URLs of the top results

```python
import requests
from bs4 import BeautifulSoup

keyword_lower = keyword.lower()  # Compare case-insensitively.

for y in list_comparison:
    try:
        print("Scraping: " + y[0])
        html = requests.get(y[0], timeout=30)
        soup = BeautifulSoup(html.text, "html.parser")

        # Each element may be missing, so fall back to an empty string.
        try:
            metatitle = soup.find("title").get_text()
        except Exception:
            metatitle = ""
        try:
            metadescription = soup.find("meta", attrs={"name": "description"})["content"]
        except Exception:
            metadescription = ""
        try:
            h1 = soup.find("h1").get_text()
        except Exception:
            h1 = ""

        paragraph = [a.get_text() for a in soup.find_all("p")]
        text_length = sum(len(a) for a in paragraph)
        text_counter = sum(a.lower().count(keyword_lower) for a in paragraph)
        metatitle_occurrence = keyword_lower in metatitle.lower()
        h1_occurrence = keyword_lower in h1.lower()
        metatitle_equal = metatitle == y[1]

        y.extend([metatitle, metatitle_equal, metadescription, h1, paragraph, text_length, text_counter, metatitle_occurrence, h1_occurrence])
    except Exception as e:
        # Keep the row the same width when the page cannot be scraped.
        print(e)
        y.extend(["No data"] * 9)
```
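
To make the keyword metrics easier to check without fetching a page, the same calculations can be isolated in a function that works on text that has already been extracted. This is a sketch only: the function name `onpage_metrics` and the sample strings are assumptions, and the keyword is lowercased so that matching is case-insensitive.

```python
def onpage_metrics(keyword, metatitle, h1, paragraphs):
    """Compute the keyword-related on-page metrics from already-extracted text."""
    kw = keyword.lower()
    return {
        # Total number of characters across all paragraphs.
        "text_length": sum(len(p) for p in paragraphs),
        # Case-insensitive count of keyword occurrences in the paragraphs.
        "text_counter": sum(p.lower().count(kw) for p in paragraphs),
        "metatitle_occurrence": kw in metatitle.lower(),
        "h1_occurrence": kw in h1.lower(),
    }


# Hypothetical sample values for illustration.
metrics = onpage_metrics(
    "python",
    "Learn Python Fast",
    "Getting started",
    ["Python is popular.", "Many teams use python daily."],
)
print(metrics)
```

Note that an empty `keyword` matches everywhere, so set the keyword before relying on these numbers.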

## Obtaining the off-page metrics

```python
import time
from mozscape import Mozscape

client = Mozscape("", "")  # Your Moz access ID and secret key.

for y in list_comparison:
    print("Getting MOZ results for: " + y[0])
    moz_row = ["No data"] * 3  # Used if both attempts fail, so rows keep the same width.
    for attempt in range(2):  # One try plus one retry.
        try:
            metrics = client.urlMetrics(y[0])
            # Equity backlinks, total backlinks, domain authority.
            moz_row = [metrics["ueid"], metrics["uid"], metrics["pda"]]
            break
        except Exception as e:
            print(e)
            if attempt == 0:
                time.sleep(10)  # Retry once after 10 seconds.
    y.extend(moz_row)
```
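
The retry-once pattern used for the Moz call can be written as a generic helper, so the same logic is reusable for any rate-limited API. The sketch below is not part of the `mozscape` library; `call_with_retry` and the `flaky` stand-in function are hypothetical names introduced for illustration.

```python
import time


def call_with_retry(func, *args, retries=1, delay=10):
    """Call func(*args); on failure wait `delay` seconds and retry up to `retries` times."""
    attempt = 0
    while True:
        try:
            return func(*args)
        except Exception as error:
            print(error)
            if attempt >= retries:
                return None  # Give up; the caller can record "No data".
            attempt += 1
            time.sleep(delay)


# Stand-in for an API call that fails on the first attempt only.
calls = {"count": 0}


def flaky(url):
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("temporary failure")
    return {"pda": 42}


result = call_with_retry(flaky, "https://example.com", delay=0)
print(result)
```

In the loop above, this would be called as `call_with_retry(client.urlMetrics, y[0])`, with a `None` result mapped to three `"No data"` values.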

## Obtaining the Page Speed metrics

```python
import requests

pagespeed_key = ""  # Your PageSpeed Insights API key.

for y in list_comparison:
    overall_score = "No data"  # Reset so a failed request never reuses an old score.
    try:
        print("Getting results for: " + y[0])
        response = requests.get(
            "https://www.googleapis.com/pagespeedonline/v5/runPagespeed",
            # Passing params lets requests URL-encode the page address.
            params={"url": y[0], "strategy": "mobile", "locale": "en", "key": pagespeed_key},
        )
        data = response.json()
        overall_score = data["lighthouseResult"]["categories"]["performance"]["score"] * 100
        metrics = data["loadingExperience"]["metrics"]
        # Timings are reported in milliseconds and converted to seconds here.
        fcp = metrics["FIRST_CONTENTFUL_PAINT_MS"]["percentile"] / 1000
        fid = metrics["FIRST_INPUT_DELAY_MS"]["percentile"] / 1000
        lcp = metrics["LARGEST_CONTENTFUL_PAINT_MS"]["percentile"] / 1000
        # CLS is reported multiplied by 100.
        cls = metrics["CUMULATIVE_LAYOUT_SHIFT_SCORE"]["percentile"] / 100
        y.extend([fcp, fid, lcp, cls, overall_score])
    except Exception as e:
        print(e)
        y.extend(["No data", "No data", "No data", "No data", overall_score])
```
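
Field data in `loadingExperience` is not available for every URL, so a single missing metric sends the whole row to the `except` branch. As a sketch under stated assumptions, the parsing can be split out so each metric falls back to `"No data"` on its own. The function name `parse_pagespeed` and the sample dictionary are hypothetical, and this version converts every millisecond timing, including LCP, to seconds.

```python
def parse_pagespeed(data):
    """Return [fcp, fid, lcp, cls, overall_score] from a PageSpeed response dict."""

    def field_metric(name, divisor):
        # Each metric falls back independently when it is missing.
        try:
            return data["loadingExperience"]["metrics"][name]["percentile"] / divisor
        except (KeyError, TypeError):
            return "No data"

    try:
        overall_score = data["lighthouseResult"]["categories"]["performance"]["score"] * 100
    except (KeyError, TypeError):
        overall_score = "No data"

    return [
        field_metric("FIRST_CONTENTFUL_PAINT_MS", 1000),    # seconds
        field_metric("FIRST_INPUT_DELAY_MS", 1000),         # seconds
        field_metric("LARGEST_CONTENTFUL_PAINT_MS", 1000),  # seconds
        field_metric("CUMULATIVE_LAYOUT_SHIFT_SCORE", 100),
        overall_score,
    ]


# Hypothetical sample with one field metric missing (no FID entry).
sample = {
    "lighthouseResult": {"categories": {"performance": {"score": 0.5}}},
    "loadingExperience": {"metrics": {
        "FIRST_CONTENTFUL_PAINT_MS": {"percentile": 1500},
        "LARGEST_CONTENTFUL_PAINT_MS": {"percentile": 2500},
        "CUMULATIVE_LAYOUT_SHIFT_SCORE": {"percentile": 5},
    }},
}
print(parse_pagespeed(sample))
```

The returned list has the same order as the values appended in the loop above, so `y.extend(parse_pagespeed(data))` would slot into it directly.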

## Converting Python list into a dataframe and exporting it as an Excel file

```python
import pandas as pd

df = pd.DataFrame(list_comparison)
# One name per value appended in the previous steps (19 in total).
df.columns = [
    "URL", "Metatitle SERPs", "Metatitle Onpage", "Metatitle Equal", "Metadescription",
    "H1", "Paragraphs", "Text Length", "Keyword Occurrences Paragraph",
    "Metatitle Occurrence", "H1 Occurrence",
    "Equity Backlinks MOZ", "Total Backlinks MOZ", "Domain Authority",
    "FCP", "FID", "LCP", "CLS", "Overall Score",
]
df.to_excel('.xlsx', header=True, index=False)  # Add a file name before ".xlsx".
```
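
Assigning `df.columns` raises a length-mismatch error unless the rows yield exactly 19 columns, which can happen if one of the earlier steps was skipped or stopped early. A minimal sketch of a guard is shown below; `pad_rows` and `EXPECTED_COLUMNS` are hypothetical names, and the choice to truncate longer rows is an assumption that may not suit every case.

```python
EXPECTED_COLUMNS = 19  # Number of column names assigned to df.columns above.


def pad_rows(rows, width=EXPECTED_COLUMNS, filler="No data"):
    """Return copies of rows padded with `filler` (or truncated) to exactly `width` items."""
    return [
        list(row[:width]) + [filler] * (width - len(row))
        for row in rows
    ]


# Hypothetical rows: one that only has the SERP data, one that is complete.
rows = [
    ["https://example.com", "Example"],
    list(range(19)),
]
padded = pad_rows(rows)
print([len(row) for row in padded])
```

Calling `pd.DataFrame(pad_rows(list_comparison))` would then always produce a frame that matches the 19 column names.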

If you wish to find out more, see our [blog post](https://oxy.yt/erEh).
