https://github.com/luminati-io/google-search-api
Two methods to collect real Google SERP data—a free scraper for basic use and the enterprise-grade Bright Data API for high-volume demands.
data google-scraper html python serp-api web-scraping
- Host: GitHub
- URL: https://github.com/luminati-io/google-search-api
- Owner: luminati-io
- Created: 2025-02-26T07:25:30.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-26T07:45:43.000Z (over 1 year ago)
- Last Synced: 2025-03-22T07:02:01.523Z (about 1 year ago)
- Topics: data, google-scraper, html, python, serp-api, web-scraping
- Language: HTML
- Homepage: https://brightdata.com/products/serp-api/google-search
- Size: 6.26 MB
- Stars: 0
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Google Search API
> ⚠️ As of January 2025, [Google requires JavaScript](https://techcrunch.com/2025/01/17/google-begins-requiring-javascript-for-google-search/) to render search results. This update aims to block traditional bots, scrapers, and SEO tools that rely on non-JavaScript-based methods. As a result, businesses using Google Search for market research or ranking analysis must adopt tools that support JavaScript rendering.
This repository provides two approaches for collecting Google SERP data:
1. A free, small-scale scraper suitable for basic data collection
2. An enterprise-grade API solution built for high-volume and robust data needs
## Table of Contents
- [Free Scraper](#free-scraper)
  - [Input Parameters](#input-parameters)
  - [Implementation](#implementation)
  - [Sample Output](#sample-output)
  - [Limitations](#limitations)
- [Bright Data Google Search API](#bright-data-google-search-api)
  - [Key Features](#key-features)
  - [Getting Started](#getting-started)
  - [Direct API Access](#direct-api-access)
  - [Native Proxy-Based Access](#native-proxy-based-access)
- [Advanced Features](#advanced-features)
  - [Localization](#localization)
  - [Search Type](#search-type)
  - [Pagination](#pagination)
  - [Geo-Location](#geo-location)
  - [Device Type](#device-type)
  - [Browser Type](#browser-type)
  - [Parsing Results](#parsing-results)
  - [Hotel Search](#hotel-search)
  - [Parallel Searches](#parallel-searches)
  - [AI Overview](#ai-overview)
- [Support & Resources](#support--resources)
## Free Scraper
A lightweight Google scraper for basic data collection needs.

### Input Parameters
- **File:** List of search terms to query in Google (required)
- **Pages:** Number of Google result pages to scrape for each search term
### Implementation
Modify these parameters in the [Python file](https://github.com/luminati-io/Google-Search-API/blob/main/free_google_scraper/google_serp.py):
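```python
HEADLESS = False        # run the browser with a visible window
MAX_RETRIES = 2         # retry attempts for a failed request
REQUEST_DELAY = (1, 4)  # (min, max) delay between requests, in seconds
SEARCH_TERMS = [
    "nike shoes",
    "macbook pro",
]
PAGES_PER_TERM = 3      # result pages to scrape for each term
```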
💡 **Tip:** Keep `HEADLESS = False` (a visible browser window) to reduce the chance of triggering Google's detection mechanisms.
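These settings drive the scraper's loop over search terms and pages. The sketch below is a rough illustration only and is not the code in `google_serp.py`. It assumes that each results page holds 10 results, so page *n* is requested with `start=10*n` as described under Pagination. It also assumes that `REQUEST_DELAY` is a (min, max) range in seconds for a random pause between requests. `page_urls` and `polite_pause` are hypothetical helpers.

```python
import random
import time
from urllib.parse import urlencode

RESULTS_PER_PAGE = 10  # assumption: Google's default page size


def page_urls(term: str, pages: int) -> list[str]:
    """Return one Google search URL per results page for a single search term."""
    return [
        "https://www.google.com/search?"
        + urlencode({"q": term, "start": page * RESULTS_PER_PAGE})
        for page in range(pages)
    ]


def polite_pause(delay_range: tuple[float, float]) -> float:
    """Sleep for a random number of seconds within delay_range and return it."""
    seconds = random.uniform(*delay_range)
    time.sleep(seconds)
    return seconds
```

With the configuration above, you could loop `for term in SEARCH_TERMS: for url in page_urls(term, PAGES_PER_TERM): ...` and call `polite_pause(REQUEST_DELAY)` between page loads. That produces six randomly spaced requests.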
### Sample Output

### Limitations
Google implements several anti-scraping measures:
1. **CAPTCHAs:** Used to differentiate between humans and bots
2. **IP Blocks:** Temporary or permanent bans for suspicious activity
3. **Rate Limiting:** Rapid requests may trigger blocks
4. **Geotargeting:** Results vary by location, language, and device
5. **Honeypot Traps:** Hidden elements to detect automated access
After multiple requests, you'll likely encounter Google's CAPTCHA challenge:

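These defenses cannot be switched off, so a scraper should at least notice when it has been blocked and back off instead of retrying at full speed. The sketch below is a heuristic, not a documented Google contract. It assumes that a block appears in one of three ways:

- an HTTP 429 response
- a redirect to a `/sorry/` interstitial page
- a page that mentions "unusual traffic" or embeds a reCAPTCHA widget (`g-recaptcha`)

`looks_blocked` and `backoff_delays` are hypothetical helpers.

```python
from urllib.parse import urlparse

# Heuristic markers based on observed block pages; not a documented Google API.
BLOCK_MARKERS = ("unusual traffic", "g-recaptcha")


def looks_blocked(status_code: int, final_url: str, html: str) -> bool:
    """Return True if a response looks like a rate-limit or CAPTCHA page."""
    if status_code == 429:  # Too Many Requests
        return True
    if urlparse(final_url).path.startswith("/sorry/"):
        return True
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


def backoff_delays(max_retries: int, base: float = 2.0, cap: float = 60.0) -> list[float]:
    """Exponential backoff schedule: base * 2**attempt seconds, capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]
```

With `MAX_RETRIES = 2`, `backoff_delays(2)` yields waits of 2 and 4 seconds. If the page still looks blocked after that, stopping is usually wiser than retrying.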
## Bright Data Google Search API
[Bright Data's Google Search API](https://brightdata.com/products/serp-api/google-search) provides real-user search results from Google using customizable search parameters. Built on the same advanced technology as the [SERP API](https://brightdata.com/products/serp-api), it delivers high success rates and robust performance for scraping publicly available data at scale.
### Key Features
- High success rates, even at large volumes
- Pay only for successful requests
- Fast response times: under 5 seconds
- Geolocation targeting: extract data from any country, city, or device
- Output formats: retrieve data as JSON or raw HTML
- Multiple search types: news, images, shopping, jobs, and more
- Asynchronous requests: fetch results in batches
- Built for scale: handles high traffic and peak loads
📌 Test it for free in our [SERP Playground](https://brightdata.com/products/serp-api/google-search):

### Getting Started
1. **Prerequisites:**
   - Create a [Bright Data account](https://brightdata.com/) (new users receive a $5 credit)
   - Obtain your [API key](https://docs.brightdata.com/general/account/api-token)
2. **Setup:** Follow the [step-by-step guide](https://github.com/luminati-io/Google-Search-API/blob/main/setup_serp_api.md) to integrate the SERP API into your Bright Data account
3. **Implementation Methods:**
   - [Direct API Access](#direct-api-access)
   - [Native Proxy-Based Access](#native-proxy-based-access)
### Direct API Access
The simplest method is to make a direct request to the API.
**cURL Example**
```bash
curl https://api.brightdata.com/request \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer API_TOKEN" \
  -d '{
    "zone": "ZONE_NAME",
    "url": "https://www.google.com/search?q=ollama&brd_json=1",
    "format": "raw"
  }'
```
**Python Example**
```python
import json

import requests

url = "https://api.brightdata.com/request"
headers = {"Content-Type": "application/json", "Authorization": "Bearer API_TOKEN"}
payload = {
    "zone": "ZONE_NAME",
    # brd_json=1 asks for the SERP as parsed JSON instead of HTML
    "url": "https://www.google.com/search?q=ollama&brd_json=1",
    "format": "raw",
}

response = requests.post(url, headers=headers, json=payload, timeout=60)
response.raise_for_status()  # fail loudly on authentication or zone errors

with open("serp_direct_api.json", "w", encoding="utf-8") as file:
    json.dump(response.json(), file, indent=4)
print("Response saved to 'serp_direct_api.json'.")
```
👉 View [full JSON output](https://github.com/luminati-io/Google-Search-API/blob/main/google_search_api_outputs/serp_direct_api.json)
> **Note**: Use `brd_json=1` for parsed JSON, or `brd_json=html` for parsed JSON plus an `html` field that contains the full raw HTML.
Learn more about parsing search results in our [SERP API Parsing Guide](https://docs.brightdata.com/scraping-automation/serp-api/parsing-search-results).
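When you run many queries, building each search URL by hand becomes error-prone. The hypothetical `build_payload` helper below produces the same `/request` payload as the examples above. It URL-encodes the query and appends any extra Google parameters, such as `gl`, `hl`, or `start` from Advanced Features. Passing a `brd_json` keyword argument overrides the default value of `1`, for example `brd_json="html"`.

```python
from urllib.parse import urlencode


def build_payload(zone: str, query: str, **google_params: str) -> dict:
    """Assemble a Direct API payload that asks for a parsed-JSON Google SERP."""
    # urlencode turns spaces into '+'; later keys (e.g. brd_json) override defaults.
    params = {"q": query, "brd_json": "1", **google_params}
    return {
        "zone": zone,
        "url": "https://www.google.com/search?" + urlencode(params),
        "format": "raw",
    }
```

Pass the result as the `json=` argument, for example `requests.post(url, headers=headers, json=build_payload("ZONE_NAME", "best hotels in paris", gl="fr"))`.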
### Native Proxy-Based Access
Alternatively, you can route ordinary Google search URLs through Bright Data's proxy endpoint (`brd.superproxy.io:33335`).
**cURL Example**
```bash
curl -i \
--proxy brd.superproxy.io:33335 \
--proxy-user "brd-customer--zone-:" \
-k \
"https://www.google.com/search?q=ollama"
```
**Python Example**
```python
import requests
import urllib3

# verify=False below is for quick tests only, so silence the resulting warning
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

host = "brd.superproxy.io"
port = 33335
username = "brd-customer--zone-"
password = ""
proxy_url = f"http://{username}:{password}@{host}:{port}"
proxies = {"http": proxy_url, "https": proxy_url}

url = "https://www.google.com/search?q=ollama"
response = requests.get(url, proxies=proxies, verify=False, timeout=60)
response.raise_for_status()

with open("serp_native_proxy.html", "w", encoding="utf-8") as file:
    file.write(response.text)
print("Response saved to 'serp_native_proxy.html'.")
```
👉 View [full HTML output](https://github.com/luminati-io/Google-Search-API/blob/main/google_search_api_outputs/serp_native_proxy.html)
For production, load Bright Data's SSL certificate (see our [SSL Certificate Guide](https://docs.brightdata.com/general/account/ssl-certificate)) instead of disabling certificate checks with `-k` or `verify=False`.
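A proxy URL with the credentials pasted in directly breaks as soon as a password contains a character such as `@` or `:`. The sketch below percent-encodes the credentials before it builds the `proxies` mapping that `requests` expects. The username layout `brd-customer-CUSTOMER_ID-zone-ZONE_NAME` is an assumption inferred from the `brd-customer--zone-` pattern above. Copy the exact username and password from your zone's access details. `brightdata_proxies` is a hypothetical helper.

```python
from urllib.parse import quote


def brightdata_proxies(customer_id: str, zone: str, password: str,
                       host: str = "brd.superproxy.io", port: int = 33335) -> dict:
    """Build a requests-style proxies dict for the native proxy endpoint."""
    # Assumed username layout, inferred from the examples above.
    username = f"brd-customer-{customer_id}-zone-{zone}"
    # Percent-encode credentials so characters like '@' or ':' don't break the URL.
    proxy_url = f"http://{quote(username, safe='')}:{quote(password, safe='')}@{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}
```

Usage: `requests.get(url, proxies=brightdata_proxies("CUSTOMER_ID", "ZONE_NAME", "ZONE_PASSWORD"), verify="/path/to/brightdata_ca.crt")`. The certificate path is a placeholder. Passing the CA file to `verify` keeps TLS verification on, unlike `verify=False`.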
## Advanced Features
### Localization

1. `gl` (Country Code)
   - Two-letter country code that determines the country for search results
   - Simulates a search as if made from a specific country
Example: Search for restaurants in France
```bash
curl --proxy brd.superproxy.io:33335 \
--proxy-user "brd-customer--zone-:" \
"https://www.google.com/search?q=best+restaurants+in+paris&gl=fr"
```
2. `hl` (Language Code)
   - Two-letter language code that sets the language of page content
   - Affects the interface and search results language
Example: Search for sushi restaurants in Japan (results in Japanese)
```bash
curl --proxy brd.superproxy.io:33335 \
--proxy-user "brd-customer--zone-:" \
"https://www.google.com/search?q=best+sushi+restaurants+in+tokyo&hl=ja"
```
You can use both parameters together for better localization:
```bash
curl --proxy brd.superproxy.io:33335 \
--proxy-user "brd-customer--zone-:" \
"https://www.google.com/search?q=best+hotels+in+berlin&gl=de&hl=de"
```
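To compare the same query across markets, you can generate one URL per country and language pair. The market list below is hypothetical. Its Japan entry shows that the two codes need not match: the country code is `jp`, while the language code is `ja`. `localized_urls` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Hypothetical market list: name -> (gl country code, hl language code).
MARKETS = {"France": ("fr", "fr"), "Germany": ("de", "de"), "Japan": ("jp", "ja")}


def localized_urls(query: str, markets: dict[str, tuple[str, str]]) -> dict[str, str]:
    """Return one Google search URL per market, each with its own gl and hl."""
    return {
        name: "https://www.google.com/search?" + urlencode({"q": query, "gl": gl, "hl": hl})
        for name, (gl, hl) in markets.items()
    }
```

You can then send each URL through the proxy or the Direct API exactly like the cURL examples above.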
### Search Type

1. `tbm` (Search Category)
   - Specifies a particular search type (images, news, etc.)
   - **Options**:
     - `tbm=isch` → **Images**
     - `tbm=shop` → **Shopping**
     - `tbm=nws` → **News**
     - `tbm=vid` → **Videos**
**Example** (Shopping search):
```bash
curl --proxy brd.superproxy.io:33335 \
--proxy-user "brd-customer--zone-:" \
"https://www.google.com/search?q=macbook+pro&tbm=shop"
```
2. `ibp` (Jobs Search Parameter)
   - Use specifically for jobs-related searches
   - Example: `ibp=htl;jobs` returns job listings
**Example**:
```bash
curl --proxy brd.superproxy.io:33335 \
--proxy-user "brd-customer--zone-:" \
"https://www.google.com/search?q=technical+copywriter&ibp=htl;jobs"
```
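The category parameters map naturally to a lookup table. The parameter values come from the lists above. The friendly names (`"images"`, `"jobs"`, and so on) and the `search_url` helper are hypothetical. Passing `safe=";"` to `urlencode` keeps the semicolon in `ibp=htl;jobs` unescaped, as in the cURL example.

```python
from urllib.parse import urlencode

# Parameter values from the lists above; the friendly names are our own.
SEARCH_TYPES = {
    "web": {},
    "images": {"tbm": "isch"},
    "shopping": {"tbm": "shop"},
    "news": {"tbm": "nws"},
    "videos": {"tbm": "vid"},
    "jobs": {"ibp": "htl;jobs"},
}


def search_url(query: str, search_type: str = "web") -> str:
    """Build a Google search URL for one of the supported search categories."""
    try:
        extra = SEARCH_TYPES[search_type]
    except KeyError:
        raise ValueError(f"unknown search type: {search_type!r}") from None
    return "https://www.google.com/search?" + urlencode({"q": query, **extra}, safe=";")
```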
### Pagination
Navigate through pages of results or adjust the number of displayed results:
1. `start`
   - Defines the starting point for search results
   - Examples:
     - `start=0` (default) - First page
     - `start=10` - Second page (results 11-20)
     - `start=20` - Third page (results 21-30)
**Example** (Start from the 11th result):
```bash
curl --proxy brd.superproxy.io:33335 \
--proxy-user "brd-customer--zone-:" \
"https://www.google.com/search?q=best+coding+laptops+2025&start=10"
```
2. `num`
   - Defines how many results to return per page
   - Examples:
     - `num=10` (default) - Returns 10 results
     - `num=50` - Returns 50 results
**Example** (Return 40 results):
```bash
curl --proxy brd.superproxy.io:33335 \
--proxy-user "brd-customer--zone-:" \
"https://www.google.com/search?q=best+coding+laptops+2025&num=40"
```
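The two parameters work together. With `num=N`, page *k* (counting from 0) starts at `start=k*N`. The sketch below assumes that Google honors the requested `num`. If Google returns fewer results per page, these offsets will skip results. `page_offsets` and `paginated_urls` are hypothetical helpers.

```python
from urllib.parse import urlencode


def page_offsets(pages: int, per_page: int = 10) -> list[int]:
    """Return the `start` value for each of the first `pages` result pages."""
    if pages < 1 or per_page < 1:
        raise ValueError("pages and per_page must be positive")
    return [page * per_page for page in range(pages)]


def paginated_urls(query: str, pages: int, per_page: int = 10) -> list[str]:
    """Build one search URL per page, adding `num` only when it differs from 10."""
    urls = []
    for start in page_offsets(pages, per_page):
        params = {"q": query, "start": start}
        if per_page != 10:
            params["num"] = per_page
        urls.append("https://www.google.com/search?" + urlencode(params))
    return urls
```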
### Geo-Location

The `uule` parameter customizes search results based on a specific location:
- It requires an encoded string, not plain text.
- Locate the raw location string in the Canonical Name column of [Google's geotargeting CSV](https://developers.google.com/adwords/api/docs/appendix/geotargeting).
- Convert the raw string into the encoded format using a third-party converter or an encoding library.
- Include the encoded string in your API request as the value for `uule`.
```bash
curl --proxy brd.superproxy.io:33335 \
--proxy-user "brd-customer--zone-:" \
"https://www.google.com/search?q=best+hotels+in+paris&uule=w+CAIQICIGUGFyaXM"
```
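Google does not publicly document how `uule` values are encoded. The sketch below follows a widely circulated, reverse-engineered scheme:

1. the fixed prefix `w+CAIQICI`
2. one character that encodes the byte length of the canonical name
3. the canonical name in Base64

Treat this scheme as an assumption and spot-check its output against a converter you trust. The sketch rejects names of 64 bytes or more, because a single length character cannot represent them. `encode_uule` is a hypothetical helper.

```python
import base64

# Reverse-engineered scheme (assumption, not a documented Google format).
# The character at index i encodes a canonical name that is i bytes long.
LENGTH_KEY = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def encode_uule(canonical_name: str) -> str:
    """Encode a Canonical Name from Google's geotargeting CSV as a uule value."""
    raw = canonical_name.encode("utf-8")
    if len(raw) >= len(LENGTH_KEY):
        raise ValueError("canonical names of 64+ bytes are not handled by this sketch")
    return "w+CAIQICI" + LENGTH_KEY[len(raw)] + base64.b64encode(raw).decode("ascii")
```

Canonical names typically take the form `City,Region,Country`, for example `Paris,Ile-de-France,France`. Pass the returned string as the `uule` value, as in the example above.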
### Device Type

Use the `brd_mobile` parameter to simulate requests from specific devices:
| Value | Device | User-Agent Type |
| --- | --- | --- |
| `0` or omit | Desktop | Desktop |
| `1` | Mobile | Mobile |
| `ios` or `iphone` | iPhone | iOS |
| `ipad` or `ios_tablet` | iPad | iOS Tablet |
| `android` | Android | Android |
| `android_tablet` | Android Tablet | Android Tablet |
**Example: Mobile Search**
```bash
curl --proxy brd.superproxy.io:33335 \
--proxy-user "brd-customer--zone-:" \
"https://www.google.com/search?q=best+laptops&brd_mobile=1"
```
### Browser Type

Use the `brd_browser` parameter to simulate requests from specific browsers:
- `brd_browser=chrome` — Google Chrome
- `brd_browser=safari` — Safari
- `brd_browser=firefox` — Mozilla Firefox (not compatible with `brd_mobile=1`)
If not specified, the API uses a random browser.
**Example**:
```bash
curl --proxy brd.superproxy.io:33335 \
--proxy-user "brd-customer--zone-:" \
"https://www.google.com/search?q=best+gaming+laptops&brd_browser=chrome"
```
**Example** (Combining browser and device type):
```bash
curl --proxy brd.superproxy.io:33335 \
--proxy-user "brd-customer--zone-:" \
"https://www.google.com/search?q=best+smartphones&brd_browser=safari&brd_mobile=ios"
```
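Some device and browser combinations are invalid, so it can help to check parameters before you send a request. The sketch below validates values against the Device Type table and the browser list above. It rejects only the one documented conflict, Firefox with `brd_mobile=1`. The docs do not say whether other mobile values also conflict with Firefox, so the sketch allows them. `device_browser_params` is a hypothetical helper.

```python
from typing import Optional

# Allowed values, copied from the Device Type table and Browser Type list above.
DEVICES = {"0", "1", "ios", "iphone", "ipad", "ios_tablet", "android", "android_tablet"}
BROWSERS = {"chrome", "safari", "firefox"}


def device_browser_params(device: Optional[str] = None,
                          browser: Optional[str] = None) -> dict[str, str]:
    """Return brd_mobile / brd_browser query parameters, rejecting invalid values."""
    params: dict[str, str] = {}
    if device is not None:
        if device not in DEVICES:
            raise ValueError(f"unsupported brd_mobile value: {device!r}")
        params["brd_mobile"] = device
    if browser is not None:
        if browser not in BROWSERS:
            raise ValueError(f"unsupported brd_browser value: {browser!r}")
        # Documented above: Firefox is not compatible with brd_mobile=1.
        if browser == "firefox" and device == "1":
            raise ValueError("brd_browser=firefox cannot be combined with brd_mobile=1")
        params["brd_browser"] = browser
    return params
```

The result merges directly into a query string, for example `urlencode({"q": "best smartphones", **device_browser_params("ios", "safari")})`.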
### Parsing Results
Receive search results in a structured format using the `brd_json` parameter:
- **Options**:
  - `brd_json=1` - Returns results in parsed JSON format
  - `brd_json=html` - Returns JSON with an additional `"html"` field containing raw HTML
Example (JSON output):
```bash
curl --proxy brd.superproxy.io:33335 \
--proxy-user "brd-customer--zone-:" \
"https://www.google.com/search?q=best+hotels+in+new+york&brd_json=1"
```
Example (JSON with raw HTML):
```bash
curl --proxy brd.superproxy.io:33335 \
--proxy-user "brd-customer--zone-:" \
"https://www.google.com/search?q=top+restaurants+in+paris&brd_json=html"
```
Learn more in our [SERP API Parsing Guide](https://docs.brightdata.com/scraping-automation/serp-api/parsing-search-results).
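With `brd_json=html`, the raw HTML travels inside the JSON document. To store it separately, for example to archive the page while you work with the parsed fields, you can remove it after decoding. The sketch assumes that the response body is a JSON object with a top-level `html` key, as described above. It assumes no other field names. `split_parsed_response` is a hypothetical helper.

```python
import json
from typing import Any, Optional, Tuple


def split_parsed_response(body: str) -> Tuple[dict, Optional[str]]:
    """Decode a brd_json response and separate the raw HTML from the parsed fields."""
    data: Any = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    # The "html" field is only present with brd_json=html; None otherwise.
    html = data.pop("html", None)
    return data, html
```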
### Hotel Search

Refine hotel searches with these parameters:
1. `hotel_occupancy` (Number of Guests)
   - Sets the number of guests (up to 4)
   - Examples:
     - `hotel_occupancy=1` → For 1 guest
     - `hotel_occupancy=2` → For 2 guests (default)
     - `hotel_occupancy=4` → For 4 guests
**Example** (Search for hotels in New York for 4 guests):
```bash
curl --proxy brd.superproxy.io:33335 \
--proxy-user "brd-customer--zone-:" \
"https://www.google.com/search?q=hotels+in+new+york&hotel_occupancy=4"
```
2. `hotel_dates` (Check-in & Check-out Dates)
   - Filters results for specific date ranges
   - Format: `YYYY-MM-DD,YYYY-MM-DD` (check-in, then check-out), with the comma URL-encoded as `%2C`
**Example** (Search for hotels in Paris from May 1 to May 3, 2025):
```bash
curl --proxy brd.superproxy.io:33335 \
--proxy-user "brd-customer--zone-:" \
"https://www.google.com/search?q=hotels+in+paris&hotel_dates=2025-05-01%2C2025-05-03"
```
**Combined Example**:
```bash
curl --proxy brd.superproxy.io:33335 \
--proxy-user "brd-customer--zone-:" \
"https://www.google.com/search?q=hotels+in+tokyo&hotel_occupancy=2&hotel_dates=2025-05-01%2C2025-05-03"
```
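A small builder can check the documented constraints before a request is sent. It accepts 1 to 4 guests and requires a check-out date after the check-in date. `urlencode` converts the comma between the dates to `%2C` automatically, matching the examples above. `hotel_query` is a hypothetical helper. The check-out-after-check-in rule is a sanity check of our own, not a documented API constraint.

```python
from datetime import date
from urllib.parse import urlencode


def hotel_query(query: str, check_in: date, check_out: date, occupancy: int = 2) -> str:
    """Build a Google hotel search URL with hotel_occupancy and hotel_dates."""
    if not 1 <= occupancy <= 4:
        raise ValueError("hotel_occupancy supports 1 to 4 guests")
    if check_out <= check_in:
        raise ValueError("check-out must be after check-in")
    params = {
        "q": query,
        "hotel_occupancy": occupancy,
        # Comma-separated ISO dates; urlencode turns the comma into %2C.
        "hotel_dates": f"{check_in.isoformat()},{check_out.isoformat()}",
    }
    return "https://www.google.com/search?" + urlencode(params)
```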
### Parallel Searches
Send several search variations in one request. They run in parallel within the same peer and session, so their results are directly comparable.
1. Send a POST request with a `multi` array containing search variations
2. Get a `response_id` for later result retrieval
3. Retrieve results using the `response_id` once processing completes
**Step 1: Send Parallel Requests**
```bash
RESPONSE_ID=$(curl -i --silent --compressed \
"https://api.brightdata.com/serp/req?customer=&zone=" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer API_TOKEN" \
-d $'{
"country": "us",
"multi": [
{"query": {"q": "top+macbook+for+developers", "num": 20}},
{"query": {"q": "top+macbook+for+developers", "num": 100}}
]
}' | sed -En 's/^x-response-id: (.*)/\1/p' | tr -d '\r')
echo "Response ID: $RESPONSE_ID"
```
**Step 2: Fetch Results**
```bash
curl -v --compressed \
"https://api.brightdata.com/serp/get_result?customer=&zone=&response_id=${RESPONSE_ID}" \
-H "Authorization: Bearer API_TOKEN"
```
You can also search for multiple keywords in one request:
```json
{
  "multi": [
    {"query": {"q": "best+smartphones+2025"}},
    {"query": {"q": "best+laptops+2025"}}
  ]
}
```
Learn more in the [asynchronous requests documentation](https://docs.brightdata.com/scraping-automation/serp-api/asynchronous-requests).
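The two steps can also be sketched in Python with only the standard library. This is a minimal sketch that relies on these assumptions:

- `customer` and `zone` are your own account values, which are left blank in the examples above.
- The ID arrives in the `x-response-id` header, as in the cURL step.
- While results are still processing, `get_result` answers with a status below 400 other than 200. Check the asynchronous requests documentation for the actual polling behavior.

All function names are hypothetical.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SERP_API = "https://api.brightdata.com/serp"


def endpoint(path: str, **params: str) -> str:
    """Build an API URL such as .../serp/req?customer=...&zone=..."""
    return f"{SERP_API}/{path}?" + urlencode(params)


def build_multi_payload(queries: list[dict], country: str = "us") -> dict:
    """Wrap each query dict in the {"query": {...}} shape used by the multi array."""
    return {"country": country, "multi": [{"query": q} for q in queries]}


def submit(token: str, customer: str, zone: str, payload: dict) -> str:
    """Step 1: POST the multi request and return the x-response-id header."""
    request = Request(
        endpoint("req", customer=customer, zone=zone),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urlopen(request, timeout=60) as response:
        response_id = response.headers.get("x-response-id")
    if not response_id:
        raise RuntimeError("no x-response-id header in the response")
    return response_id


def fetch_results(token: str, customer: str, zone: str, response_id: str,
                  attempts: int = 10, wait: float = 5.0) -> bytes:
    """Step 2: poll get_result until it answers with HTTP 200; return the raw body."""
    request = Request(
        endpoint("get_result", customer=customer, zone=zone, response_id=response_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    for _ in range(attempts):
        # urlopen raises HTTPError for 4xx/5xx; other non-200 codes count as "pending".
        with urlopen(request, timeout=60) as response:
            if response.status == 200:
                return response.read()
        time.sleep(wait)
    raise TimeoutError(f"results for {response_id} not ready after {attempts} attempts")
```

For example, `submit(token, customer, zone, build_multi_payload([{"q": "top+macbook+for+developers", "num": 20}, {"q": "top+macbook+for+developers", "num": 100}]))` mirrors Step 1. Passing the returned ID to `fetch_results` mirrors Step 2.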
### AI Overview

Google sometimes includes AI-generated summaries (AI Overviews) at the top of search results. Use `brd_ai_mode=1` to increase the chances of seeing these AI-generated overviews:
```bash
curl --proxy brd.superproxy.io:33335 \
--proxy-user "brd-customer--zone-:" \
"https://www.google.com/search?q=how+does+caffeine+affect+sleep&brd_ai_mode=1"
```
## Support & Resources
- **Documentation:** [SERP API Docs](https://docs.brightdata.com/scraping-automation/serp-api/)
- **SEO Use Cases:** [SEO Tracking and Insights](https://brightdata.com/use-cases/serp-tracking)
- **Other Guides:**
  - [SERP API](https://github.com/luminati-io/serp-api)
  - [Web Unlocker API](https://github.com/luminati-io/web-unlocker-api)
  - [Google Maps Scraper](https://github.com/luminati-io/Google-Maps-Scraper)
  - [Google News Scraper](https://github.com/luminati-io/Google-News-Scraper)
- **Interesting Reads:**
  - [Best SERP APIs](https://brightdata.com/blog/web-data/best-serp-apis)
  - [Build a RAG Chatbot with SERP API](https://brightdata.com/blog/web-data/build-a-rag-chatbot)
  - [Scrape Google Search with Python](https://brightdata.com/blog/web-data/scraping-google-with-python)
- **Technical Support:** [Contact Us](mailto:support@brightdata.com)