Google Search Scraper
https://github.com/s0md3v/goop

goop


> **Note:** goop no longer works. When I reported the issue, the Google team told me it was not a legitimate issue, but they have since silently fixed it.

### Contents

- [Introduction](#introduction)
- [How it works?](#how-it-works)
- [Usage](#usage)
- [Installation](#installation)
- [Example](#example)
- [Legal](#legal--disclaimer)

### Introduction
goop can perform Google searches without being blocked by the CAPTCHA or hitting any rate limits.

### How it works?
Facebook provides a [debugger tool](https://developers.facebook.com/tools/debug/echo/?q=https://example.com) for its scraper.
Interestingly, Google doesn't rate-limit requests made through this debugger (perhaps because it is whitelisted), so it can be used to scrape Google search results without being blocked by the CAPTCHA.\
Since Facebook is involved, a Facebook session `Cookie` must be supplied to the library with each request.
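
To make the idea concrete, here is a minimal sketch of how a Google search URL could be wrapped in the debugger URL linked above. It is not goop's actual code: the function name `build_debugger_url` is hypothetical, and the Google parameters `q`, `start` (assuming 10 results per page) and `filter=0` (to show omitted results) are assumptions about how the search URL would be built. The sketch only constructs the URL and sends no requests.

```python
from urllib.parse import urlencode

# Debugger endpoint taken from the link in the text above.
DEBUGGER_ECHO_URL = "https://developers.facebook.com/tools/debug/echo/"


def build_debugger_url(query: str, page: int = 0, full: bool = False) -> str:
    """Wrap a Google search URL in a Facebook debugger "echo" URL.

    Assumptions (not taken from goop): the Google parameters `q`,
    `start` (10 results per page) and `filter=0` (show omitted results).
    """
    google_params = {"q": query, "start": page * 10}
    if full:
        google_params["filter"] = 0
    google_url = "https://www.google.com/search?" + urlencode(google_params)
    # The whole Google URL becomes the value of the debugger's own `q`
    # parameter, so urlencode percent-encodes it a second time.
    return DEBUGGER_ECHO_URL + "?" + urlencode({"q": google_url})


print(build_debugger_url("red shoes"))
```

A real request to the resulting URL would also need the Facebook session cookie in a `Cookie` header, as described above.
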
### Usage
#### Installation
```
pip install goop
```
#### Example
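```python
from goop import goop

# Facebook session cookie, sent with every request (placeholder value)
cookie = '<facebook cookie>'

page_1 = goop.search('red shoes', cookie)
# page is passed as a string; '1' here fetches the second page
page_2 = goop.search('red shoes', cookie, page='1')
# full=True also includes the results Google normally omits
include_omitted_results = goop.search('red shoes', cookie, page='8', full=True)
```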
A `dict` with the following structure is returned:

```
{
    "0": {
        "url": "https://example.com",
        "text": "Example webpage",
        "summary": "This is an example webpage whose aim is to demonstrate the usage of ..."
    },
    "1": {
        ...
```
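
As a rough illustration of consuming this structure, the sketch below walks a result `dict` in rank order. The helper name `format_results` and the sample data are made up for the example (only the first entry mirrors the structure shown above), and the sketch assumes the keys are always numeric strings.

```python
def format_results(results: dict) -> list:
    """Return one display line per result, ordered by numeric key."""
    lines = []
    # Keys are strings ("0", "1", ...), so sort them as integers;
    # a plain string sort would place "10" before "2".
    for key in sorted(results, key=int):
        item = results[key]
        lines.append(f"{int(key) + 1}. {item['text']} ({item['url']})")
    return lines


# Hypothetical sample data in the shape of a goop result.
sample = {
    "1": {
        "url": "https://example.org",
        "text": "Another example",
        "summary": "A second, made-up entry.",
    },
    "0": {
        "url": "https://example.com",
        "text": "Example webpage",
        "summary": "This is an example webpage.",
    },
}

for line in format_results(sample):
    print(line)
```

Sorting on `int(key)` keeps the order stable even if the `dict` was built out of order or holds more than ten results.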

`cli.py` demonstrates the usage by running Google searches from the terminal. Start it with the following command:
```
python cli.py
```

![goop-cli](https://i.ibb.co/30Vsk87/Screenshot-2019-08-02-22-42-53.png)

### Legal & Disclaimer
Scraping Google search results is illegal. This library is merely a proof of concept of the bypass. The author is not responsible for the actions of end users.