Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/howie6879/magic_google
Google search results crawler, get google search results that you need
- Host: GitHub
- URL: https://github.com/howie6879/magic_google
- Owner: howie6879
- Created: 2017-01-12T06:55:21.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2023-11-14T10:11:26.000Z (about 1 year ago)
- Last Synced: 2024-11-16T05:49:01.312Z (4 days ago)
- Topics: crawler, google, google-search, spider
- Language: Python
- Size: 39.1 KB
- Stars: 393
- Watchers: 23
- Forks: 109
- Open Issues: 3
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
## magic_google
[![](https://img.shields.io/pypi/v/magic_google.svg)](https://pypi.org/project/magic-google/)
### 1. What's magic_google
magic_google is a simple Google search crawler that lets you extract whatever you need from the results pages. While crawling, be aware of Google's per-IP rate limits and the exceptions they trigger; I suggest pausing the program between requests and using proxy IPs.
PHP version: [MagicGoogle](https://github.com/howie6879/php-google)
### 2. How to Use?
Run
``` shell
# Install from PyPI
pip install magic_google
# Or install the latest version from GitHub
pip install git+https://github.com/howie6879/magic_google.git
# Or clone the repository and use the example script directly
git clone https://github.com/howie6879/magic_google.git
cd magic_google
vim google_search.py
# Or install from source
python setup.py install
```
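A quick way to confirm the install worked is to import the package and construct the crawler (a minimal check; the proxy-free constructor is the same one used in the example below):

``` python
# Minimal sanity check that magic_google is importable after installation
from magic_google import MagicGoogle

mg = MagicGoogle()  # no proxies, direct connection
print(mg)
```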
Example
``` python
from magic_google import MagicGoogle
import pprint

# Or PROXIES = None
PROXIES = [{
    'http': 'http://192.168.2.207:1080',
    'https': 'http://192.168.2.207:1080'
}]

# Or MagicGoogle()
mg = MagicGoogle(PROXIES)

# Crawl the whole results page
result = mg.search_page(query='python')

# Crawl the result URLs
for url in mg.search_url(query='python'):
    pprint.pprint(url)

# Output
# 'https://www.python.org/'
# 'https://www.python.org/downloads/'
# 'https://www.python.org/about/gettingstarted/'
# 'https://docs.python.org/2/tutorial/'
# 'https://docs.python.org/'
# 'https://en.wikipedia.org/wiki/Python_(programming_language)'
# 'https://www.codecademy.com/courses/introduction-to-python-6WeG3/0?curriculum_id=4f89dab3d788890003000096'
# 'https://www.codecademy.com/learn/python'
# 'https://developers.google.com/edu/python/'
# 'https://learnpythonthehardway.org/book/'
# 'https://www.continuum.io/downloads'

# Get {'title', 'url', 'text'}
for i in mg.search(query='python', num=1):
    pprint.pprint(i)

# Output
# {'text': 'The official home of the Python Programming Language.',
#  'title': 'Welcome to Python .org',
#  'url': 'https://www.python.org/'}
```
See [google_search.py](./examples/google_search.py) for a complete example.

**If you need to run a large number of queries from a single IP address, I suggest pausing 5s ~ 30s between requests.**
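A minimal sketch of such a pause, assuming you loop over several queries with one `MagicGoogle` instance (the `queries` list and the exact delay range are illustrative):

``` python
import random
import time

from magic_google import MagicGoogle

mg = MagicGoogle()  # or MagicGoogle(PROXIES)
queries = ['python', 'asyncio', 'web scraping']  # example queries

results = {}
for q in queries:
    results[q] = list(mg.search_url(query=q))
    # Sleep 5-30 seconds between queries to stay under Google's per-IP limits
    time.sleep(random.uniform(5, 30))
```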
If the crawler always returns empty results, Google is most likely redirecting your requests (typically because of rate limiting), and the response looks like this:
```html
302 Moved

302 Moved
The document has moved here.
```
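One way to cope with this is to treat an empty result set as a signal to back off and retry; the sketch below assumes `mg.search` simply yields nothing when Google redirects, and the retry count and wait time are illustrative:

``` python
import time

from magic_google import MagicGoogle

mg = MagicGoogle()

def search_with_backoff(query, retries=3, wait=30):
    """Retry a query, waiting between attempts when Google returns nothing."""
    for _ in range(retries):
        results = list(mg.search(query=query))
        if results:
            return results
        # Likely rate-limited (302 redirect); wait before trying again
        time.sleep(wait)
    return []

print(search_with_backoff('python'))
```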