https://github.com/bisohns/search-engine-parser

Lightweight package to query popular search engines and scrape for result titles, links and descriptions

anime bing cli coursera google keyword library pypi python scraping search search-engine search-engine-parser searching yahoo


Host: GitHub
URL: https://github.com/bisohns/search-engine-parser
Owner: bisohns
Created: 2019-02-01T21:50:16.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2024-05-03T05:23:26.000Z (over 1 year ago)
Last Synced: 2025-03-28T11:11:06.175Z (6 months ago)
Topics: anime, bing, cli, coursera, google, keyword, library, pypi, python, scraping, search, search-engine, search-engine-parser, searching, yahoo
Language: Python
Homepage: https://search-engine-parser.readthedocs.io
Size: 22.3 MB
Stars: 472
Watchers: 8
Forks: 87
Open Issues: 19
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Support: docs/supported_engines.md


# Search Engine Parser

"If it is a search engine, then it can be parsed" - some random guy

![Demo](https://github.com/bisoncorps/search-engine-parser/raw/master/assets/animate.gif)

[![Python 3.6|3.7|3.8|3.9](https://img.shields.io/badge/python-3.5%7C3.6%7C3.7%7C3.8-blue)](https://www.python.org/downloads/)

[![PyPI version](https://img.shields.io/pypi/v/search-engine-parser)](https://pypi.org/project/search-engine-parser/)

[![PyPI - Downloads](https://img.shields.io/pypi/dm/search-engine-parser)](https://pypi.org/project/search-engine-parser/)

[![Deploy to Pypi](https://github.com/bisohns/search-engine-parser/actions/workflows/deploy.yml/badge.svg)](https://github.com/bisohns/search-engine-parser/actions/workflows/deploy.yml)

[![Test](https://github.com/bisohns/search-engine-parser/actions/workflows/test.yml/badge.svg)](https://github.com/bisohns/search-engine-parser/actions/workflows/test.yml)

[![Documentation Status](https://readthedocs.org/projects/search-engine-parser/badge/?version=latest)](https://search-engine-parser.readthedocs.io/en/latest/?badge=latest)

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

[![All Contributors](https://img.shields.io/badge/all_contributors-10-orange.svg)](#contributors)



search-engine-parser is a package that lets you query popular search engines and scrape for result titles, links, descriptions and more. It aims to scrape the widest range of search engines.

View all supported engines [here.](https://github.com/bisoncorps/search-engine-parser/blob/master/docs/supported_engines.md)

- [Search Engine Parser](#search-engine-parser)

  - [Popular Supported Engines](#popular-supported-engines)

  - [Installation](#installation)

  - [Development](#development)

  - [Code Documentation](#code-documentation)

  - [Running the tests](#running-the-tests)

  - [Usage](#usage)

    - [Code](#code)

    - [Command line](#command-line)

  - [FAQ](docs/faq.md)

  - [Code of Conduct](#code-of-conduct)

  - [Contribution](#contribution)

  - [License (MIT)](#license-mit)

## Popular Supported Engines

Popular search engines supported include:

- Google

- DuckDuckGo

- GitHub

- StackOverflow

- Baidu

- YouTube

View all supported engines [here.](docs/supported_engines.md)

## Installation

Install from PyPI:

```bash
# Install the package with its core dependencies only
pip install search-engine-parser

# Also install the `pysearch` CLI tool
pip install "search-engine-parser[cli]"
```

Or install from the `master` branch:

```bash
pip install git+https://github.com/bisoncorps/search-engine-parser
```

## Development

Clone the repository:

```bash
git clone git@github.com:bisoncorps/search-engine-parser.git
```

Then create a virtual environment (the example uses `mkvirtualenv` from virtualenvwrapper) and install the required packages:

```bash
mkvirtualenv search_engine_parser
pip install -r requirements/dev.txt
```

## Code Documentation

Code docs can be found on [Read the Docs](https://search-engine-parser.readthedocs.io/en/latest).

## Running the tests

```bash
pytest
```

## Usage

### Code

Query results can be scraped from popular search engines, as shown in the example snippet below.

```python
import pprint

from search_engine_parser.core.engines.bing import Search as BingSearch
from search_engine_parser.core.engines.google import Search as GoogleSearch
from search_engine_parser.core.engines.yahoo import Search as YahooSearch

# query string and page number
search_args = ('preaching to the choir', 1)

gsearch = GoogleSearch()
ysearch = YahooSearch()
bsearch = BingSearch()

gresults = gsearch.search(*search_args)
yresults = ysearch.search(*search_args)
bresults = bsearch.search(*search_args)

a = {
    "Google": gresults,
    "Yahoo": yresults,
    "Bing": bresults,
}

# pretty print the result from each engine
for k, v in a.items():
    print(f"-------------{k}------------")
    for result in v:
        pprint.pprint(result)

# print first title from google search
print(gresults["titles"][0])

# print 10th link from yahoo search
print(yresults["links"][9])

# print 6th description from bing search
print(bresults["descriptions"][5])

# print first result containing links, descriptions and title
print(gresults[0])
```

For localization, pass the `url` keyword with a localized URL. The search then queries the localized URL and parses the response with the same engine's parser:

```python
# Use google.de instead of google.com
results = gsearch.search(*search_args, url="google.de")
```

If you need results in a specific language, pass the `hl` keyword with a two-letter ISO 639-1 language code (here's a [handy list](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes)):

```python
# Use 'it' to receive Italian results
results = gsearch.search(*search_args, hl="it")
```

#### Cache

Search results are cached automatically. To bypass the cache, pass `cache=False` to the `search` or `async_search` method; alternatively, clear the engine's cache before searching:

```python
from search_engine_parser.core.engines.github import Search as GitHub

github = GitHub()

# bypass the cache
github.search("search-engine-parser", cache=False)

# OR

# clear cache before search
github.clear_cache()
github.search("search-engine-parser")
```
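The difference between the two options can be illustrated with a small stand-in class. This is only a sketch under stated assumptions: `CachedSearchSketch`, its in-memory dictionary, and the `(query, page)` cache key are hypothetical and are not taken from the library, whose actual storage and keying are not described in this README.

```python
class CachedSearchSketch:
    """Hypothetical stand-in, not the library's implementation."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable doing the real (slow) work
        self._cache = {}     # assumption: results keyed by (query, page)

    def search(self, query, page=1, cache=True):
        key = (query, page)
        if cache and key in self._cache:
            # cache hit: no new fetch
            return self._cache[key]
        # cache miss, or cache=False: fetch again and store the fresh result
        result = self._fetch(query, page)
        self._cache[key] = result
        return result

    def clear_cache(self):
        self._cache.clear()


if __name__ == "__main__":
    calls = []

    def fake_fetch(query, page):
        # records each fetch instead of touching the network
        calls.append((query, page))
        return f"results for {query!r}, page {page}"

    engine = CachedSearchSketch(fake_fetch)
    engine.search("search-engine-parser")               # fetches
    engine.search("search-engine-parser")               # served from the cache
    engine.search("search-engine-parser", cache=False)  # fetches again
    engine.clear_cache()
    engine.search("search-engine-parser")               # fetches again
    print(len(calls))  # 3
```

In this sketch `cache=False` affects a single call, while `clear_cache()` discards everything stored so far.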

#### Proxy

To route a search through a proxy, pass the proxy details to the `search` method:

```python
from search_engine_parser.core.engines.github import Search as GitHub

github = GitHub()

github.search("search-engine-parser",
    # only HTTP proxies are supported
    proxy='http://123.12.1.0',
    proxy_auth=('username', 'password'))
```

#### Async

search-engine-parser supports `async`. Each engine exposes an `async_search` method, which must be awaited inside an `async` function:

```python
results = await gsearch.async_search(*search_args)
```
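Because `await` is only valid inside a coroutine, a script needs an entry point such as `asyncio.run`. The sketch below shows that pattern and runs several searches concurrently with `asyncio.gather`. `StubSearch` and `search_all` are hypothetical names introduced for illustration; the stub performs no network access and only mimics the `async_search(query, page)` call shape used above.

```python
import asyncio


class StubSearch:
    """Hypothetical stand-in for an engine's `Search` class (no network access)."""

    def __init__(self, name):
        self.name = name

    async def async_search(self, query, page=1):
        # Simulate I/O latency; a real engine would make an HTTP request here.
        await asyncio.sleep(0.01)
        return {"titles": [f"{self.name}: {query} (page {page})"]}


async def search_all(engines, query, page=1):
    # Create one coroutine per engine and wait for all of them together.
    # gather() returns results in the same order as its arguments.
    coroutines = [engine.async_search(query, page) for engine in engines]
    return await asyncio.gather(*coroutines)


if __name__ == "__main__":
    engines = [StubSearch("google"), StubSearch("yahoo"), StubSearch("bing")]
    results = asyncio.run(search_all(engines, "preaching to the choir"))
    for result in results:
        print(result["titles"][0])
```

Assuming the real engines behave like the stub, replacing the `StubSearch` instances with engine objects such as `GoogleSearch()` should leave `search_all` unchanged.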

#### Results

A search returns a `SearchResults` object:

```python
>>> results = gsearch.search("preaching to the choir", 1)
>>> results

# the object supports retrieving individual results by index, or all values of one type (links, descriptions, titles)
>>> results[0] # returns the first SearchItem
>>> results[0]["description"] # gets the description of the first item
>>> results[0]["link"] # gets the link of the first item
>>> results["descriptions"] # returns a list of all descriptions from all results
```

It can be iterated like a normal list to return individual `SearchItem`s.
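To make the two access styles concrete, here is a minimal sketch of a container that accepts both integer indexes and plural string keys. `ResultsSketch` and its internals are hypothetical and only mirror the behavior described above; this is not the library's `SearchResults` implementation.

```python
class ResultsSketch:
    """Hypothetical stand-in that mimics the access patterns described above."""

    # plural key used on the container -> singular key stored on each item
    _FIELDS = {"titles": "title", "links": "link", "descriptions": "description"}

    def __init__(self, items):
        self._items = list(items)  # each item is a plain dict in this sketch

    def __getitem__(self, key):
        if isinstance(key, int):
            # results[0] -> a single item
            return self._items[key]
        # results["links"] -> that field collected from every item
        field = self._FIELDS[key]
        return [item[field] for item in self._items]

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


if __name__ == "__main__":
    results = ResultsSketch([
        {"title": "A", "link": "https://example.com/a", "description": "first"},
        {"title": "B", "link": "https://example.com/b", "description": "second"},
    ])
    print(results[0]["link"])
    print(results["descriptions"])
    for item in results:
        print(item["title"])
```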

### Command line

search-engine-parser ships with a CLI tool called `pysearch`, installed with the `cli` extra. For example:

```bash
pysearch --engine bing --type descriptions "Preaching to the choir"
```

Result:

```bash
'Preaching to the choir' originated in the USA in the 1970s. It is a variant of the earlier 'preaching to the converted', which dates from England in the late 1800s and has the same meaning. Origin - the full story 'Preaching to the choir' (also sometimes spelled quire) is of US origin.
```

![Demo](https://github.com/bisoncorps/search-engine-parser/raw/master/assets/example.gif)

```bash
usage: pysearch [-h] [-V] [-e ENGINE] [--show-summary] [-u URL] [-p PAGE]
                [-t TYPE] [-cc] [-r RANK] [--proxy PROXY]
                [--proxy-user PROXY_USER] [--proxy-password PROXY_PASSWORD]
                query

SearchEngineParser

positional arguments:
  query                 Query string to search engine for

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -e ENGINE, --engine ENGINE
                        Engine to use for parsing the query e.g google, yahoo,
                        bing,duckduckgo (default: google)
  --show-summary        Shows the summary of an engine
  -u URL, --url URL     A custom link to use as base url for search e.g
                        google.de
  -p PAGE, --page PAGE  Page of the result to return details for (default: 1)
  -t TYPE, --type TYPE  Type of detail to return i.e full, links, desciptions
                        or titles (default: full)
  -cc, --clear-cache    Clear cache of engine before searching
  -r RANK, --rank RANK  ID of Detail to return e.g 5 (default: 0)
  --proxy PROXY         Proxy address to make use of
  --proxy-user PROXY_USER
                        Proxy user to make use of
  --proxy-password PROXY_PASSWORD
                        Proxy password to make use of
```
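If you want to drive `pysearch` from a Python script, one possible approach is to build the argument list and hand it to `subprocess`. The helper name `build_pysearch_command` is hypothetical; the flags it uses (`--engine`, `--page`, `--type`) are the ones listed in the help text above. The sketch only runs the command when `pysearch` is found on the `PATH`.

```python
import shutil
import subprocess


def build_pysearch_command(query, engine="google", page=1, result_type="full"):
    """Hypothetical helper: build an argument list from flags in the help text."""
    return [
        "pysearch",
        "--engine", engine,
        "--page", str(page),
        "--type", result_type,
        query,  # positional argument goes last
    ]


if __name__ == "__main__":
    command = build_pysearch_command(
        "Preaching to the choir", engine="bing", result_type="descriptions"
    )
    if shutil.which("pysearch") is None:
        # The CLI is only available when the `cli` extra is installed.
        print("pysearch not found; would run:", command)
    else:
        completed = subprocess.run(
            command, capture_output=True, text=True, check=False, timeout=60
        )
        print(completed.stdout)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the query contains spaces.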

## Code of Conduct

Make sure to adhere to the [code of conduct](CODE_OF_CONDUCT.md) at all times.

## Contribution

Before making any contributions, please read the [contribution guide](CONTRIBUTING.md).

## License (MIT)

This project is licensed under the [MIT License](LICENSE), which allows very broad use for both academic and commercial purposes.

## Contributors ✨

Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):

  

    
- Ed Luff: 💻
- Diretnan Domnan: 🚇 ⚠️ 🔧 💻
- MeNsaaH: 🚇 ⚠️ 🔧 💻
- Aditya Pal: ⚠️ 💻 📖
- Avinash Reddy: 🐛
- David Onuh: 💻 ⚠️
- Panagiotis Simakis: 💻 ⚠️
- reiarthur: 💻
- Ashokkumar TA: 💻
- Andreas Teuber: 💻
- mi096684: 🐛
- devajithvs: 💻
- Geg Zakaryan: 💻 🐛
- Hakan Boğan: 🐛
- NicKoehler: 🐛 💻
- ChrisLin: 🐛 💻
- Pietro: 💻 🐛

  

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!
