https://github.com/jadbin/serlist
Search engine results page scraper
- Host: GitHub
- URL: https://github.com/jadbin/serlist
- Owner: jadbin
- License: apache-2.0
- Created: 2018-10-25T07:52:12.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-12-19T02:12:41.000Z (over 7 years ago)
- Last Synced: 2025-10-27T04:58:38.146Z (8 months ago)
- Topics: lxml, search-engine-scraper
- Language: Python
- Homepage:
- Size: 24.4 KB
- Stars: 13
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.rst
- License: LICENSE
=======
SERList
=======
.. image:: https://travis-ci.org/jadbin/serlist.svg?branch=master
    :target: https://travis-ci.org/jadbin/serlist

.. image:: https://coveralls.io/repos/github/jadbin/serlist/badge.svg?branch=master
    :target: https://coveralls.io/github/jadbin/serlist?branch=master

.. image:: https://img.shields.io/badge/license-Apache%202-blue.svg
    :target: https://github.com/jadbin/serlist/blob/master/LICENSE

Overview
========
SERList scrapes the following information for each result on a search engine results page (SERP):

- title
- link
- description

SERList handles results pages from the following search engines out of the box, with no configuration (such as XPath expressions) required:

- Google_
- Yahoo_
- Bing_
- Yandex_
- Baidu_
- Sogou_
- `360 Search`_

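For comparison, the sketch below shows roughly what hand-written extraction looks like for a single results-page layout. It is not SERList's implementation or API: the markup (a ``div`` with class ``result`` holding an ``h3`` link and a ``p`` snippet) is invented for illustration, the names ``ToyResultParser`` and ``toy_scrape`` are hypothetical, and it uses the standard library's ``html.parser`` rather than lxml. Each real engine uses different markup, so rules like these would have to be written and maintained per engine, which is the kind of setup SERList is meant to spare you.

.. code-block:: python

    from html.parser import HTMLParser


    class ToyResultParser(HTMLParser):
        """Collect title, link and description from a made-up SERP layout.

        Assumed (hypothetical) markup for each result:
            <div class="result"><h3><a href="URL">TITLE</a></h3><p>DESCRIPTION</p></div>
        """

        def __init__(self):
            super().__init__()
            self.results = []      # finished result dicts
            self._current = None   # result being built, or None outside a result
            self._field = None     # field that incoming text belongs to, if any

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "div" and attrs.get("class") == "result":
                # Start of a new result block (exact class match only).
                self._current = {"title": "", "link": None, "description": ""}
            elif self._current is not None:
                if tag == "a" and self._current["link"] is None:
                    # First link in the block: href is the URL, text is the title.
                    self._current["link"] = attrs.get("href")
                    self._field = "title"
                elif tag == "p":
                    self._field = "description"

        def handle_endtag(self, tag):
            if tag in ("a", "p"):
                self._field = None
            elif tag == "div" and self._current is not None:
                # Assumes no nested <div> inside a result block.
                self.results.append(self._current)
                self._current = None

        def handle_data(self, data):
            if self._current is not None and self._field is not None:
                self._current[self._field] += data


    def toy_scrape(html_text):
        """Return a list of {'title', 'link', 'description'} dicts."""
        parser = ToyResultParser()
        parser.feed(html_text)
        parser.close()
        return parser.results

Calling ``toy_scrape`` on ``'<div class="result"><h3><a href="https://example.com/a">First</a></h3><p>About A</p></div>'`` returns ``[{'title': 'First', 'link': 'https://example.com/a', 'description': 'About A'}]``. The parser matches the ``class`` attribute exactly and does not track nested ``div`` elements; both are simplifications a production scraper would need to handle.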
Installation
============
Install using pip::

    pip install serlist

Basic Usage
===========
.. code-block:: python

    from serlist import SerpScraper

    SerpScraper().scrape(text)

The variable ``text`` is the HTML text of a search engine results page.

Documentation
=============
https://serlist.readthedocs.io/

.. _Google: https://www.google.com/
.. _Yahoo: https://www.yahoo.com/
.. _Bing: https://www.bing.com/
.. _Yandex: https://www.yandex.com/
.. _Baidu: https://www.baidu.com/
.. _Sogou: https://www.sogou.com/
.. _`360 Search`: https://www.so.com/