https://github.com/jmcarp/robobrowser

Last synced: about 1 year ago

Host: GitHub
URL: https://github.com/jmcarp/robobrowser
Owner: jmcarp
License: bsd-3-clause
Created: 2014-02-08T21:29:49.000Z (over 12 years ago)
Default Branch: master
Last Pushed: 2020-09-10T18:41:47.000Z (almost 6 years ago)
Last Synced: 2025-05-13T07:09:06.725Z (about 1 year ago)
Language: Python
Size: 563 KB
Stars: 3,712
Watchers: 109
Forks: 338
Open Issues: 59
Metadata Files:
- Readme: README.rst
- Changelog: HISTORY.rst
- Contributing: docs/contributing.rst
- License: LICENSE

Awesome Lists containing this project

fucking-awesome-python-cn - RoboBrowser
awesome-python - robobrowser - A simple, Pythonic library for browsing the web without a standalone web browser. (Web Crawling)
awesome-python - RoboBrowser - A simple, Pythonic library for browsing the web without a standalone web browser. (Web Crawling)
python-awesome - robobrowser - A simple, Pythonic library for browsing the web without a standalone web browser. (Web Crawling)
awesome-python-resources - GitHub - 58% open · ⏱️ 07.06.2015 (HTML Processing)
awesome-python - robobrowser - A simple, Pythonic library for browsing the web without a standalone web browser. (Web Crawling)
awesome-fullstack - Robo Browser
awesome-python - RoboBrowser - A simple, Pythonic library for browsing the web without a standalone web browser. (Web Crawling)
awesome-python - robobrowser - 📝 6 years ago (Web Crawling)
awesome-python - RoboBrowser - A simple, Pythonic library for browsing the web without a standalone web browser. (Web Crawling & Web Scraping)
fucking-awesome-python - :octocat: robobrowser - :star: 3680 :fork_and_knife: 343 - A simple, Pythonic library for browsing the web without a standalone web browser. (Web Crawling)
awesome-python - RoboBrowser - A simple, Pythonic library for browsing the web without a standalone web browser. (Web Crawling & Web Scraping)
awesome-python-cn - RoboBrowser
Python-Awesome - RoboBrowser - A simple, Pythonic library for browsing the web without a standalone web browser. (Web Crawling)
awesome-python - robobrowser - A simple, Pythonic library for browsing the web without a standalone web browser. (Web Crawling)
awesome-crawler - RoboBrowser - A simple, Pythonic library for browsing the web without a standalone web browser. (Python)
git-github.com-vinta-awesome-python - RoboBrowser - A simple, Pythonic library for browsing the web without a standalone web browser. (Web Crawling & Web Scraping)
starred-awesome - robobrowser - (Python)
fucking_awesome_python - RoboBrowser - A simple, Pythonic library for browsing the web without a standalone web browser. (Web Crawling)
awesome-crawler-cn - RoboBrowser - A simple, Python-based web browser that is not built on a web browser. (Python)
my-awesome-github-stars - jmcarp/robobrowser - (Python)

README

RoboBrowser: Your friendly neighborhood web scraper
===================================================

.. image:: https://badge.fury.io/py/robobrowser.png
    :target: http://badge.fury.io/py/robobrowser

.. image:: https://travis-ci.org/jmcarp/robobrowser.png?branch=master
    :target: https://travis-ci.org/jmcarp/robobrowser

.. image:: https://coveralls.io/repos/jmcarp/robobrowser/badge.png?branch=master
    :target: https://coveralls.io/r/jmcarp/robobrowser

Homepage: http://robobrowser.readthedocs.org/

RoboBrowser is a simple, Pythonic library for browsing the web without a standalone web browser. RoboBrowser can fetch a page, click on links and buttons, and fill out and submit forms. If you need to interact with web services that don't have APIs, RoboBrowser can help.

.. code-block:: python

    import re

    from robobrowser import RoboBrowser

    # Browse to Genius
    browser = RoboBrowser(history=True)
    browser.open('http://genius.com/')

    # Search for Porcupine Tree
    form = browser.get_form(action='/search')
    form                # repr shows each field and its current value
    form['q'].value = 'porcupine tree'
    browser.submit_form(form)

    # Look up the first song
    songs = browser.select('.song_link')
    browser.follow_link(songs[0])
    lyrics = browser.select('.lyrics')
    lyrics[0].text      # \nHear the sound of music ...

    # Back to results page
    browser.back()

    # Look up my favorite song
    song_link = browser.get_link('trains')
    browser.follow_link(song_link)

    # Can also search HTML using regex patterns
    lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
    lyrics.text         # \nTrain set and match spied under the blind...
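The example relies on `history=True` and `browser.back()` to return to the search results. As a rough mental model only, the sketch below keeps visited pages in a list with a cursor; the `History` class, its methods, and the page names are hypothetical and are not RoboBrowser's internals. It assumes, as in an ordinary web browser, that opening a new page discards any pages ahead of the cursor.

.. code-block:: python

    class History:
        """Hypothetical browsing history: visited pages plus a cursor."""

        def __init__(self):
            self._states = []
            self._cursor = -1  # index of the current page; -1 means nothing visited yet

        @property
        def current(self):
            return self._states[self._cursor] if self._states else None

        def visit(self, page):
            # Opening a new page drops every "forward" page beyond the cursor
            del self._states[self._cursor + 1:]
            self._states.append(page)
            self._cursor = len(self._states) - 1

        def back(self, n=1):
            # Step back n pages, stopping at the first visited page
            self._cursor = max(self._cursor - n, 0)
            return self.current

        def forward(self, n=1):
            # Step forward n pages, stopping at the most recent page
            self._cursor = min(self._cursor + n, len(self._states) - 1)
            return self.current


    history = History()
    for page in ['home', 'search results', 'first song']:
        history.visit(page)
    history.back()              # returns 'search results'
    history.visit('trains')     # 'first song' is discarded
    print(history.current)      # trains

Clamping the cursor in `back()` and `forward()` means stepping past either end stops at the first or last page instead of raising an error; that is a choice made for this sketch.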

RoboBrowser combines the best of two excellent Python libraries: Requests and BeautifulSoup.

RoboBrowser represents browser sessions using Requests and HTML responses using BeautifulSoup, transparently exposing methods of both libraries:

.. code-block:: python

    import re

    from robobrowser import RoboBrowser

    browser = RoboBrowser(user_agent='a python robot')
    browser.open('https://github.com/')

    # Inspect the browser session
    browser.session.cookies['_gh_sess']         # BAh7Bzo...
    browser.session.headers['User-Agent']       # a python robot

    # Search the parsed HTML
    browser.select('div.teaser-icon')                   # list of matching BeautifulSoup Tag objects
    browser.find(class_=re.compile(r'column', re.I))    # first matching Tag, or None
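Both examples above filter on the `class` attribute with compiled regular expressions. BeautifulSoup tests a compiled pattern with its `search()` method, so a pattern can match anywhere inside a class name unless it is anchored. The standard-library sketch below applies the two patterns from this README to a list of hypothetical class names: the word-boundary anchors in the lyrics pattern reject names where `lyrics` is glued to other word characters, while `re.I` makes the column pattern case-insensitive.

.. code-block:: python

    import re

    # Patterns taken from the README examples
    lyrics_pattern = re.compile(r'\blyrics\b')
    column_pattern = re.compile(r'column', re.I)

    # Hypothetical class names, for illustration only
    class_names = ['lyrics', 'lyrics-body', 'lyrics_body', 'song_lyrics', 'Column', 'two-columns']


    def matching(pattern, names):
        """Return the names in which the pattern is found anywhere (re.search semantics)."""
        return [name for name in names if pattern.search(name)]


    print(matching(lyrics_pattern, class_names))  # ['lyrics', 'lyrics-body']
    print(matching(column_pattern, class_names))  # ['Column', 'two-columns']

Because `_` counts as a word character and `-` does not, the lyrics pattern accepts `lyrics-body` but rejects `lyrics_body` and `song_lyrics`.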

You can also pass a custom `Session` instance for lower-level configuration:

.. code-block:: python

    from requests import Session

    from robobrowser import RoboBrowser

    session = Session()
    session.verify = False  # Skip SSL verification
    session.proxies = {'http': 'http://custom.proxy.com/'}  # Set default proxies
    browser = RoboBrowser(session=session)

RoboBrowser also includes tools for working with forms, inspired by WebTest and Mechanize.

.. code-block:: python

    from robobrowser import RoboBrowser

    browser = RoboBrowser()
    browser.open('http://twitter.com')

    # Get the signup form
    signup_form = browser.get_form(class_='signup')
    signup_form         # repr shows the signup fields

    # Checkboxes: open a page whose form has a checkbox group named 'vehicle'
    # with the values 'Bike' and 'Car' (the URL is a hypothetical placeholder)
    browser.open('http://example.com/checkbox-form')
    form = browser.get_form()
    form['vehicle']                 # the checkbox field object

    # Checked values can be read and set like lists
    form['vehicle'].options         # [u'Bike', u'Car']
    form['vehicle'].value           # []
    form['vehicle'].value = ['Bike']
    form['vehicle'].value = ['Bike', 'Car']

    # Values can also be set using input labels
    form['vehicle'].labels          # [u'I have a bike', u'I have a car \r\n']
    form['vehicle'].value = ['I have a bike']
    form['vehicle'].value           # [u'Bike']

    # Only values that correspond to checkbox values or labels can be set;
    # this will raise a `ValueError`
    form['vehicle'].value = ['Hot Dogs']
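The checkbox rules in this example (values read and set like lists, labels accepted in place of values, anything else rejected with `ValueError`) can be modeled in a few lines of plain Python. The `CheckboxGroup` class below is a hypothetical sketch of those rules, not RoboBrowser's field implementation; matching labels after stripping surrounding whitespace is an assumption of this sketch, chosen so that the second label's trailing line break does not matter.

.. code-block:: python

    class CheckboxGroup:
        """Hypothetical model of a group of checkboxes that share one name."""

        def __init__(self, options, labels):
            self.options = list(options)  # the checkboxes' value attributes
            self.labels = list(labels)    # visible label text, one per option
            self._checked = []

        @property
        def value(self):
            return list(self._checked)

        @value.setter
        def value(self, values):
            # Map stripped label text to its option value (an assumption of this sketch)
            by_label = {label.strip(): option for label, option in zip(self.labels, self.options)}
            checked = []
            for item in values:
                if item in self.options:
                    checked.append(item)
                elif item.strip() in by_label:
                    checked.append(by_label[item.strip()])
                else:
                    raise ValueError('{!r} is not a valid option or label'.format(item))
            self._checked = checked  # stored only after every item was validated


    vehicle = CheckboxGroup(['Bike', 'Car'], ['I have a bike', 'I have a car \r\n'])
    vehicle.value = ['I have a bike']
    print(vehicle.value)  # ['Bike']
    try:
        vehicle.value = ['Hot Dogs']
    except ValueError as error:
        print(error)  # 'Hot Dogs' is not a valid option or label

Because the new selection is stored only after every item has been validated, a rejected assignment leaves the previous selection unchanged.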

Uploading files:

.. code-block:: python

    from robobrowser import RoboBrowser

    # Browse to a page with an upload form
    browser = RoboBrowser()
    browser.open('http://cgi-lib.berkeley.edu/ex/fup.html')

    # Find the form
    upload_form = browser.get_form()
    upload_form                     # repr shows the upload fields

    # Choose a file to upload; open it in binary mode so the raw bytes are sent
    upload_form['upfile']           # the file input field object
    upload_form['upfile'].value = open('path/to/file.txt', 'rb')

    # Submit
    browser.submit_form(upload_form)
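Submitting a form with a file field sends a `multipart/form-data` request body. RoboBrowser sends its requests through a Requests session, which builds that body; the standard-library sketch below only illustrates the general shape of such a body, one part per field separated by a boundary string. The `encode_multipart` function, the `note` field, and the file contents are hypothetical, and real encoders also handle details such as escaping quotes in names and guessing content types.

.. code-block:: python

    import uuid


    def encode_multipart(fields, files):
        """Encode text fields and files as a multipart/form-data body (simplified sketch).

        fields: dict mapping a field name to a str value
        files: dict mapping a field name to a (filename, bytes) pair
        Returns (body, content_type), where content_type carries the boundary.
        """
        boundary = uuid.uuid4().hex  # random token; a boundary must not appear inside any part
        parts = []
        for name, value in fields.items():
            header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            parts.append(header.encode() + value.encode())
        for name, (filename, content) in files.items():
            header = (
                f'--{boundary}\r\n'
                f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
                'Content-Type: application/octet-stream\r\n\r\n'
            )
            parts.append(header.encode() + content)
        # Parts are separated by CRLF; a closing delimiter ends the body
        body = b'\r\n'.join(parts) + f'\r\n--{boundary}--\r\n'.encode()
        return body, f'multipart/form-data; boundary={boundary}'


    body, content_type = encode_multipart(
        {'note': 'hello'},                           # hypothetical text field
        {'upfile': ('file.txt', b'file contents')},  # field name from the upload example
    )
    print(content_type.split(';')[0])  # multipart/form-data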

By default, creating a browser instantiates a new requests `Session`. 

Requirements
------------

- Python >= 2.6 or >= 3.3
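The requirement sets a separate minimum for each major version. A minimal sketch of that check compares `(major, minor)` tuples such as those from `sys.version_info`; the `meets_requirement` helper is hypothetical and not part of RoboBrowser.

.. code-block:: python

    import sys


    def meets_requirement(version_info):
        """Return True for Python 2.6+ on the 2.x line, or 3.3+ otherwise."""
        major_minor = tuple(version_info[:2])
        if major_minor[0] == 2:
            return major_minor >= (2, 6)
        return major_minor >= (3, 3)


    print(meets_requirement(sys.version_info))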

License
-------

MIT licensed. See the bundled LICENSE file for more details.
