Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jmcarp/robobrowser
- Host: GitHub
- URL: https://github.com/jmcarp/robobrowser
- Owner: jmcarp
- License: bsd-3-clause
- Created: 2014-02-08T21:29:49.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2020-09-10T18:41:47.000Z (over 4 years ago)
- Last Synced: 2025-01-15T23:46:12.380Z (10 days ago)
- Language: Python
- Size: 563 KB
- Stars: 3,704
- Watchers: 111
- Forks: 338
- Open Issues: 59
Metadata Files:
- Readme: README.rst
- Changelog: HISTORY.rst
- Contributing: docs/contributing.rst
- License: LICENSE
Awesome Lists containing this project
- my-awesome-github-stars - jmcarp/robobrowser - (Python)
- awesome-python-resources - GitHub - 58% open · ⏱️ 07.06.2015 - (HTML processing)
- starred-awesome - robobrowser - (Python)
README
RoboBrowser: Your friendly neighborhood web scraper
===================================================

.. image:: https://badge.fury.io/py/robobrowser.png
    :target: http://badge.fury.io/py/robobrowser

.. image:: https://travis-ci.org/jmcarp/robobrowser.png?branch=master
    :target: https://travis-ci.org/jmcarp/robobrowser

.. image:: https://coveralls.io/repos/jmcarp/robobrowser/badge.png?branch=master
    :target: https://coveralls.io/r/jmcarp/robobrowser

Homepage: `http://robobrowser.readthedocs.org/ <http://robobrowser.readthedocs.org/>`_

RoboBrowser is a simple, Pythonic library for browsing the web without a standalone web browser. RoboBrowser
can fetch a page, click on links and buttons, and fill out and submit forms. If you need to interact with web services
that don't have APIs, RoboBrowser can help.

.. code-block:: python

    import re
    from robobrowser import RoboBrowser

    # Browse to Genius
    browser = RoboBrowser(history=True)
    browser.open('http://genius.com/')

    # Search for Porcupine Tree
    form = browser.get_form(action='/search')
    form                # inspect the matched search form
    form['q'].value = 'porcupine tree'
    browser.submit_form(form)

    # Look up the first song
    songs = browser.select('.song_link')
    browser.follow_link(songs[0])
    lyrics = browser.select('.lyrics')
    lyrics[0].text      # \nHear the sound of music ...

    # Back to results page
    browser.back()

    # Look up my favorite song
    song_link = browser.get_link('trains')
    browser.follow_link(song_link)

    # Can also search HTML using regex patterns
    lyrics = browser.find(class_=re.compile(r'\blyrics\b'))
    lyrics.text         # \nTrain set and match spied under the blind...

RoboBrowser combines the best of two excellent Python libraries:
Requests and BeautifulSoup.
RoboBrowser represents browser sessions using Requests and HTML responses
using BeautifulSoup, transparently exposing methods of both libraries:

.. code-block:: python

    import re
    from robobrowser import RoboBrowser

    browser = RoboBrowser(user_agent='a python robot')
    browser.open('https://github.com/')

    # Inspect the browser session
    browser.session.cookies['_gh_sess']         # BAh7Bzo...
    browser.session.headers['User-Agent']       # a python robot

    # Search the parsed HTML
    browser.select('div.teaser-icon')                   # [...] list of matching tags
    browser.find(class_=re.compile(r'column', re.I))    # first matching tag
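
To make the form-filling step in the first example a little more concrete, here is a rough sketch of what "fill out and submit a form" amounts to for a simple GET form: collect the form's named fields, overwrite one value, and encode the result as a query string. This is not how RoboBrowser is implemented; it uses only the standard library, and the ``FormFieldParser`` class and the ``PAGE`` markup are hypothetical stand-ins made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormFieldParser(HTMLParser):
    """Hypothetical helper: record the action and named <input> fields
    of the first <form> in a document."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action", "")
        elif tag == "input" and self._in_form and "name" in attrs:
            # Inputs without a value attribute are treated as empty strings.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


# Made-up markup standing in for a fetched response body.
PAGE = """
<form action="/search" method="get">
  <input type="text" name="q" value="">
  <input type="hidden" name="source" value="homepage">
</form>
"""

parser = FormFieldParser()
parser.feed(PAGE)

# Fill in the search box, analogous to form['q'].value = 'porcupine tree'.
parser.fields["q"] = "porcupine tree"

# Build the URL that a GET submission of this form would request.
submit_url = urljoin("http://genius.com/", parser.action) + "?" + urlencode(parser.fields)
print(submit_url)
```

Hidden fields such as ``source`` ride along unchanged, which is the main reason to read the form from the page rather than hand-building the query string.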