https://github.com/runekaagaard/py-webkit-html-manipulator
Automatically exported from code.google.com/p/py-webkit-html-manipulator
- Host: GitHub
- URL: https://github.com/runekaagaard/py-webkit-html-manipulator
- Owner: runekaagaard
- Created: 2015-05-02T11:24:34.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2015-05-02T11:29:36.000Z (over 9 years ago)
- Last Synced: 2024-11-07T11:48:14.959Z (2 months ago)
- Language: Python
- Size: 145 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README
README
== Welcome to Py Webkit Html Manipulator ==
This script lets you render a web page on your server, then manipulate and
extract the rendered HTML. The manipulation is done with JavaScript in the
browser, using the PyQt WebKit API.

The difference from a normal HTML scraper is that JavaScript and CSS are also
rendered.

== Installation ==
Install Python and PyQt. On Ubuntu you would run:

sudo apt-get install python python-qt4

If you make it work on other platforms please let me know how.
== Usage ==
# Get help
./whm.py --help

# Run it
./whm.py -u 'http://example.com'

# Run it headless. This works only on Linux and requires xvfb.
# On Ubuntu, install it with: sudo apt-get install xvfb
xvfb-run --server-args="-screen 0, 640x480x24" ./whm.py --url='http://example.com' --js-file=whm-example.js

== The format of the js file ==
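As a rough illustration of the format covered in this section, a --js-file script could look something like the sketch below. It is not the bundled whm-example.js. The function name collectLinks is hypothetical, and the sketch assumes that the result should be a string and that the WebKit build in use provides JSON.stringify. The only name taken from this README is WebkitHtmlManipulator.result.

```javascript
// Hypothetical --js-file script. Only WebkitHtmlManipulator.result comes from
// the README; every other name here is made up for illustration.

// Reuse the global object if the tool already defines it, otherwise create it.
var WebkitHtmlManipulator = WebkitHtmlManipulator || {};

// Collect the href and text of every link in the given document.
function collectLinks(doc) {
    var anchors = doc.getElementsByTagName("a");
    var links = [];
    for (var i = 0; i < anchors.length; i++) {
        links.push({
            href: anchors[i].getAttribute("href"),
            text: anchors[i].textContent
        });
    }
    return links;
}

// In the browser, `document` is the rendered page. The guard only exists so
// that the file can also be loaded outside a browser without failing.
if (typeof document !== "undefined") {
    // Assumption: the result is expected to be a string, so it is serialized.
    WebkitHtmlManipulator.result = JSON.stringify(collectLinks(document));
}
```

Saved as, say, links.js (a hypothetical file name), it would be passed with --js-file=links.js in place of whm-example.js.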
If no --js-file argument is supplied, the output is the HTML of the web page.
If a --js-file argument is present, the output is the content of the variable
WebkitHtmlManipulator.result, which must be available in the global scope.

See whm-example.js for an example. It adds information about position,
font-size, etc. to each element in the DOM.

== Contact ==
Rune Kaagaard
Copenhagen, Denmark
[email protected]