https://github.com/doctorOb/dompy
Javascript DOM objects in python. Parse html like you would in the browser.
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/doctorOb/dompy
- Owner: doctorOb
- Created: 2014-07-29T19:34:12.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2014-09-14T18:18:15.000Z (over 10 years ago)
- Last Synced: 2024-10-12T19:07:15.673Z (7 months ago)
- Language: Python
- Size: 176 KB
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-python-html - doctorOb/dompy
README
dompy
=====

Javascript DOM objects in python. Parse html like you would in the browser.

The goal is to implement [XML DOM Elements](http://www.w3schools.com/dom/dom_element.asp) in Python.
BeautifulSoup is a wonderfully robust library for parsing HTML and navigating the document tree. However, I found it unpleasant to work with. As someone who's used to manipulating DOM elements in JavaScript, I wanted the ability to do the same thing in Python.
It (roughly) works as follows:
```
import dompy

# htmlString holds the raw HTML source as a string
document = dompy.Document(htmlString)
document.getElementsByClassName('td')  # get all elements with the class 'td'
document.getElementById('foo').innerText  # get the text in the foo element
```

BeautifulSoup is still used to parse the HTML document, but its Tags are then iterated over and converted to a Node class that aims to closely resemble the nodes found in JavaScript. So you can access the body's className or tagName just as you would in JavaScript:

```
document.body.className
document.getElementsByName('idk')[0].tagName #figure out what tag has been named so vaguely
```
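
The Tag-to-Node conversion described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not dompy's actual code: the `Node`, `TreeBuilder`, and `parse` names are hypothetical, and the standard library's `html.parser` stands in for BeautifulSoup so the sketch runs without extra dependencies.

```python
from html.parser import HTMLParser


class Node:
    """Hypothetical DOM-like node (an assumption, not dompy's real class)."""

    def __init__(self, tagName, attrs=None):
        self.tagName = tagName
        self.attributes = dict(attrs or [])
        self.children = []  # child Nodes and text strings, in document order

    @property
    def id(self):
        return self.attributes.get('id') or ''

    @property
    def className(self):
        return self.attributes.get('class') or ''

    @property
    def innerText(self):
        # Simplified: concatenates all descendant text, ignoring CSS and layout.
        return ''.join(c if isinstance(c, str) else c.innerText
                       for c in self.children)

    def _descendants(self):
        # Depth-first walk over element nodes only.
        for child in self.children:
            if isinstance(child, Node):
                yield child
                yield from child._descendants()

    def getElementById(self, id_):
        return next((n for n in self._descendants() if n.id == id_), None)

    def getElementsByClassName(self, name):
        return [n for n in self._descendants() if name in n.className.split()]


class TreeBuilder(HTMLParser):
    """Builds a Node tree from parser events (stand-in for BeautifulSoup)."""

    VOID = {'br', 'hr', 'img', 'input', 'link', 'meta'}  # tags with no end tag

    def __init__(self):
        super().__init__()
        self.root = Node('#document')
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Browsers report HTML tag names in upper case, so mirror that here.
        node = Node(tag.upper(), attrs)
        self.stack[-1].children.append(node)
        if tag not in self.VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Naive recovery: close everything up to the nearest matching open tag.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag.upper():
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()  # flush any buffered text
    return builder.root


document = parse('<body class="main"><p id="foo">Hello <b>world</b></p></body>')
print(document.getElementById('foo').innerText)
print(document.getElementsByClassName('main')[0].tagName)
```

Keeping text as plain strings inside `children` keeps the sketch short; a fuller implementation would likely model text nodes as their own class, as the browser DOM does.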