https://github.com/onyazuka/htmlparser
HTML parser written in Python
- Host: GitHub
- URL: https://github.com/onyazuka/htmlparser
- Owner: onyazuka
- Created: 2019-04-22T12:03:18.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2020-02-25T12:57:38.000Z (over 6 years ago)
- Last Synced: 2025-05-15T17:13:48.064Z (about 1 year ago)
- Topics: dom, html, javascript, parser, python, python3
- Language: Python
- Size: 12.7 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# HTMLParser
HTML parser written in Python
It implements most of the functions needed to work conveniently with the HTML DOM.
## Features
- Parsing from a string or from a URL (with or without connection);
- All read-only DOM functions;
- CSS query selectors.
## Warnings
When using the query selector functions, keep in mind some differences from native CSS selectors:
- in attribute selectors such as querySelectorAll("input[type='text']"), the attribute value must always be quoted;
- in complex selectors such as querySelectorAll("div > li > div"), there must be at least one space between each selector and operator.
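Both rules can be applied mechanically before a selector is handed to the parser. The following is a minimal sketch, not part of this library: `normalize_selector` is a hypothetical helper, and handling of the `+` and `~` operators is an assumption (only `>` appears in this README).

```python
import re

# Matches an attribute selector whose value is not quoted, e.g. [type=text].
_UNQUOTED_ATTR = re.compile(r"\[\s*([\w-]+)\s*([~|^$*]?=)\s*([^\s'\"\]]+)\s*\]")

# Operators that should be surrounded by spaces (assumption: only ">" is
# shown in the README; "+" and "~" are included for completeness).
_OPERATORS = ">+~"


def normalize_selector(selector: str) -> str:
    """Hypothetical helper: rewrite a selector into the stricter form
    described in the warnings (quoted values, spaced operators)."""
    # 1. Wrap unquoted attribute values in single quotes.
    selector = _UNQUOTED_ATTR.sub(r"[\1\2'\3']", selector)

    # 2. Put spaces around operators, leaving anything inside
    #    [...] or (...) untouched.
    out = []
    depth = 0
    for ch in selector:
        if ch in "[(":
            depth += 1
        elif ch in "])":
            depth -= 1
        if depth == 0 and ch in _OPERATORS:
            out.append(f" {ch} ")
        else:
            out.append(ch)

    # 3. Collapse runs of whitespace into single spaces. Note that this
    #    also collapses repeated spaces inside quoted attribute values.
    return " ".join("".join(out).split())
```

A call such as doc.querySelectorAll(normalize_selector("div>li>div")) would then pass the spaced form "div > li > div" to the parser.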
## Usage
From URL:
```python
from parser import *

# Parse the document at the given URL
dom = HTMLDomParser(PARSER_MODE["URL"], "http://my_favourite_web_site.zzz")
doc = dom.getDocument()

# Look up elements by tag name and walk the tree from them
divs = doc.getElementsByTagName("div")
firstDiv = divs[0]
firstDivFirstChild = firstDiv.firstElementChild()
secondDiv = divs[1]
secondDiv2 = firstDiv.nextElementSibling()

# Look up elements by class name or by CSS selector
navs = doc.getElementsByClassName("nav")
classyDivs = doc.querySelectorAll("div[class]")
divLiDiv = doc.querySelectorAll("div > li > div")
...
```
Or from string:
```python
from parser import *
dom = HTMLDomParser(PARSER_MODE["RAW"], "......")
```