https://github.com/onyazuka/htmlparser

HTML parser written in Python

Host: GitHub
URL: https://github.com/onyazuka/htmlparser
Owner: onyazuka
Created: 2019-04-22T12:03:18.000Z (about 7 years ago)
Default Branch: master
Last Pushed: 2020-02-25T12:57:38.000Z (over 6 years ago)
Last Synced: 2025-05-15T17:13:48.064Z (about 1 year ago)
Topics: dom, html, javascript, parser, python, python3
Language: Python
Size: 12.7 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# HTMLParser

HTML parser written in Python

It implements most of the functions needed to work conveniently with the HTML DOM.

## Features

- Parsing from a string or from a URL (with or without a connection);

- All read-only DOM functions;

- CSS query selectors.

## Warnings

When using the query selector functions (such as `querySelectorAll`), keep in mind some differences from native CSS selectors:

- attribute values must always be quoted, e.g. `querySelectorAll("input[type='text']")`;

- in complex selectors such as `querySelectorAll("div > li > div")`, each selector and combinator (such as `>`) must be separated by at least one space.
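
If selectors are built by other code or typed by users, a small pre-processing step can bring them into the form described above. The sketch below is a hypothetical helper (`normalize_selector` is not part of this project): it quotes bare attribute values and puts one space around each combinator, leaving bracketed `[...]` and parenthesized `(...)` parts alone. The README only mentions `>`; treating `+` and `~` the same way is an assumption, and the helper assumes attribute values contain no `]`. It is not a full CSS parser.

```python
import re

# Bracketed [...] and parenthesized (...) groups are split out and handled separately.
_GROUP_RE = re.compile(r"(\[[^\]]*\]|\([^)]*\))")
# An attribute selector with an unquoted value, e.g. [type=text] or [class~=nav].
_BARE_ATTR_RE = re.compile(r"\[\s*([\w-]+)\s*([~|^$*]?=)\s*([^\]'\"\s]+)\s*\]")
# A combinator together with any whitespace around it (assumed set: >, +, ~).
_COMBINATOR_RE = re.compile(r"\s*([>+~])\s*")


def normalize_selector(selector: str) -> str:
    """Hypothetical helper: rewrite a selector into the form this parser expects."""
    out = []
    for part in _GROUP_RE.split(selector):
        if part.startswith("["):
            # Quote a bare attribute value: [type=text] -> [type='text'].
            part = _BARE_ATTR_RE.sub(
                lambda m: f"[{m.group(1)}{m.group(2)}'{m.group(3)}']", part)
        elif not part.startswith("("):
            # Outside brackets: one space on each side of every combinator,
            # then collapse any remaining runs of whitespace.
            part = _COMBINATOR_RE.sub(r" \1 ", part)
            part = re.sub(r"\s+", " ", part)
        out.append(part)
    return "".join(out).strip()


print(normalize_selector("div>li>div"))        # div > li > div
print(normalize_selector("input[type=text]"))  # input[type='text']
```

The output can then be passed to `querySelectorAll` as usual.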

## Usage

From a URL:

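```python
from parser import *

# Parse the page at the given URL
dom = HTMLDomParser(PARSER_MODE["URL"], "http://my_favourite_web_site.zzz")
doc = dom.getDocument()

# All <div> elements of the document
divs = doc.getElementsByTagName("div")

firstDiv = divs[0]
firstDivFirstChild = firstDiv.firstElementChild()

# The second <div> in the document...
secondDiv = divs[1]
# ...and the element right after the first <div> at the same level.
# These may differ, e.g. when the first <div> contains another <div>.
secondDiv2 = firstDiv.nextElementSibling()

# Elements having the class "nav"
navs = doc.getElementsByClassName("nav")

# CSS query selectors
classyDivs = doc.querySelectorAll("div[class]")
divLiDiv = doc.querySelectorAll("div > li > div")

...
```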
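
Whether `secondDiv` and `secondDiv2` above are the same element depends on DOM semantics: in the standard DOM, `getElementsByTagName` returns all matching descendants in document order (nested ones included), while `nextElementSibling` only looks at the same level. The sketch below reproduces those semantics with Python's standard `html.parser`, assuming this library follows the standard DOM here. `Element`, `TreeBuilder` and `parse` are hypothetical names for illustration; this is not how `HTMLDomParser` is implemented.

```python
from html.parser import HTMLParser

# Void elements have no content, so they are never made the current parent.
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Element:
    """Hypothetical minimal DOM node (not this project's class)."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []

    def iter_descendants(self):
        # Pre-order traversal, i.e. document order.
        for child in self.children:
            yield child
            yield from child.iter_descendants()

    def get_elements_by_tag_name(self, tag):
        # All matching descendants, nested ones included, in document order.
        return [el for el in self.iter_descendants() if el.tag == tag]

    def next_element_sibling(self):
        # Next element with the same parent, or None.
        if self.parent is None:
            return None
        siblings = self.parent.children
        i = siblings.index(self)
        return siblings[i + 1] if i + 1 < len(siblings) else None


class TreeBuilder(HTMLParser):
    """Builds an Element tree from html.parser callbacks (text is ignored)."""

    def __init__(self):
        super().__init__()
        self.root = Element("#document", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        el = Element(tag, attrs, self.current)
        self.current.children.append(el)
        if tag not in VOID_TAGS:
            self.current = el

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


doc = parse("<div id='a'><div id='b'></div></div><div id='c'></div>")
divs = doc.get_elements_by_tag_name("div")
print([d.attrs["id"] for d in divs])               # ['a', 'b', 'c']
print(divs[0].next_element_sibling().attrs["id"])  # c
```

Here `divs[1]` is the nested `<div id='b'>`, while the next sibling of the first `<div>` is `<div id='c'>`.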

Or from a string:

```python
from parser import *

# Parse HTML markup passed directly as a string
dom = HTMLDomParser(PARSER_MODE["RAW"], "......")
```
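
When you need control over the HTTP request itself (headers, timeout), one option is to download the page yourself and hand the text to the `RAW` mode. A minimal sketch with the standard `urllib.request`, assuming the charset declared by the server (or UTF-8 as a fallback) is correct; `fetch_html` is a hypothetical helper, not part of this project:

```python
import urllib.request


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Hypothetical helper: download `url` and return the body as text."""
    request = urllib.request.Request(url, headers={"User-Agent": "htmlparser-example"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset declared in Content-Type, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read()
    try:
        return body.decode(charset, errors="replace")
    except LookupError:  # the server declared an unknown charset name
        return body.decode("utf-8", errors="replace")


# The resulting string can then be parsed in RAW mode, e.g.:
# dom = HTMLDomParser(PARSER_MODE["RAW"], fetch_html("http://my_favourite_web_site.zzz"))
```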
