https://github.com/elliotgao2/htmlparsing
No pain HTML parsing library.
- Host: GitHub
- URL: https://github.com/elliotgao2/htmlparsing
- Owner: elliotgao2
- Created: 2018-02-26T09:57:15.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2018-04-02T09:19:56.000Z (about 7 years ago)
- Last Synced: 2024-08-08T22:53:46.509Z (9 months ago)
- Topics: css, html, markdown, parse, xpath
- Language: Python
- Homepage:
- Size: 17.6 KB
- Stars: 12
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# HTML Parsing
No Pain HTML parsing library.
## Installation
```shell
pip install htmlparsing
```
## Usage
### Parse list
```python
import requests
from htmlparsing import Element, HTMLParsing, Text, Attr, Parse, HTML, Markdown

url = 'https://news.ycombinator.com/'
r = requests.get(url)
article_list = HTMLParsing(r.text).list('.athing', {
    'title': Text('a.storylink'),  # css selector
    'link': Attr('a.storylink', 'href'),
})
print(article_list)
```
### Parse detail
```python
import requests
from htmlparsing import Element, HTMLParsing, Text, Attr, Parse

url = 'https://news.ycombinator.com/item?id=16476454'
r = requests.get(url)
article_detail = HTMLParsing(r.text).detail({
    'title': Text('a.storylink'),
    'points': Parse('span.score', '>{} points'),
    'link': Attr('a.storylink', 'href'),
})
print(article_detail)
```
### Element
```python
import requests
from htmlparsing import Element
url = 'https://python.org/'
r = requests.get(url)
e = Element(text=r.text)
e.links
e.absolute_links
e.xpath('//a')[0].attrs
e.xpath('//a')[0].attrs.title
e.css('a')[0].attrs
e.parse('{}')
e.css('a')[5].text
e.css('a')[5].html
e.css('a')[5].markdown
```
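The `Parse('span.score', '>{} points')` call in the detail example takes a pattern in which `{}` marks the text to extract. How htmlparsing implements this internally is not shown here. The sketch below is only an illustration of the idea, using the standard library `re` module. The helper names `placeholder_to_regex` and `extract` and the sample HTML string are hypothetical, not part of htmlparsing.

```python
import re


def placeholder_to_regex(pattern):
    """Turn a pattern such as '>{} points' into a regex with one group per '{}'.

    Illustrative stand-in only; this is not htmlparsing's implementation.
    """
    # Escape the literal pieces, then join them with a lazy capture group.
    literal_parts = [re.escape(part) for part in pattern.split("{}")]
    return re.compile("(.+?)".join(literal_parts))


def extract(pattern, text):
    """Return the captured fields as a tuple, or None if the pattern is absent."""
    match = placeholder_to_regex(pattern).search(text)
    return match.groups() if match else None


# Hypothetical markup, shaped like the score element targeted above.
html = '<span class="score" id="score_16476454">123 points</span>'
print(extract(">{} points", html))  # ('123',)
```

Because the capture group is lazy, this sketch only behaves sensibly when literal text follows each `{}`. A pattern that ends in `{}` would capture a single character.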