Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kostya/lexbor
Fast HTML5 Parser with CSS selectors. This is successor of myhtml and expected to be faster and use less memory.
https://github.com/kostya/lexbor
Last synced: 13 days ago
JSON representation
Fast HTML5 Parser with CSS selectors. This is successor of myhtml and expected to be faster and use less memory.
- Host: GitHub
- URL: https://github.com/kostya/lexbor
- Owner: kostya
- License: mit
- Created: 2019-12-30T05:29:34.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2024-06-26T19:30:41.000Z (5 months ago)
- Last Synced: 2024-10-25T01:30:25.209Z (19 days ago)
- Language: Crystal
- Homepage:
- Size: 249 KB
- Stars: 95
- Watchers: 5
- Forks: 14
- Open Issues: 3
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
# Lexbor
[![Build Status](https://github.com/kostya/lexbor/actions/workflows/ci.yml/badge.svg)](https://github.com/kostya/lexbor/actions/workflows/ci.yml?query=branch%3Amaster+event%3Apush)
Fast HTML5 Parser with CSS selectors (based on lexborisov's HTML5 parser [lexbor](https://github.com/lexbor/lexbor)). This is successor of [myhtml](https://github.com/kostya/myhtml) and expected to be faster and use less memory. Usage is almost equal to myhtml.
## Installation
Install dependency cmake:
sudo apt install cmake
Add this to your application's `shard.yml`:
```yaml
dependencies:
lexbor:
github: kostya/lexbor
```## Usage example
```crystal
require "lexbor"html = <<-HTML
HTMLlexbor = Lexbor::Parser.new(html)
lexbor.nodes(:div).each do |node|
id = node["id"]?if first_link = node.scope.nodes(:a).first?
href = first_link["href"]?
link_text = first_link.inner_textputs "div with id #{id} have link [#{link_text}](#{href})"
else
puts "div with id #{id} have no links"
end
end# Output:
# div with id t1 have link [O_o](/#)
# div with id t2 have no links```
## Css selectors example
```crystal
require "lexbor"html = <<-HTML
Hello
123other
foocolumns
barare
xyzignored
HTMLlexbor = Lexbor::Parser.new(html)
p lexbor.css("#t2 tr td:first-child").map(&.inner_text).to_a
# => ["123", "foo", "bar", "xyz"]p lexbor.css("#t2 tr td:first-child").map(&.to_html).to_a
# => ["123", "foo", "bar", "xyz"]
```## More Examples
[examples](https://github.com/kostya/lexbor/tree/master/examples)
## Development Setup:
```shell
git clone https://github.com/kostya/lexbor.git
cd lexbor
crystal src/ext/build_ext.cr
crystal spec
```## Benchmark
Parse google results page(1.5Mb) 1000 times, and 5000 times css select.
[lexbor-program](https://github.com/kostya/lexbor/tree/master/bench/test-lexbor.cr)
[myhtml-program](https://github.com/kostya/lexbor/tree/master/bench/test-myhtml.cr)
[crystagiri-program](https://github.com/kostya/lexbor/tree/master/bench/test-libxml.cr)
[nokogiri-program](https://github.com/kostya/lexbor/tree/master/bench/test-libxml.rb)Running on Ryzen 3800x.
| Lang | Lib | Parse time, s | Css time, s | Memory, MiB |
| -------- | ------------------- | ------------- | ----------- | ----------- |
| Ruby 2.7 | Nokolexbor(lexbor) | 4.07 | 1.80 | 112.5 |
| Crystal | lexbor | 4.75 | 0.80 | 12.3 |
| Crystal | myhtml(+modest) | 5.98 | 1.22 | 12.2 |
| Crystal | Crystagiri(libxml2) | 14.20 | - | 31.7 |
| Ruby 2.7 | Nokogiri(libxml2) | 18.67 | 92.69 | 398.4 |Running on Apple M1.
| Lang | Lib | Parse time, s | Css time, s | Memory, MiB |
| -------- | ------------------- | ------------- | ----------- | ----------- |
| Crystal | lexbor | 2.77 | 0.62 | 26.4 |
| Ruby 2.7 | Nokolexbor(lexbor) | 3.41 | 1.11 | 268.8 |
| Ruby 2.7 | Nokogiri(libxml2) | 18.22 | 87.21 | 232.8 |
| Crystal | Crystagiri(libxml2) | 126.25 | - | 26.8 |