https://github.com/kostya/modest
CSS selectors for HTML5 Parser myhtml
- Host: GitHub
- URL: https://github.com/kostya/modest
- Owner: kostya
- License: mit
- Created: 2016-11-21T14:13:15.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-08-04T18:21:12.000Z (over 6 years ago)
- Last Synced: 2024-10-25T01:30:24.255Z (about 2 months ago)
- Topics: crystal, css, css-selector, selectors
- Language: Crystal
- Homepage:
- Size: 175 KB
- Stars: 46
- Watchers: 6
- Forks: 4
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-crystal - modest - CSS selectors for HTML5 Parser myhtml (HTML/XML Parsing)
README
## WARNING: this shard is obsolete and has moved into [myhtml](https://github.com/kostya/myhtml) directly; use [myhtml](https://github.com/kostya/myhtml) >= 1.0.0
# modest
CSS selectors for HTML5 Parser [myhtml](https://github.com/kostya/myhtml) (Crystal wrapper for https://github.com/lexborisov/Modest).
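Given the warning above, a new project would presumably depend on myhtml directly rather than on this shard. A minimal `shard.yml` sketch, assuming the same GitHub dependency form used below and the `>= 1.0.0` constraint named in the warning:

```yaml
dependencies:
  myhtml:
    github: kostya/myhtml
    # version constraint taken from the warning; adjust to the release you target
    version: ">= 1.0.0"
```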
## Installation
Add this to your application's `shard.yml`:
```yaml
dependencies:
  modest:
    github: kostya/modest
```

## Usage of CSS Selectors with myhtml parser
```crystal
require "modest"

# Sample markup consistent with the selectors and results below
page = <<-PAGE
  <html><body>
    <div class="aaa"><p id="bbb"><a href="http://..." class="ccc">bla</a></p></div>
  </body></html>
PAGE

myhtml = Myhtml::Parser.new(page)

# CSS select from the root! scope (equivalent to myhtml.root!.css("..."))
iterator = myhtml.css("div.aaa p#bbb a.ccc") # => Iterator(Myhtml::Node), methods: .each, .to_a, ...

iterator.each do |node|
  p node.tag_id              # MyHTML_TAG_A
  p node.tag_name            # "a"
  p node.tag_sym             # :a
  p node.attributes["href"]? # "http://..."
  p node.inner_text          # "bla"
  puts node.to_html          # <a href="http://..." class="ccc">bla</a>
end

# CSS select from a node scope
if p_node = myhtml.css("div.aaa p#bbb").first?
  p_node.css("a.ccc").each do |node|
    p node.tag_sym # :a
  end
end
```
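Since the selectors now live in myhtml itself, the same kind of query would presumably need only a different `require`. The sketch below assumes that myhtml >= 1.0.0 exposes `css`, `attribute_by` and `inner_text` under the same names used in this README. The markup and the `example.com` URL are hypothetical.

```crystal
# Assumption: myhtml >= 1.0.0 ships the CSS selector support directly
require "myhtml"

# Hypothetical markup, used only for illustration
page = <<-PAGE
  <div class="aaa"><p id="bbb"><a class="ccc" href="http://example.com/">bla</a></p></div>
PAGE

parser = Myhtml::Parser.new(page)

# Materialize the matches once so they can be read more than once
nodes = parser.css("div.aaa p#bbb a.ccc").to_a

hrefs = nodes.map(&.attribute_by("href"))
texts = nodes.map(&.inner_text)

p hrefs
p texts
```

Collecting the matches with `to_a` before mapping avoids running the selector twice when several attributes are needed from the same nodes.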
## Example 2
```crystal
require "modest"

# Sample markup consistent with the selectors and results below
html = <<-PAGE
  <div>
    <p id=p1>
    <p id=p2 class=jo>
    <p id=p3>
      <a href="some.html" id=a1>link1</a>
      <a href="some.png" id=a2>link2</a>
    <div id=bla>
      <p id=p4 class=jo>
      <p id=p5 class=bu>
      <p id=p6 class=jo>
    </div>
  </div>
PAGE

parser = Myhtml::Parser.new(html)

# select all p nodes whose id contains `p`
p parser.css("p[id*=p]").map(&.attribute_by("id")).to_a # => ["p1", "p2", "p3", "p4", "p5", "p6"]

# select all nodes with class "jo"
p parser.css("p.jo").map(&.attribute_by("id")).to_a # => ["p2", "p4", "p6"]
p parser.css(".jo").map(&.attribute_by("id")).to_a  # => ["p2", "p4", "p6"]

# select odd children of a div that do not contain an a tag
p parser.css("div > :nth-child(2n+1):not(:has(a))").map(&.attribute_by("id")).to_a # => ["p1", "p4", "p6"]

# all elements with class=jo inside the last div tag
p parser.css("div").to_a.last.css(".jo").map(&.attribute_by("id")).to_a # => ["p4", "p6"]

# a elements whose href ends with .png
p parser.css(%q{a[href$=".png"]}).map(&.attribute_by("id")).to_a # => ["a2"]

# find all a tags inside <p id=p3> whose href contains `html`
p parser.css(%q{p[id=p3] > a[href*="html"]}).map(&.attribute_by("id")).to_a # => ["a1"]

# find all a tags inside <p id=p3> whose href contains `html` or ends with `.png`
p parser.css(%q{p[id=p3] > a:matches([href *= "html"], [href $= ".png"])}).map(&.attribute_by("id")).to_a # => ["a1", "a2"]

# create a finder once and reuse it in many places; this is faster than creating it each time
finder = Modest::Finder.new(".jo")
p parser.css(finder).map(&.attribute_by("id")).to_a # => ["p2", "p4", "p6"]
```

## Example 3
```crystal
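require "modest"

html = <<-PAGE
  <html><body>
  <table id="t1"><tr><td>Hello</td></tr></table>
  <table id="t2">
    <tr><td>123</td><td>other</td></tr>
    <tr><td>foo</td><td>columns</td></tr>
    <tr><td>bar</td><td>are</td></tr>
    <tr><td>xyz</td><td>ignored</td></tr>
  </table>
  </body></html>
PAGE

parser = Myhtml::Parser.new(html)

p parser.css("#t2 tr td:first-child").map(&.inner_text).to_a # => ["123", "foo", "bar", "xyz"]
p parser.css("#t2 tr td:first-child").map(&.to_html).to_a    # => ["<td>123</td>", "<td>foo</td>", "<td>bar</td>", "<td>xyz</td>"]
```

## Benchmark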
Comparison with Nokogiri (libxml) and Crystagiri (libxml), parsing a Google results page 1000 times. Code: https://github.com/kostya/modest/tree/master/bench
```crystal
require "modest"

page = File.read("./google.html")
s = 0
links = [] of String
1000.times do
  myhtml = Myhtml::Parser.new(page)
  links = myhtml.css("div.g h3.r a").map(&.attribute_by("href")).to_a
  s += links.size
  myhtml.free
end
p links.last
p s
```

Parse + Selectors

| Lang | Package | Time, s | Memory, MiB |
| -------- | ------------------ | ------- | ----------- |
| Crystal | modest(myhtml) | 2.52 | 7.7 |
| Crystal | Crystagiri(LibXML) | 19.89 | 14.3 |
| Ruby 2.2 | Nokogiri(LibXML) | 45.05 | 136.2 |

Selectors Only (files with suffix 2)

| Lang | Package | Time, s | Memory, MiB |
| -------- | ------------------ | ------- | ----------- |
| Crystal | modest(myhtml) | 0.18 | 4.6 |
| Crystal | Crystagiri(LibXML) | 12.30 | 6.6 |
| Ruby 2.2 | Nokogiri(LibXML) | 28.06 | 68.8 |

## CSS Selectors rules
https://drafts.csswg.org/selectors-4/