Mojo::DOM58 - Minimalistic HTML/XML DOM parser with CSS selectors
https://github.com/grinnz/mojo-dom58
Host: GitHub
URL: https://github.com/grinnz/mojo-dom58
Owner: Grinnz
License: other
Created: 2016-03-25T17:55:48.000Z (almost 9 years ago)
Default Branch: master
Last Pushed: 2021-06-16T05:29:07.000Z (over 3 years ago)
Last Synced: 2024-11-13T05:16:09.593Z (about 2 months ago)
Language: Perl
Homepage: https://metacpan.org/pod/Mojo::DOM58
Size: 194 KB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.pod
- Changelog: Changes
- Contributing: CONTRIBUTING.md
- License: LICENSE
README

=pod

=encoding utf8

=head1 NAME

Mojo::DOM58 - Minimalistic HTML/XML DOM parser with CSS selectors

=head1 SYNOPSIS

  use Mojo::DOM58;

  # Parse

  my $dom = Mojo::DOM58->new('<div><p id="a">Test</p><p id="b">123</p></div>');

  # Find

  say $dom->at('#b')->text;

  say $dom->find('p')->map('text')->join("\n");

  say $dom->find('[id]')->map(attr => 'id')->join("\n");

  # Iterate

  $dom->find('p[id]')->reverse->each(sub { say $_->{id} });

  # Loop

  for my $e ($dom->find('p[id]')->each) {

    say $e->{id}, ':', $e->text;

  }

  # Modify

  $dom->find('div p')->last->append('<p id="c">456</p>');

  $dom->at('#c')->prepend($dom->new_tag('p', id => 'd', '789'));

  $dom->find(':not(p)')->map('strip');

  # Render

  say "$dom";

=head1 DESCRIPTION

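L<Mojo::DOM58> is a minimalistic and relaxed pure-perl HTML/XML DOM parser based
on L<Mojo::DOM>. It supports the
L<HTML Living Standard|https://html.spec.whatwg.org/> and
L<Extensible Markup Language (XML) 1.0|https://www.w3.org/TR/xml/>, and
matching based on L<CSS3 selectors|https://www.w3.org/TR/selectors/>. It will
even try to interpret broken HTML and XML, so you should not use it for
validation.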

=head1 FORK INFO

L<Mojo::DOM58> is a fork of L<Mojo::DOM> and tracks features and fixes to stay
closely compatible with upstream. It differs only in the standalone format and
compatibility with Perl 5.8. Any bugs or patches not related to these changes
should be reported directly to the L<Mojolicious> issue tracker.

This release of L<Mojo::DOM58> is up to date with version C<9.0> of
L<Mojolicious>.

=head1 NODES AND ELEMENTS

When we parse an HTML/XML fragment, it gets turned into a tree of nodes.

  <!DOCTYPE html>
  <html>
    <head><title>Hello</title></head>
    <body>World!</body>
  </html>

There are currently eight different kinds of nodes, C<cdata>, C<comment>,
C<doctype>, C<pi>, C<raw>, C<root>, C<tag> and C<text>. Elements are nodes of
the type C<tag>.

  root

  |- doctype (html)

  +- tag (html)

     |- tag (head)

     |  +- tag (title)

     |     +- raw (Hello)

     +- tag (body)

        +- text (World!)
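
The tree above can be reproduced by walking the parsed document. This is only a
minimal sketch: the C<walk> helper and the C<@lines> array are hypothetical
names introduced for illustration, and it avoids C<say> so that it also runs on
Perl 5.8.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new(
  '<!DOCTYPE html><html><head><title>Hello</title></head><body>World!</body></html>'
);

# Collect one line per node: its type, plus the tag name for elements
my @lines;

sub walk {
  my ($node, $depth) = @_;
  my $type  = $node->type;
  my $label = $type eq 'tag' ? 'tag (' . $node->tag . ')' : $type;
  push @lines, ('  ' x $depth) . $label;

  # Only "root" and "tag" nodes can have children
  return unless $type eq 'root' || $type eq 'tag';
  walk($_, $depth + 1) for $node->child_nodes->each;
}

walk($dom, 0);
print "$_\n" for @lines;
```

Each printed line is indented by the depth of the node, so the output mirrors
the nesting of the diagram.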

While all node types are represented as L<Mojo::DOM58> objects, some methods like
L</"attr"> and L</"namespace"> only apply to elements.

=head1 CASE-SENSITIVITY

L<Mojo::DOM58> defaults to HTML semantics, which means all tag and attribute
names are lowercased and selectors need to be lowercase as well.

  # HTML semantics

  my $dom = Mojo::DOM58->new('<P ID="greeting">Hi!</P>');

  say $dom->at('p[id]')->text;

If an XML declaration is found, the parser will automatically switch into XML

mode and everything becomes case-sensitive.

  # XML semantics

  my $dom = Mojo::DOM58->new('<?xml version="1.0"?><P ID="greeting">Hi!</P>');

  say $dom->at('P[ID]')->text;

HTML or XML semantics can also be forced with the L</"xml"> method.

  # Force HTML semantics

  my $dom = Mojo::DOM58->new->xml(0)->parse('<P ID="greeting">Hi!</P>');

  say $dom->at('p[id]')->text;

  # Force XML semantics

  my $dom = Mojo::DOM58->new->xml(1)->parse('<P ID="greeting">Hi!</P>');

  say $dom->at('P[ID]')->text;
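
To see the difference between the two modes side by side, the same markup can
be parsed twice. This is a small sketch; the markup and the variable names
C<$markup>, C<$html> and C<$xml> are assumptions made for illustration.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $markup = '<P ID="greeting">Hi!</P>';

# Same input, parsed once with forced HTML and once with forced XML semantics
my $html = Mojo::DOM58->new->xml(0)->parse($markup);
my $xml  = Mojo::DOM58->new->xml(1)->parse($markup);

# HTML mode lowercases names, so only the lowercase selector matches
print 'html p[id]: ', (defined $html->at('p[id]') ? 'match' : 'no match'), "\n";
print 'html P[ID]: ', (defined $html->at('P[ID]') ? 'match' : 'no match'), "\n";

# XML mode preserves case, so only the original spelling matches
print 'xml  p[id]: ', (defined $xml->at('p[id]') ? 'match' : 'no match'), "\n";
print 'xml  P[ID]: ', (defined $xml->at('P[ID]') ? 'match' : 'no match'), "\n";
```

Because L</"at"> returns C<undef> when nothing matches, checking the result
with C<defined> before calling further methods avoids runtime errors.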

=head1 SELECTORS

L<Mojo::DOM58> uses a CSS selector engine based on L<Mojo::DOM::CSS>. All CSS
selectors that make sense for a standalone parser are supported.

=over

=item Z<>*

Any element.

  my $all = $dom->find('*');

=item E

An element of type C<E>.

  my $title = $dom->at('title');

=item E[foo]

An C<E> element with a C<foo> attribute.

  my $links = $dom->find('a[href]');

=item E[foo="bar"]

An C<E> element whose C<foo> attribute value is exactly equal to C<bar>.

  my $case_sensitive = $dom->find('input[type="hidden"]');

  my $case_sensitive = $dom->find('input[type=hidden]');

=item E[foo="bar" i]

An C<E> element whose C<foo> attribute value is exactly equal to any
(ASCII-range) case-permutation of C<bar>. Note that this selector is
B<EXPERIMENTAL> and might change without warning!

  my $case_insensitive = $dom->find('input[type="hidden" i]');

  my $case_insensitive = $dom->find('input[type=hidden i]');

  my $case_insensitive = $dom->find('input[class~="foo" i]');

This selector is part of
L<Selectors Level 4|https://dev.w3.org/csswg/selectors-4>, which is still a work
in progress.

=item E[foo="bar" s]

An C<E> element whose C<foo> attribute value is exactly and case-sensitively
equal to C<bar>. Note that this selector is B<EXPERIMENTAL> and might change
without warning!

  my $case_sensitive = $dom->find('input[type="hidden" s]');

This selector is part of
L<Selectors Level 4|https://dev.w3.org/csswg/selectors-4>, which is still a work
in progress.

=item E[foo~="bar"]

An C<E> element whose C<foo> attribute value is a list of whitespace-separated
values, one of which is exactly equal to C<bar>.

  my $foo = $dom->find('input[class~="foo"]');

  my $foo = $dom->find('input[class~=foo]');

=item E[foo^="bar"]

An C<E> element whose C<foo> attribute value begins exactly with the string
C<bar>.

  my $begins_with = $dom->find('input[name^="f"]');

  my $begins_with = $dom->find('input[name^=f]');

=item E[foo$="bar"]

An C<E> element whose C<foo> attribute value ends exactly with the string
C<bar>.

  my $ends_with = $dom->find('input[name$="o"]');

  my $ends_with = $dom->find('input[name$=o]');

=item E[foo*="bar"]

An C<E> element whose C<foo> attribute value contains the substring C<bar>.

  my $contains = $dom->find('input[name*="fo"]');

  my $contains = $dom->find('input[name*=fo]');

=item E[foo|="en"]

An C<E> element whose C<foo> attribute has a hyphen-separated list of values
beginning (from the left) with C<en>.

  my $english = $dom->find('link[hreflang|=en]');

=item E:root

An C<E> element, root of the document.

  my $root = $dom->at(':root');

=item E:nth-child(n)

An C<E> element, the C<n-th> child of its parent.

  my $third = $dom->find('div:nth-child(3)');

  my $odd   = $dom->find('div:nth-child(odd)');

  my $even  = $dom->find('div:nth-child(even)');

  my $top3  = $dom->find('div:nth-child(-n+3)');

=item E:nth-last-child(n)

An C<E> element, the C<n-th> child of its parent, counting from the last one.

  my $third    = $dom->find('div:nth-last-child(3)');

  my $odd      = $dom->find('div:nth-last-child(odd)');

  my $even     = $dom->find('div:nth-last-child(even)');

  my $bottom3  = $dom->find('div:nth-last-child(-n+3)');

=item E:nth-of-type(n)

An C<E> element, the C<n-th> sibling of its type.

  my $third = $dom->find('div:nth-of-type(3)');

  my $odd   = $dom->find('div:nth-of-type(odd)');

  my $even  = $dom->find('div:nth-of-type(even)');

  my $top3  = $dom->find('div:nth-of-type(-n+3)');

=item E:nth-last-of-type(n)

An C<E> element, the C<n-th> sibling of its type, counting from the last one.

  my $third    = $dom->find('div:nth-last-of-type(3)');

  my $odd      = $dom->find('div:nth-last-of-type(odd)');

  my $even     = $dom->find('div:nth-last-of-type(even)');

  my $bottom3  = $dom->find('div:nth-last-of-type(-n+3)');

=item E:first-child

An C<E> element, first child of its parent.

  my $first = $dom->find('div p:first-child');

=item E:last-child

An C<E> element, last child of its parent.

  my $last = $dom->find('div p:last-child');

=item E:first-of-type

An C<E> element, first sibling of its type.

  my $first = $dom->find('div p:first-of-type');

=item E:last-of-type

An C<E> element, last sibling of its type.

  my $last = $dom->find('div p:last-of-type');

=item E:only-child

An C<E> element, only child of its parent.

  my $lonely = $dom->find('div p:only-child');

=item E:only-of-type

An C<E> element, only sibling of its type.

  my $lonely = $dom->find('div p:only-of-type');

=item E:empty

An C<E> element that has no children (including text nodes).

  my $empty = $dom->find(':empty');

=item E:any-link

Alias for L</"E:link">. Note that this selector is B<EXPERIMENTAL> and might
change without warning! This selector is part of
L<Selectors Level 4|https://dev.w3.org/csswg/selectors-4>, which is still a
work in progress.

=item E:link

An C<E> element being the source anchor of a hyperlink of which the target is
not yet visited (C<:link>) or already visited (C<:visited>). Note that
L<Mojo::DOM58> is not stateful, therefore C<:any-link>, C<:link> and
C<:visited> yield exactly the same results.

  my $links = $dom->find(':any-link');

  my $links = $dom->find(':link');

  my $links = $dom->find(':visited');

=item E:visited

Alias for L</"E:link">.

=item E:scope

An C<E> element being a designated reference element. Note that this selector
is B<EXPERIMENTAL> and might change without warning!

  my $scoped = $dom->find('a:not(:scope > a)');

  my $scoped = $dom->find('div :scope p');

  my $scoped = $dom->find('~ p');

This selector is part of
L<Selectors Level 4|https://dev.w3.org/csswg/selectors-4>, which is still a work
in progress.

=item E:checked

A user interface element C<E> which is checked (for instance a radio-button or
checkbox).

  my $input = $dom->find(':checked');

=item E.warning

An C<E> element whose class is "warning".

  my $warning = $dom->find('div.warning');

=item E#myid

An C<E> element with C<ID> equal to "myid".

  my $foo = $dom->at('div#foo');

=item E:not(s1, s2)

An C<E> element that does not match either compound selector C<s1> or compound
selector C<s2>. Note that support for compound selectors is B<EXPERIMENTAL> and
might change without warning!

  my $others = $dom->find('div p:not(:first-child, :last-child)');

Support for compound selectors was added as part of
L<Selectors Level 4|https://dev.w3.org/csswg/selectors-4>, which is still a work
in progress.

=item E:is(s1, s2)

An C<E> element that matches compound selector C<s1> and/or compound selector
C<s2>. Note that this selector is B<EXPERIMENTAL> and might change without
warning!

  my $headers = $dom->find(':is(section, article, aside, nav) h1');

This selector is part of
L<Selectors Level 4|https://dev.w3.org/csswg/selectors-4>, which is still a work
in progress.

=item E:has(rs1, rs2)

An C<E> element, if either of the relative selectors C<rs1> or C<rs2>, when
evaluated with C<E> as the C<:scope> elements, match an element. Note that this
selector is B<EXPERIMENTAL> and might change without warning!

  my $link = $dom->find('a:has(> img)');

This selector is part of
L<Selectors Level 4|https://dev.w3.org/csswg/selectors-4>, which is still a work
in progress. Also be aware that this feature is currently marked C<at-risk>, so
there is a high chance that it will get removed completely.

=item A|E

An C<E> element that belongs to the namespace alias C<A> from
L<CSS Namespaces Module Level 3|https://www.w3.org/TR/css-namespaces-3/>.
Key/value pairs passed to selector methods are used to declare namespace
aliases.


  my $elem = $dom->find('lq|elem', lq => 'http://example.com/q-markup');

Using an empty alias searches for an element that belongs to no namespace.

  my $div = $dom->find('|div');

=item E F

An C<F> element descendant of an C<E> element.

  my $headlines = $dom->find('div h1');

=item E E<gt> F

An C<F> element child of an C<E> element.

  my $headlines = $dom->find('html > body > div > h1');

=item E + F

An C<F> element immediately preceded by an C<E> element.

  my $second = $dom->find('h1 + h2');

=item E ~ F

An C<F> element preceded by an C<E> element.

  my $second = $dom->find('h1 ~ h2');

=item E, F, G

Elements of type C<E>, C<F> and C<G>.

  my $headlines = $dom->find('h1, h2, h3');

=item E[foo=bar][bar=baz]

An C<E> element whose attributes match all following attribute selectors.

  my $links = $dom->find('a[foo^=b][foo$=ar]');

=back
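
Several of the selectors above can be tried against one small document. The
following is an illustrative sketch only; the markup, class names and variable
names are made up for the example.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new(<<'HTML');
<ul>
  <li class="a"><a href="/one">One</a></li>
  <li class="b"><a href="/two">Two</a></li>
  <li class="a b">Three</li>
</ul>
HTML

# Attribute prefix selector: links whose href starts with "/"
my @hrefs = $dom->find('a[href^="/"]')->map(attr => 'href')->each;

# Class selector: the collection is array-based, so it can be counted directly
my $count_a = scalar @{ $dom->find('li.a') };

# Structural pseudo-class: second element child of its parent
my $second = $dom->at('li:nth-child(2)')->all_text;

# Negation: list items that do not carry the class "b"
my @not_b = $dom->find('li:not(.b)')->map('all_text')->each;

print "hrefs: @hrefs\n";
print "li.a count: $count_a\n";
print "second: $second\n";
print "not .b: @not_b\n";
```

Note that C<:nth-child> counts element siblings only, so the whitespace text
nodes between the C<li> elements do not shift the index.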

=head1 OPERATORS

L<Mojo::DOM58> overloads the following operators.

=head2 array

  my @nodes = @$dom;

Alias for L</"child_nodes">.

  # "<!-- Test -->"
  $dom->parse('<!-- Test --><b>123</b>')->[0];

=head2 bool

  my $bool = !!$dom;

Always true.

=head2 hash

  my %attrs = %$dom;

Alias for L</"attr">.

  # "test"
  $dom->parse('<div id="test">Test</div>')->at('div')->{id};

=head2 stringify

  my $str = "$dom";

Alias for L</"to_string">.
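
The overloads can be combined on a single element. A short sketch, assuming a
made-up fragment and the hypothetical variable name C<$div>:

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $div = Mojo::DOM58->new('<div id="test"><b>A</b>B</div>')->at('div');

# Hash access reads attributes
my $id = $div->{id};

# Array access exposes the child nodes (here: one element and one text node)
my $children = scalar @$div;

# String interpolation renders the node and its content
my $html = "$div";

print "id: $id\n";
print "child nodes: $children\n";
print "rendered: $html\n";
```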

=head1 FUNCTIONS

L<Mojo::DOM58> implements the following functions, which can be imported
individually.

=head2 tag_to_html

  my $str = tag_to_html 'div', id => 'foo', 'safe content';

Generate HTML/XML tag and render it right away. This is a significantly faster
alternative to L</"new_tag"> for template systems that have to generate a lot
of tags.
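
As a rough sketch of how the function might be used after importing it by
name, the example below builds a link; the tag, attribute and content values
are arbitrary. Plain string content is expected to be escaped, in the same way
as the "safe content" form of L</"new_tag">.

```perl
use strict;
use warnings;
use Mojo::DOM58 'tag_to_html';

# Render a tag straight to a string, without building a DOM object first
my $html = tag_to_html 'a', href => '/docs', 'Read & learn';

print "$html\n";
```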

=head1 METHODS

L<Mojo::DOM58> implements the following methods.

=head2 new

  my $dom = Mojo::DOM58->new;

  my $dom = Mojo::DOM58->new('<foo bar="baz">I ♥ Mojo::DOM58!</foo>');

Construct a new scalar-based L<Mojo::DOM58> object and L</"parse"> HTML/XML
fragment if necessary.

=head2 new_tag

  my $tag = Mojo::DOM58->new_tag('div');

  my $tag = $dom->new_tag('div');

  my $tag = $dom->new_tag('div', id => 'foo', hidden => undef);

  my $tag = $dom->new_tag('div', 'safe content');

  my $tag = $dom->new_tag('div', id => 'foo', 'safe content');

  my $tag = $dom->new_tag('div', data => {mojo => 'rocks'}, 'safe content');

  my $tag = $dom->new_tag('div', id => 'foo', sub { 'unsafe content' });

Construct a new L<Mojo::DOM58> object for an HTML/XML tag with or without
attributes and content. The C<data> attribute may contain a hash reference with
key/value pairs to generate attributes from.

  # "<br>"
  $dom->new_tag('br');

  # "<div></div>"
  $dom->new_tag('div');

  # "<div id="foo" hidden></div>"
  $dom->new_tag('div', id => 'foo', hidden => undef);

  # "<div>test &amp; 123</div>"
  $dom->new_tag('div', 'test & 123');

  # "<div id="foo">test &amp; 123</div>"
  $dom->new_tag('div', id => 'foo', 'test & 123');

  # "<div data-foo="1" data-bar="test">test &amp; 123</div>"
  $dom->new_tag('div', data => {foo => 1, Bar => 'test'}, 'test & 123');

  # "<div id="foo">test & 123</div>"
  $dom->new_tag('div', id => 'foo', sub { 'test & 123' });

  # "<div>Hello<b>Mojo!</b></div>"
  $dom->parse('<div>Hello</div>')->at('div')
    ->append_content($dom->new_tag('b', 'Mojo!'))->root;

=head2 all_text

  my $text = $dom->all_text;

Extract text content from all descendant nodes of this element. For HTML
documents C<script> and C<style> elements are excluded.

  # "foo\nbarbaz\n"

  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->all_text;

=head2 ancestors

  my $collection = $dom->ancestors;

  my $collection = $dom->ancestors('div ~ p');

Find all ancestor elements of this node matching the CSS selector and return a

L<collection|/"COLLECTION METHODS"> containing these elements as L<Mojo::DOM58>

objects. All selectors listed in L</"SELECTORS"> are supported.

  # List tag names of ancestor elements

  say $dom->ancestors->map('tag')->join("\n");

=head2 append

  $dom = $dom->append('<p>I ♥ Mojo::DOM58!</p>');

  $dom = $dom->append(Mojo::DOM58->new);

Append HTML/XML fragment to this node (for all node types other than C<root>).

  # "<div><h1>Test</h1><h2>123</h2></div>"

  $dom->parse('<div><h1>Test</h1></div>')

    ->at('h1')->append('<h2>123</h2>')->root;

  # "<p>Test 123</p>"

  $dom->parse('<p>Test</p>')->at('p')

    ->child_nodes->first->append(' 123')->root;

=head2 append_content

  $dom = $dom->append_content('<p>I ♥ Mojo::DOM58!</p>');

  $dom = $dom->append_content(Mojo::DOM58->new);

Append HTML/XML fragment (for C<root> and C<tag> nodes) or raw content to this

node's content.

  # "<div><h1>Test123</h1></div>"

  $dom->parse('<div><h1>Test</h1></div>')

    ->at('h1')->append_content('123')->root;

  # "<!-- Test 123 --><br>"

  $dom->parse('<!-- Test --><br>')

    ->child_nodes->first->append_content('123 ')->root;

  # "<p>Test<i>123</i></p>"

  $dom->parse('<p>Test</p>')->at('p')->append_content('<i>123</i>')->root;

=head2 at

  my $result = $dom->at('div ~ p');

  my $result = $dom->at('svg|line', svg => 'http://www.w3.org/2000/svg');

Find first descendant element of this element matching the CSS selector and

return it as a L<Mojo::DOM58> object, or C<undef> if none could be found. All

selectors listed in L</"SELECTORS"> are supported.

  # Find first element with "svg" namespace definition

  my $namespace = $dom->at('[xmlns\:svg]')->{'xmlns:svg'};

Trailing key/value pairs can be used to declare xml namespace aliases.

  # "<rect />"

  $dom->parse('<svg xmlns="http://www.w3.org/2000/svg"><rect /></svg>')

    ->at('svg|rect', svg => 'http://www.w3.org/2000/svg');

=head2 attr

  my $hash = $dom->attr;

  my $foo  = $dom->attr('foo');

  $dom     = $dom->attr({foo => 'bar'});

  $dom     = $dom->attr(foo => 'bar');

This element's attributes.

  # Remove an attribute

  delete $dom->attr->{id};

  # Attribute without value

  $dom->attr(selected => undef);

  # List id attributes

  say $dom->find('*')->map(attr => 'id')->compact->join("\n");

=head2 child_nodes

  my $collection = $dom->child_nodes;

Return a L<collection|/"COLLECTION METHODS"> containing all child nodes of this

element as L<Mojo::DOM58> objects.

  # "<p><b>123</b></p>"

  $dom->parse('<p>Test<b>123</b></p>')->at('p')->child_nodes->first->remove;

  # "<!DOCTYPE html>"

  $dom->parse('<!DOCTYPE html><b>123</b>')->child_nodes->first;

  # " Test "

  $dom->parse('<b>123</b><!-- Test -->')->child_nodes->last->content;

=head2 children

  my $collection = $dom->children;

  my $collection = $dom->children('div ~ p');

Find all child elements of this element matching the CSS selector and return a

L<collection|/"COLLECTION METHODS"> containing these elements as L<Mojo::DOM58>

objects. All selectors listed in L</"SELECTORS"> are supported.

  # Show tag name of random child element

  say $dom->children->shuffle->first->tag;

=head2 content

  my $str = $dom->content;

  $dom    = $dom->content('<p>I ♥ Mojo::DOM58!</p>');

  $dom    = $dom->content(Mojo::DOM58->new);

Return this node's content or replace it with HTML/XML fragment (for C<root>

and C<tag> nodes) or raw content.

  # "<b>Test</b>"

  $dom->parse('<div><b>Test</b></div>')->at('div')->content;

  # "<div><h1>123</h1></div>"

  $dom->parse('<div><h1>Test</h1></div>')->at('h1')->content('123')->root;

  # "<p><i>123</i></p>"

  $dom->parse('<p>Test</p>')->at('p')->content('<i>123</i>')->root;

  # "<div><h1></h1></div>"

  $dom->parse('<div><h1>Test</h1></div>')->at('h1')->content('')->root;

  # " Test "

  $dom->parse('<!-- Test --><br>')->child_nodes->first->content;

  # "<div><!-- 123 -->456</div>"

  $dom->parse('<div><!-- Test -->456</div>')

    ->at('div')->child_nodes->first->content(' 123 ')->root;

=head2 descendant_nodes

  my $collection = $dom->descendant_nodes;

Return a L<collection|/"COLLECTION METHODS"> containing all descendant nodes of

this element as L<Mojo::DOM58> objects.

  # "<p><b>123</b></p>"

  $dom->parse('<p><!-- Test --><b>123<!-- 456 --></b></p>')

    ->descendant_nodes->grep(sub { $_->type eq 'comment' })

    ->map('remove')->first;

  # "<p><b>test</b>test</p>"

  $dom->parse('<p><b>123</b>456</p>')

    ->at('p')->descendant_nodes->grep(sub { $_->type eq 'text' })

    ->map(content => 'test')->first->root;

=head2 find

  my $collection = $dom->find('div ~ p');

  my $collection = $dom->find('svg|line', svg => 'http://www.w3.org/2000/svg');

Find all descendant elements of this element matching the CSS selector and

return a L<collection|/"COLLECTION METHODS"> containing these elements as

L<Mojo::DOM58> objects. All selectors listed in L</"SELECTORS"> are supported.

  # Find a specific element and extract information

  my $id = $dom->find('div')->[23]{id};

  # Extract information from multiple elements

  my @headers = $dom->find('h1, h2, h3')->map('text')->each;

  # Count all the different tags

  my $hash = $dom->find('*')->reduce(sub { $a->{$b->tag}++; $a }, {});

  # Find elements with a class that contains dots

  my @divs = $dom->find('div.foo\.bar')->each;

Trailing key/value pairs can be used to declare xml namespace aliases.

  # "<rect />"

  $dom->parse('<svg xmlns="http://www.w3.org/2000/svg"><rect /></svg>')

    ->find('svg|rect', svg => 'http://www.w3.org/2000/svg')->first;

=head2 following

  my $collection = $dom->following;

  my $collection = $dom->following('div ~ p');

Find all sibling elements after this node matching the CSS selector and return

a L<collection|/"COLLECTION METHODS"> containing these elements as

L<Mojo::DOM58> objects. All selectors listed in L</"SELECTORS"> are supported.

  # List tags of sibling elements after this node

  say $dom->following->map('tag')->join("\n");

=head2 following_nodes

  my $collection = $dom->following_nodes;

Return a L<collection|/"COLLECTION METHODS"> containing all sibling nodes after

this node as L<Mojo::DOM58> objects.

  # "C"

  $dom->parse('<p>A</p><!-- B -->C')->at('p')->following_nodes->last->content;

=head2 matches

  my $bool = $dom->matches('div ~ p');

  my $bool = $dom->matches('svg|line', svg => 'http://www.w3.org/2000/svg');

Check if this element matches the CSS selector. All selectors listed in

L</"SELECTORS"> are supported.

  # True

  $dom->parse('<p class="a">A</p>')->at('p')->matches('.a');

  $dom->parse('<p class="a">A</p>')->at('p')->matches('p[class]');

  # False

  $dom->parse('<p class="a">A</p>')->at('p')->matches('.b');

  $dom->parse('<p class="a">A</p>')->at('p')->matches('p[id]');

Trailing key/value pairs can be used to declare xml namespace aliases.

  # True

  $dom->parse('<svg xmlns="http://www.w3.org/2000/svg"><rect /></svg>')

    ->matches('svg|rect', svg => 'http://www.w3.org/2000/svg');

=head2 namespace

  my $namespace = $dom->namespace;

Find this element's namespace, or return C<undef> if none could be found.

  # "http://www.w3.org/2000/svg"

  Mojo::DOM58->new('<svg xmlns:svg="http://www.w3.org/2000/svg"><svg:circle>3.14</svg:circle></svg>')->at('svg\:circle')->namespace;

  # Find namespace for an element with namespace prefix

  my $namespace = $dom->at('svg > svg\:circle')->namespace;

  # Find namespace for an element that may or may not have a namespace prefix

  my $namespace = $dom->at('svg > circle')->namespace;

=head2 next

  my $sibling = $dom->next;

Return L<Mojo::DOM58> object for next sibling element, or C<undef> if there are

no more siblings.

  # "<h2>123</h2>"

  $dom->parse('<div><h1>Test</h1><h2>123</h2></div>')->at('h1')->next;

=head2 next_node

  my $sibling = $dom->next_node;

Return L<Mojo::DOM58> object for next sibling node, or C<undef> if there are no

more siblings.

  # "456"

  $dom->parse('<p><b>123</b><!-- Test -->456</p>')

    ->at('b')->next_node->next_node;

  # " Test "

  $dom->parse('<p><b>123</b><!-- Test -->456</p>')

    ->at('b')->next_node->content;

=head2 parent

  my $parent = $dom->parent;

Return L<Mojo::DOM58> object for parent of this node, or C<undef> if this node

has no parent.

  # "<b><i>Test</i></b>"

  $dom->parse('<p><b><i>Test</i></b></p>')->at('i')->parent;

=head2 parse

  $dom = $dom->parse('<foo bar="baz">I ♥ Mojo::DOM58!</foo>');

Parse HTML/XML fragment.

  # Parse XML

  my $dom = Mojo::DOM58->new->xml(1)->parse('<foo>I ♥ Mojo::DOM58!</foo>');

=head2 preceding

  my $collection = $dom->preceding;

  my $collection = $dom->preceding('div ~ p');

Find all sibling elements before this node matching the CSS selector and return

a L<collection|/"COLLECTION METHODS"> containing these elements as

L<Mojo::DOM58> objects. All selectors listed in L</"SELECTORS"> are supported.

  # List tags of sibling elements before this node

  say $dom->preceding->map('tag')->join("\n");

=head2 preceding_nodes

  my $collection = $dom->preceding_nodes;

Return a L<collection|/"COLLECTION METHODS"> containing all sibling nodes

before this node as L<Mojo::DOM58> objects.

  # "A"

  $dom->parse('A<!-- B --><p>C</p>')->at('p')->preceding_nodes->first->content;

=head2 prepend

  $dom = $dom->prepend('<p>I ♥ Mojo::DOM58!</p>');

  $dom = $dom->prepend(Mojo::DOM58->new);

Prepend HTML/XML fragment to this node (for all node types other than C<root>).

  # "<div><h1>Test</h1><h2>123</h2></div>"

  $dom->parse('<div><h2>123</h2></div>')

    ->at('h2')->prepend('<h1>Test</h1>')->root;

  # "<p>Test 123</p>"

  $dom->parse('<p>123</p>')

    ->at('p')->child_nodes->first->prepend('Test ')->root;

=head2 prepend_content

  $dom = $dom->prepend_content('<p>I ♥ Mojo::DOM58!</p>');

  $dom = $dom->prepend_content(Mojo::DOM58->new);

Prepend HTML/XML fragment (for C<root> and C<tag> nodes) or raw content to this

node's content.

  # "<div><h2>Test123</h2></div>"

  $dom->parse('<div><h2>123</h2></div>')

    ->at('h2')->prepend_content('Test')->root;

  # "<!-- Test 123 --><br>"

  $dom->parse('<!-- 123 --><br>')

    ->child_nodes->first->prepend_content(' Test')->root;

  # "<p><i>123</i>Test</p>"

  $dom->parse('<p>Test</p>')->at('p')->prepend_content('<i>123</i>')->root;

=head2 previous

  my $sibling = $dom->previous;

Return L<Mojo::DOM58> object for previous sibling element, or C<undef> if there

are no more siblings.

  # "<h1>Test</h1>"

  $dom->parse('<div><h1>Test</h1><h2>123</h2></div>')->at('h2')->previous;

=head2 previous_node

  my $sibling = $dom->previous_node;

Return L<Mojo::DOM58> object for previous sibling node, or C<undef> if there are

no more siblings.

  # "123"

  $dom->parse('<p>123<!-- Test --><b>456</b></p>')

    ->at('b')->previous_node->previous_node;

  # " Test "

  $dom->parse('<p>123<!-- Test --><b>456</b></p>')

    ->at('b')->previous_node->content;

=head2 remove

  my $parent = $dom->remove;

Remove this node and return L</"root"> (for C<root> nodes) or L</"parent">.

  # "<div></div>"

  $dom->parse('<div><h1>Test</h1></div>')->at('h1')->remove;

  # "<p><b>456</b></p>"

  $dom->parse('<p>123<b>456</b></p>')

    ->at('p')->child_nodes->first->remove->root;

=head2 replace

  my $parent = $dom->replace('<div>I ♥ Mojo::DOM58!</div>');

  my $parent = $dom->replace(Mojo::DOM58->new);

Replace this node with HTML/XML fragment and return L</"root"> (for C<root>

nodes) or L</"parent">.

  # "<div><h2>123</h2></div>"

  $dom->parse('<div><h1>Test</h1></div>')->at('h1')->replace('<h2>123</h2>');

  # "<p><b>123</b></p>"

  $dom->parse('<p>Test</p>')

    ->at('p')->child_nodes->[0]->replace('<b>123</b>')->root;

=head2 root

  my $root = $dom->root;

Return L<Mojo::DOM58> object for C<root> node.

=head2 selector

  my $selector = $dom->selector;

Get a unique CSS selector for this element.

  # "ul:nth-child(1) > li:nth-child(2)"

  $dom->parse('<ul><li>Test</li><li>123</li></ul>')->find('li')->last->selector;

  # "p:nth-child(1) > b:nth-child(1) > i:nth-child(1)"

  $dom->parse('<p><b><i>Test</i></b></p>')->at('i')->selector;
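
Since the generated selector is unique within the document, it can be fed back
into L</"at"> to find the same element again. A minimal sketch, with
hypothetical variable names:

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new('<ul><li>Test</li><li>123</li></ul>');

# Remember where an element lives...
my $selector = $dom->find('li')->last->selector;

# ...and look it up again later from the root
my $again = $dom->at($selector);

print "selector: $selector\n";
print "text: ", $again->text, "\n";
```

This can be handy for logging which element was processed, for instance. The
selector is only meaningful while the document structure stays unchanged.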

=head2 strip

  my $parent = $dom->strip;

Remove this element while preserving its content and return L</"parent">.

  # "<div>Test</div>"

  $dom->parse('<div><h1>Test</h1></div>')->at('h1')->strip;

=head2 tag

  my $tag = $dom->tag;

  $dom    = $dom->tag('div');

This element's tag name.

  # List tag names of child elements

  say $dom->children->map('tag')->join("\n");

=head2 tap

  $dom = $dom->tap(sub {...});

Equivalent to L<Mojo::Base/"tap">.

=head2 text

  my $text = $dom->text;

Extract text content from this element only (not including child elements).

  # "bar"

  $dom->parse("<div>foo<p>bar</p>baz</div>")->at('p')->text;

  # "foo\nbaz\n"

  $dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->text;
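
The difference to L</"all_text"> is easiest to see when both are called on the
same element. A small sketch using a made-up fragment:

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $div = Mojo::DOM58->new('<div>foo<p>bar</p>baz</div>')->at('div');

# Only text nodes that are direct children of the div
my $own = $div->text;

# Text from the div and all of its descendants
my $all = $div->all_text;

print "text:     $own\n";
print "all_text: $all\n";
```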

=head2 to_string

  my $str = $dom->to_string;

Render this node and its content to HTML/XML.

  # "<b>Test</b>"

  $dom->parse('<div><b>Test</b></div>')->at('div b')->to_string;

To extract text content from all descendant nodes, see L</"all_text">.

=head2 tree

  my $tree = $dom->tree;

  $dom     = $dom->tree(['root']);

Document Object Model. Note that this structure should only be used very

carefully since it is very dynamic.

=head2 type

  my $type = $dom->type;

This node's type, usually C<cdata>, C<comment>, C<doctype>, C<pi>, C<raw>,

C<root>, C<tag> or C<text>.

  # "cdata"

  $dom->parse('<![CDATA[Test]]>')->child_nodes->first->type;

  # "comment"

  $dom->parse('<!-- Test -->')->child_nodes->first->type;

  # "doctype"

  $dom->parse('<!DOCTYPE html>')->child_nodes->first->type;

  # "pi"

  $dom->parse('<?xml version="1.0"?>')->child_nodes->first->type;

  # "raw"

  $dom->parse('<title>Test</title>')->at('title')->child_nodes->first->type;

  # "root"

  $dom->parse('<p>Test</p>')->type;

  # "tag"

  $dom->parse('<p>Test</p>')->at('p')->type;

  # "text"

  $dom->parse('<p>Test</p>')->at('p')->child_nodes->first->type;

=head2 val

  my $value = $dom->val;

Extract value from form element (such as C<button>, C<input>, C<option>,

C<select> and C<textarea>), or return C<undef> if this element has no value. In

the case of C<select> with C<multiple> attribute, find C<option> elements with

C<selected> attribute and return an array reference with all values, or

C<undef> if none could be found.

  # "a"

  $dom->parse('<input name=test value=a>')->at('input')->val;

  # "b"

  $dom->parse('<textarea>b</textarea>')->at('textarea')->val;

  # "c"

  $dom->parse('<option value="c">Test</option>')->at('option')->val;

  # "d"

  $dom->parse('<select><option selected>d</option></select>')

    ->at('select')->val;

  # "e"

  $dom->parse('<select multiple><option selected>e</option></select>')

    ->at('select')->val->[0];

  # "on"

  $dom->parse('<input name=test type=checkbox>')->at('input')->val;
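
Combined with L</"find">, C<val> can collect the current values of a whole
form. The sketch below is an assumption-laden illustration: the form markup,
field names and the C<%form> hash are invented for the example.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new(<<'HTML');
<form>
  <input name="user" value="alice">
  <textarea name="bio">Perl hacker</textarea>
  <select name="lang">
    <option value="pl" selected>Perl</option>
    <option value="py">Python</option>
  </select>
</form>
HTML

# Map each named form control to its value
my %form;
for my $e ($dom->find('input[name], textarea[name], select[name]')->each) {
  $form{$e->{name}} = $e->val;
}

print "$_ = $form{$_}\n" for sort keys %form;
```

For a C<select> with the C<multiple> attribute the stored value would be an
array reference (or C<undef>), so real code would need to handle that case.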

=head2 with_roles

  my $new_class = Mojo::DOM58->with_roles('Mojo::DOM58::Role::One');

  my $new_class = Mojo::DOM58->with_roles('+One', '+Two');

  $dom          = $dom->with_roles('+One', '+Two');

Equivalent to L<Mojo::Base/"with_roles">. Note that role support depends on

L<Role::Tiny> (2.000001+).

=head2 wrap

  $dom = $dom->wrap('<div></div>');

  $dom = $dom->wrap(Mojo::DOM58->new);

Wrap HTML/XML fragment around this node (for all node types other than C<root>),
placing this node as the last child of the first innermost element.

  # "<p>123<b>Test</b></p>"

  $dom->parse('<b>Test</b>')->at('b')->wrap('<p>123</p>')->root;

  # "<div><p><b>Test</b></p>123</div>"

  $dom->parse('<b>Test</b>')->at('b')->wrap('<div><p></p>123</div>')->root;

  # "<p><b>Test</b></p><p>123</p>"

  $dom->parse('<b>Test</b>')->at('b')->wrap('<p></p><p>123</p>')->root;

  # "<p><b>Test</b></p>"

  $dom->parse('<p>Test</p>')->at('p')->child_nodes->first->wrap('<b>')->root;

=head2 wrap_content

  $dom = $dom->wrap_content('<div></div>');

  $dom = $dom->wrap_content(Mojo::DOM58->new);

Wrap HTML/XML fragment around this node's content (for C<root> and C<tag>
nodes), placing that content as the last children of the first innermost
element.

  # "<p><b>123Test</b></p>"

  $dom->parse('<p>Test</p>')->at('p')->wrap_content('<b>123</b>')->root;

  # "<p><b>Test</b></p><p>123</p>"

  $dom->parse('<b>Test</b>')->wrap_content('<p></p><p>123</p>');

=head2 xml

  my $bool = $dom->xml;

  $dom     = $dom->xml($bool);

Disable HTML semantics in the parser and activate case-sensitivity. Defaults to
auto-detection based on XML declarations.

=head1 COLLECTION METHODS

Some L<Mojo::DOM58> methods return an array-based collection object based on

L<Mojo::Collection>, which can either be accessed directly as an array

reference, or with the following methods.

  # Chain methods

  $collection->map(sub { ucfirst })->shuffle->each(sub {

    my ($word, $num) = @_;

    say "$num: $word";

  });

  # Access array directly to manipulate collection

  $collection->[23] += 100;

  say for @$collection;
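
In practice collections usually come from methods such as L</"find">. The
following sketch chains a few of the collection methods on such a result; the
markup and the length threshold are arbitrary choices for illustration.

```perl
use strict;
use warnings;
use Mojo::DOM58;

my $dom = Mojo::DOM58->new('<ul><li>apple</li><li>kiwi</li><li>banana</li></ul>');

# Extract the text of each item, keep the longer ones, capitalize and join
my $result = $dom->find('li')
  ->map('text')
  ->grep(sub { length($_) > 4 })
  ->map(sub { ucfirst $_ })
  ->join(', ');

print "$result\n";
```

Each step returns a new collection, so the original result of L</"find"> is
left untouched.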

=head2 compact

  my $new = $collection->compact;

Create a new L<collection|/"COLLECTION METHODS"> with all elements that are

defined and not an empty string.

  # $collection contains (0, 1, undef, 2, '', 3)

  $collection->compact->join(', '); # "0, 1, 2, 3"

=head2 each

  my @elements = $collection->each;

  $collection  = $collection->each(sub {...});

Evaluate callback for each element in collection or return all elements as a

list if none has been provided. The element will be the first argument passed

to the callback and is also available as C<$_>.

  # Make a numbered list

  $collection->each(sub {

    my ($e, $num) = @_;

    say "$num: $e";

  });

=head2 first

  my $first = $collection->first;

  my $first = $collection->first(qr/foo/);

  my $first = $collection->first(sub {...});

  my $first = $collection->first($method);

  my $first = $collection->first($method, @args);

Evaluate regular expression/callback for, or call method on, each element in

collection and return the first one that matched the regular expression, or for

which the callback/method returned true. The element will be the first argument

passed to the callback and is also available as C<$_>.

  # Longer version

  my $first = $collection->first(sub { $_->$method(@args) });

  # Find first value that contains the word "mojo"

  my $interesting = $collection->first(qr/mojo/i);

  # Find first value that is greater than 5

  my $greater = $collection->first(sub { $_ > 5 });

=head2 flatten

  my $new = $collection->flatten;

Flatten nested collections/arrays recursively and create a new
L<collection|/"COLLECTION METHODS"> with all elements.

  # $collection contains (1, [2, [3, 4], 5, [6]], 7)

  $collection->flatten->join(', '); # "1, 2, 3, 4, 5, 6, 7"

=head2 grep

  my $new = $collection->grep(qr/foo/);

  my $new = $collection->grep(sub {...});

  my $new = $collection->grep($method);

  my $new = $collection->grep($method, @args);

Evaluate regular expression/callback for, or call method on, each element in
collection and create a new L<collection|/"COLLECTION METHODS"> with all
elements that matched the regular expression, or for which the callback/method
returned true. The element will be the first argument passed to the callback
and is also available as C<$_>.

  # Longer version

  my $new = $collection->grep(sub { $_->$method(@args) });

  # Find all values that contain the word "mojo"

  my $interesting = $collection->grep(qr/mojo/i);

  # Find all values that are greater than 5

  my $greater = $collection->grep(sub { $_ > 5 });

=head2 head

  my $new = $collection->head(4);

  my $new = $collection->head(-2);

Create a new L<collection|/"COLLECTION METHODS"> with up to the specified
number of elements from the beginning of the collection. A negative number will
count from the end, so C<head(-2)> returns everything except the last two
elements.

  # $collection contains ('A', 'B', 'C', 'D', 'E')

  $collection->head(3)->join(' '); # "A B C"

  $collection->head(-3)->join(' '); # "A B"

=head2 join

  my $string = $collection->join;
  my $string = $collection->join("\n");

Turn collection into a string, with the elements joined by the given separator
(an empty string by default).

  # Join all values with commas

  $collection->join(', ');

=head2 last

  my $last = $collection->last;

Return the last element in collection.
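
A short sketch of C<last> on a collection returned by L</"find">, including the
case where nothing matched. The markup is hypothetical.

  use strict;
  use warnings;
  use Mojo::DOM58;

  # Hypothetical markup, used only for illustration
  my $dom = Mojo::DOM58->new('<p>one</p><p>two</p><p>three</p>');

  print $dom->find('p')->last->text, "\n";    # three

  # Nothing matches here, so the collection is empty and last returns undef
  my $none = $dom->find('div')->last;
  print defined($none) ? "found\n" : "nothing\n";    # nothing

Checking the result with C<defined> before calling methods on it avoids errors
when the selector matches nothing.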

=head2 map

  my $new = $collection->map(sub {...});

  my $new = $collection->map($method);

  my $new = $collection->map($method, @args);

Evaluate callback for, or call method on, each element in collection and create
a new L<collection|/"COLLECTION METHODS"> from the results. The element will be
the first argument passed to the callback and is also available as C<$_>.

  # Longer version

  my $new = $collection->map(sub { $_->$method(@args) });

  # Append the word "mojo" to all values

  my $domified = $collection->map(sub { $_ . 'mojo' });

=head2 reduce

  my $result = $collection->reduce(sub {...});

  my $result = $collection->reduce(sub {...}, $initial);

Reduce elements in collection with a callback. The first element will be used
as the initial value if none has been provided. The accumulated value is
available as C<$a> and the current element as C<$b>.

  # Calculate the sum of all values

  my $sum = $collection->reduce(sub { $a + $b });

  # Count how often each value occurs in collection

  my $hash = $collection->reduce(sub { $a->{$b}++; $a }, {});

=head2 reverse

  my $new = $collection->reverse;

Create a new L<collection|/"COLLECTION METHODS"> with all elements in reverse
order.
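
For illustration, a minimal sketch showing that C<reverse> returns a new
collection and leaves the original untouched. The markup is hypothetical.

  use strict;
  use warnings;
  use Mojo::DOM58;

  # Hypothetical markup, used only for illustration
  my $dom   = Mojo::DOM58->new('<p>one</p><p>two</p><p>three</p>');
  my $texts = $dom->find('p')->map('text');

  print $texts->reverse->join(' '), "\n";    # three two one
  print $texts->join(' '), "\n";             # one two three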

=head2 slice

  my $new = $collection->slice(4 .. 7);

Create a new L<collection|/"COLLECTION METHODS"> with all selected elements.

  # $collection contains ('A', 'B', 'C', 'D', 'E')

  $collection->slice(1, 2, 4)->join(' '); # "B C E"

=head2 shuffle

  my $new = $collection->shuffle;

Create a new L<collection|/"COLLECTION METHODS"> with all elements in random
order.

=head2 size

  my $size = $collection->size;

Number of elements in collection.
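
A brief sketch of C<size> used to check how many elements a selector matched.
The markup is hypothetical.

  use strict;
  use warnings;
  use Mojo::DOM58;

  # Hypothetical markup, used only for illustration
  my $dom = Mojo::DOM58->new('<p>one</p><p>two</p><p>three</p>');

  print $dom->find('p')->size, "\n";      # 3
  print $dom->find('div')->size, "\n";    # 0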

=head2 sort

  my $new = $collection->sort;

  my $new = $collection->sort(sub {...});

Sort elements based on return value of callback and create a new
L<collection|/"COLLECTION METHODS"> from the results.

  # Sort values case-insensitive

  my $case_insensitive = $collection->sort(sub { uc($a) cmp uc($b) });

=head2 tail

  my $new = $collection->tail(4);

  my $new = $collection->tail(-2);

Create a new L<collection|/"COLLECTION METHODS"> with up to the specified
number of elements from the end of the collection. A negative number will count
from the beginning, so C<tail(-2)> returns everything except the first two
elements.

  # $collection contains ('A', 'B', 'C', 'D', 'E')

  $collection->tail(3)->join(' '); # "C D E"

  $collection->tail(-3)->join(' '); # "D E"

=head2 tap

  $collection = $collection->tap(sub {...});

Equivalent to L<Mojo::Base/"tap">.
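
As a sketch of one possible use, C<tap> can inspect a collection in the middle
of a method chain without interrupting it. The markup and the message text are
made up for illustration.

  use strict;
  use warnings;
  use Mojo::DOM58;

  # Hypothetical markup, used only for illustration
  my $dom = Mojo::DOM58->new('<p>one</p><p>two</p><p>three</p>');

  # The callback receives the collection itself, and the chain continues
  my $texts = $dom->find('p')
    ->tap(sub { print 'matched ', $_->size, " elements\n" })
    ->map('text');

  print $texts->join(', '), "\n";    # one, two, three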

=head2 to_array

  my $array = $collection->to_array;

Turn collection into array reference.
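
A minimal sketch showing that the result is a plain, unblessed array reference
holding a copy of the elements. The markup is hypothetical.

  use strict;
  use warnings;
  use Mojo::DOM58;

  # Hypothetical markup, used only for illustration
  my $dom        = Mojo::DOM58->new('<p>one</p><p>two</p><p>three</p>');
  my $collection = $dom->find('p')->map('text');
  my $array      = $collection->to_array;

  print ref($array), "\n";           # ARRAY
  print join('|', @$array), "\n";    # one|two|three

  # Changing the copy does not affect the collection
  push @$array, 'four';
  print $collection->size, "\n";     # 3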

=head2 uniq

  my $new = $collection->uniq;

  my $new = $collection->uniq(sub {...});

  my $new = $collection->uniq($method);

  my $new = $collection->uniq($method, @args);

Create a new L<collection|/"COLLECTION METHODS"> without duplicate elements,
using the string representation of either the elements or the return value of
the callback/method to decide uniqueness. Note that C<undef> and empty string
are treated the same.

  # Longer version

  my $new = $collection->uniq(sub { $_->$method(@args) });

  # $collection contains ('foo', 'bar', 'bar', 'baz')

  $collection->uniq->join(' '); # "foo bar baz"

  # $collection contains ([1, 2], [2, 1], [3, 2])

  $collection->uniq(sub{ $_->[1] })->to_array; # "[[1, 2], [2, 1]]"

=head2 with_roles

  $collection = $collection->with_roles('Mojo::Collection::Role::One');

Equivalent to L<Mojo::Base/"with_roles">. Note that role support depends on
L<Role::Tiny> (2.000001+).

=head1 DEBUGGING

You can set the C<MOJO_DOM58_CSS_DEBUG> environment variable to get advanced
diagnostic information printed to C<STDERR>.

  MOJO_DOM58_CSS_DEBUG=1
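
The variable can also be set from within a script. The sketch below assumes
that the variable is read once, when L<Mojo::DOM58> is loaded, so it sets the
variable in a C<BEGIN> block before the C<use> line. The markup and selector
are hypothetical.

  use strict;
  use warnings;

  # Assumption: the variable is read when Mojo::DOM58 is loaded,
  # so it has to be set before the "use" line below is compiled
  BEGIN { $ENV{MOJO_DOM58_CSS_DEBUG} = 1 }
  use Mojo::DOM58;

  # Hypothetical markup, used only for illustration
  my $dom  = Mojo::DOM58->new('<div><p id="a">Test</p></div>');
  my $text = $dom->at('div > p#a')->text;

  # Diagnostics go to STDERR, the result itself to STDOUT
  print "$text\n";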

=head1 BUGS

Report issues related to the format of this distribution or Perl 5.8 support to
the public bugtracker. Any other issues should be reported directly to the
upstream L<Mojolicious> issue tracker.

=head1 AUTHOR

Dan Book <[email protected]>

Code and tests adapted from L<Mojo::DOM>, a lightweight DOM parser by the L<Mojolicious> team.

=head1 CONTRIBUTORS

=over

=item Matt S Trout (mst)

=back

=head1 COPYRIGHT AND LICENSE

Copyright (c) 2008-2016 Sebastian Riedel and others.

Copyright (c) 2016 L</"AUTHOR"> and L</"CONTRIBUTORS"> for adaptation to standalone format.

This is free software, licensed under:

  The Artistic License 2.0 (GPL Compatible)

=head1 SEE ALSO

L<Mojo::DOM>, L<HTML::TreeBuilder>, L<XML::LibXML>, L<XML::Twig>, L<XML::Smart>

=for Pod::Coverage TO_JSON

=cut