Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/grinnz/mojo-dom58

Mojo::DOM58 - Minimalistic HTML/XML DOM parser with CSS selectors
https://github.com/grinnz/mojo-dom58

Last synced: 5 days ago
JSON representation

Mojo::DOM58 - Minimalistic HTML/XML DOM parser with CSS selectors

Awesome Lists containing this project

README

        

=pod

=encoding utf8

=head1 NAME

Mojo::DOM58 - Minimalistic HTML/XML DOM parser with CSS selectors

=head1 SYNOPSIS

use Mojo::DOM58;

# Parse
my $dom = Mojo::DOM58->new('


Test


123


');

# Find
say $dom->at('#b')->text;
say $dom->find('p')->map('text')->join("\n");
say $dom->find('[id]')->map(attr => 'id')->join("\n");

# Iterate
$dom->find('p[id]')->reverse->each(sub { say $_->{id} });

# Loop
for my $e ($dom->find('p[id]')->each) {
say $e->{id}, ':', $e->text;
}

# Modify
$dom->find('div p')->last->append('

456

');
$dom->at('#c')->prepend($dom->new_tag('p', id => 'd', '789'));
$dom->find(':not(p)')->map('strip');

# Render
say "$dom";

=head1 DESCRIPTION

L is a minimalistic and relaxed pure-perl HTML/XML DOM parser based
on L. It supports the L
and L, and
matching based on L. It will
even try to interpret broken HTML and XML, so you should not use it for
validation.

=head1 FORK INFO

L is a fork of L and tracks features and fixes to stay
closely compatible with upstream. It differs only in the standalone format and
compatibility with Perl 5.8. Any bugs or patches not related to these changes
should be reported directly to the L issue tracker.

This release of L is up to date with version C<9.0> of
L.

=head1 NODES AND ELEMENTS

When we parse an HTML/XML fragment, it gets turned into a tree of nodes.



Hello
World!

There are currently eight different kinds of nodes, C, C,
C, C, C, C, C and C. Elements are nodes of
the type C.

root
|- doctype (html)
+- tag (html)
|- tag (head)
| +- tag (title)
| +- raw (Hello)
+- tag (body)
+- text (World!)

While all node types are represented as L objects, some methods like
L"attr"> and L"namespace"> only apply to elements.

=head1 CASE-SENSITIVITY

L defaults to HTML semantics, that means all tags and attribute
names are lowercased and selectors need to be lowercase as well.

# HTML semantics
my $dom = Mojo::DOM58->new('

Hi!

');
say $dom->at('p[id]')->text;

If an XML declaration is found, the parser will automatically switch into XML
mode and everything becomes case-sensitive.

# XML semantics
my $dom = Mojo::DOM58->new('

Hi!

');
say $dom->at('P[ID]')->text;

HTML or XML semantics can also be forced with the L"xml"> method.

# Force HTML semantics
my $dom = Mojo::DOM58->new->xml(0)->parse('

Hi!

');
say $dom->at('p[id]')->text;

# Force XML semantics
my $dom = Mojo::DOM58->new->xml(1)->parse('

Hi!

');
say $dom->at('P[ID]')->text;

=head1 SELECTORS

L uses a CSS selector engine based on L. All CSS
selectors that make sense for a standalone parser are supported.

=over

=item Z<>*

Any element.

my $all = $dom->find('*');

=item E

An element of type C.

my $title = $dom->at('title');

=item E[foo]

An C element with a C attribute.

my $links = $dom->find('a[href]');

=item E[foo="bar"]

An C element whose C attribute value is exactly equal to C.

my $case_sensitive = $dom->find('input[type="hidden"]');
my $case_sensitive = $dom->find('input[type=hidden]');

=item E[foo="bar" i]

An C element whose C attribute value is exactly equal to any
(ASCII-range) case-permutation of C. Note that this selector is
B and might change without warning!

my $case_insensitive = $dom->find('input[type="hidden" i]');
my $case_insensitive = $dom->find('input[type=hidden i]');
my $case_insensitive = $dom->find('input[class~="foo" i]');

This selector is part of
L, which is still a work
in progress.

=item E[foo="bar" s]

An C element whose C attribute value is exactly and case-sensitively
equal to C. Note that this selector is B and might change
without warning!

my $case_sensitive = $dom->find('input[type="hidden" s]');

This selector is part of
L, which is still a work
in progress.

=item E[foo~="bar"]

An C element whose C attribute value is a list of whitespace-separated
values, one of which is exactly equal to C.

my $foo = $dom->find('input[class~="foo"]');
my $foo = $dom->find('input[class~=foo]');

=item E[foo^="bar"]

An C element whose C attribute value begins exactly with the string
C.

my $begins_with = $dom->find('input[name^="f"]');
my $begins_with = $dom->find('input[name^=f]');

=item E[foo$="bar"]

An C element whose C attribute value ends exactly with the string
C.

my $ends_with = $dom->find('input[name$="o"]');
my $ends_with = $dom->find('input[name$=o]');

=item E[foo*="bar"]

An C element whose C attribute value contains the substring C.

my $contains = $dom->find('input[name*="fo"]');
my $contains = $dom->find('input[name*=fo]');

=item E[foo|="en"]

An C element whose C attribute has a hyphen-separated list of values
beginning (from the left) with C.

my $english = $dom->find('link[hreflang|=en]');

=item E:root

An C element, root of the document.

my $root = $dom->at(':root');

=item E:nth-child(n)

An C element, the C child of its parent.

my $third = $dom->find('div:nth-child(3)');
my $odd = $dom->find('div:nth-child(odd)');
my $even = $dom->find('div:nth-child(even)');
my $top3 = $dom->find('div:nth-child(-n+3)');

=item E:nth-last-child(n)

An C element, the C child of its parent, counting from the last one.

my $third = $dom->find('div:nth-last-child(3)');
my $odd = $dom->find('div:nth-last-child(odd)');
my $even = $dom->find('div:nth-last-child(even)');
my $bottom3 = $dom->find('div:nth-last-child(-n+3)');

=item E:nth-of-type(n)

An C element, the C sibling of its type.

my $third = $dom->find('div:nth-of-type(3)');
my $odd = $dom->find('div:nth-of-type(odd)');
my $even = $dom->find('div:nth-of-type(even)');
my $top3 = $dom->find('div:nth-of-type(-n+3)');

=item E:nth-last-of-type(n)

An C element, the C sibling of its type, counting from the last one.

my $third = $dom->find('div:nth-last-of-type(3)');
my $odd = $dom->find('div:nth-last-of-type(odd)');
my $even = $dom->find('div:nth-last-of-type(even)');
my $bottom3 = $dom->find('div:nth-last-of-type(-n+3)');

=item E:first-child

An C element, first child of its parent.

my $first = $dom->find('div p:first-child');

=item E:last-child

An C element, last child of its parent.

my $last = $dom->find('div p:last-child');

=item E:first-of-type

An C element, first sibling of its type.

my $first = $dom->find('div p:first-of-type');

=item E:last-of-type

An C element, last sibling of its type.

my $last = $dom->find('div p:last-of-type');

=item E:only-child

An C element, only child of its parent.

my $lonely = $dom->find('div p:only-child');

=item E:only-of-type

An C element, only sibling of its type.

my $lonely = $dom->find('div p:only-of-type');

=item E:empty

An C element that has no children (including text nodes).

my $empty = $dom->find(':empty');

=item E:any-link

Alias for L"E:link">. Note that this selector is B and might
change without warning! This selector is part of
L, which is still a
work in progress.

=item E:link

An C element being the source anchor of a hyperlink of which the target is
not yet visited (C<:link>) or already visited (C<:visited>). Note that
L is not stateful, therefore C<:any-link>, C<:link> and
C<:visited> yield exactly the same results.

my $links = $dom->find(':any-link');
my $links = $dom->find(':link');
my $links = $dom->find(':visited');

=item E:visited

Alias for L"E:link">.

=item E:scope

An C element being a designated reference element. Note that this selector is B and might change
without warning!

my $scoped = $dom->find('a:not(:scope > a)');
my $scoped = $dom->find('div :scope p');
my $scoped = $dom->find('~ p');

This selector is part of L, which is still a work in progress.

=item E:checked

A user interface element C which is checked (for instance a radio-button or
checkbox).

my $input = $dom->find(':checked');

=item E.warning

An C element whose class is "warning".

my $warning = $dom->find('div.warning');

=item E#myid

An C element with C equal to "myid".

my $foo = $dom->at('div#foo');

=item E:not(s1, s2)

An C element that does not match either compound selector C or compound
selector C. Note that support for compound selectors is B and
might change without warning!

my $others = $dom->find('div p:not(:first-child, :last-child)');

Support for compound selectors was added as part of
L, which is still a work
in progress.

=item E:is(s1, s2)

An C element that matches compound selector C and/or compound selector
C. Note that this selector is B and might change without warning!

my $headers = $dom->find(':is(section, article, aside, nav) h1');

This selector is part of
L, which is still a work
in progress.

=item E:has(rs1, rs2)

An C element, if either of the relative selectors C or C, when evaluated with C as the :scope elements,
match an element. Note that this selector is B and might change without warning!

my $link = $dom->find('a:has(> img)');

This selector is part of L, which is still a work in progress.
Also be aware that this feature is currently marked C, so there is a high chance that it will get removed
completely.

=item A|E

An C element that belongs to the namespace alias C from
L.
Key/value pairs passed to selector methods are used to declare namespace
aliases.

my $elem = $dom->find('lq|elem', lq => 'http://example.com/q-markup');

Using an empty alias searches for an element that belongs to no namespace.

my $div = $dom->find('|div');

=item E F

An C element descendant of an C element.

my $headlines = $dom->find('div h1');

=item E E F

An C element child of an C element.

my $headlines = $dom->find('html > body > div > h1');

=item E + F

An C element immediately preceded by an C element.

my $second = $dom->find('h1 + h2');

=item E ~ F

An C element preceded by an C element.

my $second = $dom->find('h1 ~ h2');

=item E, F, G

Elements of type C, C and C.

my $headlines = $dom->find('h1, h2, h3');

=item E[foo=bar][bar=baz]

An C element whose attributes match all following attribute selectors.

my $links = $dom->find('a[foo^=b][foo$=ar]');

=back

=head1 OPERATORS

L overloads the following operators.

=head2 array

my @nodes = @$dom;

Alias for L"child_nodes">.

# ""
$dom->parse('123')->[0];

=head2 bool

my $bool = !!$dom;

Always true.

=head2 hash

my %attrs = %$dom;

Alias for L"attr">.

# "test"
$dom->parse('

Test
')->at('div')->{id};

=head2 stringify

my $str = "$dom";

Alias for L"to_string">.

=head1 FUNCTIONS

L implements the following functions, which can be imported
individually.

=head2 tag_to_html

my $str = tag_to_html 'div', id => 'foo', 'safe content';

Generate HTML/XML tag and render it right away. This is a significantly faster
alternative to L"new_tag"> for template systems that have to generate a lot
of tags.

=head1 METHODS

L implements the following methods.

=head2 new

my $dom = Mojo::DOM58->new;
my $dom = Mojo::DOM58->new('I ♥ Mojo::DOM58!');

Construct a new scalar-based L object and L"parse"> HTML/XML
fragment if necessary.

=head2 new_tag

my $tag = Mojo::DOM58->new_tag('div');
my $tag = $dom->new_tag('div');
my $tag = $dom->new_tag('div', id => 'foo', hidden => undef);
my $tag = $dom->new_tag('div', 'safe content');
my $tag = $dom->new_tag('div', id => 'foo', 'safe content');
my $tag = $dom->new_tag('div', data => {mojo => 'rocks'}, 'safe content');
my $tag = $dom->new_tag('div', id => 'foo', sub { 'unsafe content' });

Construct a new L object for an HTML/XML tag with or without
attributes and content. The C attribute may contain a hash reference with
key/value pairs to generate attributes from.

# "
"
$dom->new_tag('br');

# "

"
$dom->new_tag('div');

# "

"
$dom->new_tag('div', id => 'foo', hidden => undef);

# "

test & 123
"
$dom->new_tag('div', 'test & 123');

# "

test & 123
"
$dom->new_tag('div', id => 'foo', 'test & 123');

# "

test & 123
""
$dom->new_tag('div', data => {foo => 1, Bar => 'test'}, 'test & 123');

# "

test & 123
"
$dom->new_tag('div', id => 'foo', sub { 'test & 123' });

# "

HelloMojo!
"
$dom->parse('
Hello
')->at('div')
->append_content($dom->new_tag('b', 'Mojo!'))->root;

=head2 all_text

my $text = $dom->all_text;

Extract text content from all descendant nodes of this element. For HTML documents C and C<style> elements are
excluded.

# "foo\nbarbaz\n"
$dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->all_text;

=head2 ancestors

my $collection = $dom->ancestors;
my $collection = $dom->ancestors('div ~ p');

Find all ancestor elements of this node matching the CSS selector and return a
L<collection|/"COLLECTION METHODS"> containing these elements as L<Mojo::DOM58>
objects. All selectors listed in L</"SELECTORS"> are supported.

# List tag names of ancestor elements
say $dom->ancestors->map('tag')->join("\n");

=head2 append

$dom = $dom->append('<p>I ♥ Mojo::DOM58!</p>');
$dom = $dom->append(Mojo::DOM58->new);

Append HTML/XML fragment to this node (for all node types other than C<root>).

# "<div><h1>Test</h1><h2>123</h2></div>"
$dom->parse('<div><h1>Test</h1></div>')
->at('h1')->append('<h2>123</h2>')->root;

# "<p>Test 123</p>"
$dom->parse('<p>Test</p>')->at('p')
->child_nodes->first->append(' 123')->root;

=head2 append_content

$dom = $dom->append_content('<p>I ♥ Mojo::DOM58!</p>');
$dom = $dom->append_content(Mojo::DOM58->new);

Append HTML/XML fragment (for C<root> and C<tag> nodes) or raw content to this
node's content.

# "<div><h1>Test123</h1></div>"
$dom->parse('<div><h1>Test</h1></div>')
->at('h1')->append_content('123')->root;

# "<!-- Test 123 --><br>"
$dom->parse('<!-- Test --><br>')
->child_nodes->first->append_content('123 ')->root;

# "<p>Test<i>123</i></p>"
$dom->parse('<p>Test</p>')->at('p')->append_content('<i>123</i>')->root;

=head2 at

my $result = $dom->at('div ~ p');
my $result = $dom->at('svg|line', svg => 'http://www.w3.org/2000/svg');

Find first descendant element of this element matching the CSS selector and
return it as a L<Mojo::DOM58> object, or C<undef> if none could be found. All
selectors listed in L</"SELECTORS"> are supported.

# Find first element with "svg" namespace definition
my $namespace = $dom->at('[xmlns\:svg]')->{'xmlns:svg'};

Trailing key/value pairs can be used to declare xml namespace aliases.

# "<rect />"
$dom->parse('<svg xmlns="http://www.w3.org/2000/svg"><rect /></svg>')
->at('svg|rect', svg => 'http://www.w3.org/2000/svg');

=head2 attr

my $hash = $dom->attr;
my $foo = $dom->attr('foo');
$dom = $dom->attr({foo => 'bar'});
$dom = $dom->attr(foo => 'bar');

This element's attributes.

# Remove an attribute
delete $dom->attr->{id};

# Attribute without value
$dom->attr(selected => undef);

# List id attributes
say $dom->find('*')->map(attr => 'id')->compact->join("\n");

=head2 child_nodes

my $collection = $dom->child_nodes;

Return a L<collection|/"COLLECTION METHODS"> containing all child nodes of this
element as L<Mojo::DOM58> objects.

# "<p><b>123</b></p>"
$dom->parse('<p>Test<b>123</b></p>')->at('p')->child_nodes->first->remove;

# "<!DOCTYPE html>"
$dom->parse('<!DOCTYPE html><b>123</b>')->child_nodes->first;

# " Test "
$dom->parse('<b>123</b><!-- Test -->')->child_nodes->last->content;

=head2 children

my $collection = $dom->children;
my $collection = $dom->children('div ~ p');

Find all child elements of this element matching the CSS selector and return a
L<collection|/"COLLECTION METHODS"> containing these elements as L<Mojo::DOM58>
objects. All selectors listed in L</"SELECTORS"> are supported.

# Show tag name of random child element
say $dom->children->shuffle->first->tag;

=head2 content

my $str = $dom->content;
$dom = $dom->content('<p>I ♥ Mojo::DOM58!</p>');
$dom = $dom->content(Mojo::DOM58->new);

Return this node's content or replace it with HTML/XML fragment (for C<root>
and C<tag> nodes) or raw content.

# "<b>Test</b>"
$dom->parse('<div><b>Test</b></div>')->at('div')->content;

# "<div><h1>123</h1></div>"
$dom->parse('<div><h1>Test</h1></div>')->at('h1')->content('123')->root;

# "<p><i>123</i></p>"
$dom->parse('<p>Test</p>')->at('p')->content('<i>123</i>')->root;

# "<div><h1></h1></div>"
$dom->parse('<div><h1>Test</h1></div>')->at('h1')->content('')->root;

# " Test "
$dom->parse('<!-- Test --><br>')->child_nodes->first->content;

# "<div><!-- 123 -->456</div>"
$dom->parse('<div><!-- Test -->456</div>')
->at('div')->child_nodes->first->content(' 123 ')->root;

=head2 descendant_nodes

my $collection = $dom->descendant_nodes;

Return a L<collection|/"COLLECTION METHODS"> containing all descendant nodes of
this element as L<Mojo::DOM58> objects.

# "<p><b>123</b></p>"
$dom->parse('<p><!-- Test --><b>123<!-- 456 --></b></p>')
->descendant_nodes->grep(sub { $_->type eq 'comment' })
->map('remove')->first;

# "<p><b>test</b>test</p>"
$dom->parse('<p><b>123</b>456</p>')
->at('p')->descendant_nodes->grep(sub { $_->type eq 'text' })
->map(content => 'test')->first->root;

=head2 find

my $collection = $dom->find('div ~ p');
my $collection = $dom->find('svg|line', svg => 'http://www.w3.org/2000/svg');

Find all descendant elements of this element matching the CSS selector and
return a L<collection|/"COLLECTION METHODS"> containing these elements as
L<Mojo::DOM58> objects. All selectors listed in L</"SELECTORS"> are supported.

# Find a specific element and extract information
my $id = $dom->find('div')->[23]{id};

# Extract information from multiple elements
my @headers = $dom->find('h1, h2, h3')->map('text')->each;

# Count all the different tags
my $hash = $dom->find('*')->reduce(sub { $a->{$b->tag}++; $a }, {});

# Find elements with a class that contains dots
my @divs = $dom->find('div.foo\.bar')->each;

Trailing key/value pairs can be used to declare xml namespace aliases.

# "<rect />"
$dom->parse('<svg xmlns="http://www.w3.org/2000/svg"><rect /></svg>')
->find('svg|rect', svg => 'http://www.w3.org/2000/svg')->first;

=head2 following

my $collection = $dom->following;
my $collection = $dom->following('div ~ p');

Find all sibling elements after this node matching the CSS selector and return
a L<collection|/"COLLECTION METHODS"> containing these elements as
L<Mojo::DOM58> objects. All selectors listed in L</"SELECTORS"> are supported.

# List tags of sibling elements after this node
say $dom->following->map('tag')->join("\n");

=head2 following_nodes

my $collection = $dom->following_nodes;

Return a L<collection|/"COLLECTION METHODS"> containing all sibling nodes after
this node as L<Mojo::DOM58> objects.

# "C"
$dom->parse('<p>A</p><!-- B -->C')->at('p')->following_nodes->last->content;

=head2 matches

my $bool = $dom->matches('div ~ p');
my $bool = $dom->matches('svg|line', svg => 'http://www.w3.org/2000/svg');

Check if this element matches the CSS selector. All selectors listed in
L</"SELECTORS"> are supported.

# True
$dom->parse('<p class="a">A</p>')->at('p')->matches('.a');
$dom->parse('<p class="a">A</p>')->at('p')->matches('p[class]');

# False
$dom->parse('<p class="a">A</p>')->at('p')->matches('.b');
$dom->parse('<p class="a">A</p>')->at('p')->matches('p[id]');

Trailing key/value pairs can be used to declare xml namespace aliases.

# True
$dom->parse('<svg xmlns="http://www.w3.org/2000/svg"><rect /></svg>')
->matches('svg|rect', svg => 'http://www.w3.org/2000/svg');

=head2 namespace

my $namespace = $dom->namespace;

Find this element's namespace, or return C<undef> if none could be found.

# "http://www.w3.org/2000/svg"
Mojo::DOM58->new('<svg xmlns:svg="http://www.w3.org/2000/svg"><svg:circle>3.14</svg:circle></svg>')->at('svg\:circle')->namespace;

# Find namespace for an element with namespace prefix
my $namespace = $dom->at('svg > svg\:circle')->namespace;

# Find namespace for an element that may or may not have a namespace prefix
my $namespace = $dom->at('svg > circle')->namespace;

=head2 next

my $sibling = $dom->next;

Return L<Mojo::DOM58> object for next sibling element, or C<undef> if there are
no more siblings.

# "<h2>123</h2>"
$dom->parse('<div><h1>Test</h1><h2>123</h2></div>')->at('h1')->next;

=head2 next_node

my $sibling = $dom->next_node;

Return L<Mojo::DOM58> object for next sibling node, or C<undef> if there are no
more siblings.

# "456"
$dom->parse('<p><b>123</b><!-- Test -->456</p>')
->at('b')->next_node->next_node;

# " Test "
$dom->parse('<p><b>123</b><!-- Test -->456</p>')
->at('b')->next_node->content;

=head2 parent

my $parent = $dom->parent;

Return L<Mojo::DOM58> object for parent of this node, or C<undef> if this node
has no parent.

# "<b><i>Test</i></b>"
$dom->parse('<p><b><i>Test</i></b></p>')->at('i')->parent;

=head2 parse

$dom = $dom->parse('<foo bar="baz">I ♥ Mojo::DOM58!</foo>');

Parse HTML/XML fragment.

# Parse XML
my $dom = Mojo::DOM58->new->xml(1)->parse('<foo>I ♥ Mojo::DOM58!</foo>');

=head2 preceding

my $collection = $dom->preceding;
my $collection = $dom->preceding('div ~ p');

Find all sibling elements before this node matching the CSS selector and return
a L<collection|/"COLLECTION METHODS"> containing these elements as
L<Mojo::DOM58> objects. All selectors listed in L</"SELECTORS"> are supported.

# List tags of sibling elements before this node
say $dom->preceding->map('tag')->join("\n");

=head2 preceding_nodes

my $collection = $dom->preceding_nodes;

Return a L<collection|/"COLLECTION METHODS"> containing all sibling nodes
before this node as L<Mojo::DOM58> objects.

# "A"
$dom->parse('A<!-- B --><p>C</p>')->at('p')->preceding_nodes->first->content;

=head2 prepend

$dom = $dom->prepend('<p>I ♥ Mojo::DOM58!</p>');
$dom = $dom->prepend(Mojo::DOM58->new);

Prepend HTML/XML fragment to this node (for all node types other than C<root>).

# "<div><h1>Test</h1><h2>123</h2></div>"
$dom->parse('<div><h2>123</h2></div>')
->at('h2')->prepend('<h1>Test</h1>')->root;

# "<p>Test 123</p>"
$dom->parse('<p>123</p>')
->at('p')->child_nodes->first->prepend('Test ')->root;

=head2 prepend_content

$dom = $dom->prepend_content('<p>I ♥ Mojo::DOM58!</p>');
$dom = $dom->prepend_content(Mojo::DOM58->new);

Prepend HTML/XML fragment (for C<root> and C<tag> nodes) or raw content to this
node's content.

# "<div><h2>Test123</h2></div>"
$dom->parse('<div><h2>123</h2></div>')
->at('h2')->prepend_content('Test')->root;

# "<!-- Test 123 --><br>"
$dom->parse('<!-- 123 --><br>')
->child_nodes->first->prepend_content(' Test')->root;

# "<p><i>123</i>Test</p>"
$dom->parse('<p>Test</p>')->at('p')->prepend_content('<i>123</i>')->root;

=head2 previous

my $sibling = $dom->previous;

Return L<Mojo::DOM58> object for previous sibling element, or C<undef> if there
are no more siblings.

# "<h1>Test</h1>"
$dom->parse('<div><h1>Test</h1><h2>123</h2></div>')->at('h2')->previous;

=head2 previous_node

my $sibling = $dom->previous_node;

Return L<Mojo::DOM58> object for previous sibling node, or C<undef> if there are
no more siblings.

# "123"
$dom->parse('<p>123<!-- Test --><b>456</b></p>')
->at('b')->previous_node->previous_node;

# " Test "
$dom->parse('<p>123<!-- Test --><b>456</b></p>')
->at('b')->previous_node->content;

=head2 remove

my $parent = $dom->remove;

Remove this node and return L</"root"> (for C<root> nodes) or L</"parent">.

# "<div></div>"
$dom->parse('<div><h1>Test</h1></div>')->at('h1')->remove;

# "<p><b>456</b></p>"
$dom->parse('<p>123<b>456</b></p>')
->at('p')->child_nodes->first->remove->root;

=head2 replace

my $parent = $dom->replace('<div>I ♥ Mojo::DOM58!</div>');
my $parent = $dom->replace(Mojo::DOM58->new);

Replace this node with HTML/XML fragment and return L</"root"> (for C<root>
nodes) or L</"parent">.

# "<div><h2>123</h2></div>"
$dom->parse('<div><h1>Test</h1></div>')->at('h1')->replace('<h2>123</h2>');

# "<p><b>123</b></p>"
$dom->parse('<p>Test</p>')
->at('p')->child_nodes->[0]->replace('<b>123</b>')->root;

=head2 root

my $root = $dom->root;

Return L<Mojo::DOM58> object for C<root> node.

=head2 selector

my $selector = $dom->selector;

Get a unique CSS selector for this element.

# "ul:nth-child(1) > li:nth-child(2)"
$dom->parse('<ul><li>Test</li><li>123</li></ul>')->find('li')->last->selector;

# "p:nth-child(1) > b:nth-child(1) > i:nth-child(1)"
$dom->parse('<p><b><i>Test</i></b></p>')->at('i')->selector;

=head2 strip

my $parent = $dom->strip;

Remove this element while preserving its content and return L</"parent">.

# "<div>Test</div>"
$dom->parse('<div><h1>Test</h1></div>')->at('h1')->strip;

=head2 tag

my $tag = $dom->tag;
$dom = $dom->tag('div');

This element's tag name.

# List tag names of child elements
say $dom->children->map('tag')->join("\n");

=head2 tap

$dom = $dom->tap(sub {...});

Equivalent to L<Mojo::Base/"tap">.

=head2 text

my $text = $dom->text;

Extract text content from this element only (not including child elements).

# "bar"
$dom->parse("<div>foo<p>bar</p>baz</div>")->at('p')->text;

# "foo\nbaz\n"
$dom->parse("<div>foo\n<p>bar</p>baz\n</div>")->at('div')->text;

=head2 to_string

my $str = $dom->to_string;

Render this node and its content to HTML/XML.

# "<b>Test</b>"
$dom->parse('<div><b>Test</b></div>')->at('div b')->to_string;

To extract text content from all descendant nodes, see L</"all_text">.

=head2 tree

my $tree = $dom->tree;
$dom = $dom->tree(['root']);

Document Object Model. Note that this structure should only be used very
carefully since it is very dynamic.

=head2 type

my $type = $dom->type;

This node's type, usually C<cdata>, C<comment>, C<doctype>, C<pi>, C<raw>,
C<root>, C<tag> or C<text>.

# "cdata"
$dom->parse('<![CDATA[Test]]>')->child_nodes->first->type;

# "comment"
$dom->parse('<!-- Test -->')->child_nodes->first->type;

# "doctype"
$dom->parse('<!DOCTYPE html>')->child_nodes->first->type;

# "pi"
$dom->parse('<?xml version="1.0"?>')->child_nodes->first->type;

# "raw"
$dom->parse('<title>Test</title>')->at('title')->child_nodes->first->type;

# "root"
$dom->parse('<p>Test</p>')->type;

# "tag"
$dom->parse('<p>Test</p>')->at('p')->type;

# "text"
$dom->parse('<p>Test</p>')->at('p')->child_nodes->first->type;

=head2 val

my $value = $dom->val;

Extract value from form element (such as C<button>, C<input>, C<option>,
C<select> and C<textarea>), or return C<undef> if this element has no value. In
the case of C<select> with C<multiple> attribute, find C<option> elements with
C<selected> attribute and return an array reference with all values, or
C<undef> if none could be found.

# "a"
$dom->parse('<input name=test value=a>')->at('input')->val;

# "b"
$dom->parse('<textarea>b</textarea>')->at('textarea')->val;

# "c"
$dom->parse('<option value="c">Test</option>')->at('option')->val;

# "d"
$dom->parse('<select><option selected>d</option></select>')
->at('select')->val;

# "e"
$dom->parse('<select multiple><option selected>e</option></select>')
->at('select')->val->[0];

# "on"
$dom->parse('<input name=test type=checkbox>')->at('input')->val;

=head2 with_roles

my $new_class = Mojo::DOM58->with_roles('Mojo::DOM58::Role::One');
my $new_class = Mojo::DOM58->with_roles('+One', '+Two');
$dom = $dom->with_roles('+One', '+Two');

Equivalent to L<Mojo::Base/"with_roles">. Note that role support depends on
L<Role::Tiny> (2.000001+).

=head2 wrap

$dom = $dom->wrap('<div></div>');
$dom = $dom->wrap(Mojo::DOM58->new);

Wrap HTML/XML fragment around this node (for all node types other than C<root>),
placing it as the last child of the first innermost element.

# "<p>123<b>Test</b></p>"
$dom->parse('<b>Test</b>')->at('b')->wrap('<p>123</p>')->root;

# "<div><p><b>Test</b></p>123</div>"
$dom->parse('<b>Test</b>')->at('b')->wrap('<div><p></p>123</div>')->root;

# "<p><b>Test</b></p><p>123</p>"
$dom->parse('<b>Test</b>')->at('b')->wrap('<p></p><p>123</p>')->root;

# "<p><b>Test</b></p>"
$dom->parse('<p>Test</p>')->at('p')->child_nodes->first->wrap('<b>')->root;

=head2 wrap_content

$dom = $dom->wrap_content('<div></div>');
$dom = $dom->wrap_content(Mojo::DOM58->new);

Wrap HTML/XML fragment around this node's content (for C<root> and C<tag>
nodes), placing it as the last children of the first innermost element.

# "<p><b>123Test</b></p>"
$dom->parse('<p>Test<p>')->at('p')->wrap_content('<b>123</b>')->root;

# "<p><b>Test</b></p><p>123</p>"
$dom->parse('<b>Test</b>')->wrap_content('<p></p><p>123</p>');

=head2 xml

my $bool = $dom->xml;
$dom = $dom->xml($bool);

Disable HTML semantics in parser and activate case-sensitivity, defaults to
auto detection based on XML declarations.

=head1 COLLECTION METHODS

Some L<Mojo::DOM58> methods return an array-based collection object based on
L<Mojo::Collection>, which can either be accessed directly as an array
reference, or with the following methods.

# Chain methods
$collection->map(sub { ucfirst })->shuffle->each(sub {
my ($word, $num) = @_;
say "$num: $word";
});

# Access array directly to manipulate collection
$collection->[23] += 100;
say for @$collection;

=head2 compact

my $new = $collection->compact;

Create a new L<collection|/"COLLECTION METHODS"> with all elements that are
defined and not an empty string.

# $collection contains (0, 1, undef, 2, '', 3)
$collection->compact->join(', '); # "0, 1, 2, 3"

=head2 each

my @elements = $collection->each;
$collection = $collection->each(sub {...});

Evaluate callback for each element in collection or return all elements as a
list if none has been provided. The element will be the first argument passed
to the callback and is also available as C<$_>.

# Make a numbered list
$collection->each(sub {
my ($e, $num) = @_;
say "$num: $e";
});

=head2 first

my $first = $collection->first;
my $first = $collection->first(qr/foo/);
my $first = $collection->first(sub {...});
my $first = $collection->first($method);
my $first = $collection->first($method, @args);

Evaluate regular expression/callback for, or call method on, each element in
collection and return the first one that matched the regular expression, or for
which the callback/method returned true. The element will be the first argument
passed to the callback and is also available as C<$_>.

# Longer version
my $first = $collection->first(sub { $_->$method(@args) });

# Find first value that contains the word "mojo"
my $interesting = $collection->first(qr/mojo/i);

# Find first value that is greater than 5
my $greater = $collection->first(sub { $_ > 5 });

=head2 flatten

my $new = $collection->flatten;

Flatten nested collections/arrays recursively and create a new
L<collection|/"COLLECTION METHODS"> with all elements.

# $collection contains (1, [2, [3, 4], 5, [6]], 7)
$collection->flatten->join(', '); # "1, 2, 3, 4, 5, 6, 7"

=head2 grep

my $new = $collection->grep(qr/foo/);
my $new = $collection->grep(sub {...});
my $new = $collection->grep($method);
my $new = $collection->grep($method, @args);

Evaluate regular expression/callback for, or call method on, each element in
collection and create a new L<collection|/"COLLECTION METHODS"> with all
elements that matched the regular expression, or for which the callback/method
returned true. The element will be the first argument passed to the callback
and is also available as C<$_>.

# Longer version
my $new = $collection->grep(sub { $_->$method(@args) });

# Find all values that contain the word "mojo"
my $interesting = $collection->grep(qr/mojo/i);

# Find all values that are greater than 5
my $greater = $collection->grep(sub { $_ > 5 });

=head2 head

my $new = $collection->head(4);
my $new = $collection->head(-2);

Create a new L<collection|/"COLLECTION METHODS"> with up to the specified
number of elements from the beginning of the collection. A negative number will
count from the end.

# $collection contains ('A', 'B', 'C', 'D', 'E')
$collection->head(3)->join(' '); # "A B C"
$collection->head(-3)->join(' '); # "A B"

=head2 join

my $stream = $collection->join;
my $stream = $collection->join("\n");

Turn collection into string.

# Join all values with commas
$collection->join(', ');

=head2 last

my $last = $collection->last;

Return the last element in collection.

=head2 map

my $new = $collection->map(sub {...});
my $new = $collection->map($method);
my $new = $collection->map($method, @args);

Evaluate callback for, or call method on, each element in collection and create
a new L<collection|/"COLLECTION METHODS"> from the results. The element will be
the first argument passed to the callback and is also available as C<$_>.

# Longer version
my $new = $collection->map(sub { $_->$method(@args) });

# Append the word "mojo" to all values
my $domified = $collection->map(sub { $_ . 'mojo' });

=head2 reduce

my $result = $collection->reduce(sub {...});
my $result = $collection->reduce(sub {...}, $initial);

Reduce elements in collection with callback, the first element will be used as
initial value if none has been provided.

# Calculate the sum of all values
my $sum = $collection->reduce(sub { $a + $b });

# Count how often each value occurs in collection
my $hash = $collection->reduce(sub { $a->{$b}++; $a }, {});

=head2 reverse

my $new = $collection->reverse;

Create a new L<collection|/"COLLECTION METHODS"> with all elements in reverse
order.

=head2 slice

my $new = $collection->slice(4 .. 7);

Create a new L<collection|/"COLLECTION METHODS"> with all selected elements.

# $collection contains ('A', 'B', 'C', 'D', 'E')
$collection->slice(1, 2, 4)->join(' '); # "B C E"

=head2 shuffle

my $new = $collection->shuffle;

Create a new L<collection|/"COLLECTION METHODS"> with all elements in random
order.

=head2 size

my $size = $collection->size;

Number of elements in collection.

=head2 sort

my $new = $collection->sort;
my $new = $collection->sort(sub {...});

Sort elements based on return value of callback and create a new
L<collection|/"COLLECTION METHODS"> from the results.

# Sort values case-insensitive
my $case_insensitive = $collection->sort(sub { uc($a) cmp uc($b) });

=head2 tail

my $new = $collection->tail(4);
my $new = $collection->tail(-2);

Create a new L<collection|/"COLLECTION METHODS"> with up to the specified
number of elements from the end of the collection. A negative number will count
from the beginning.

# $collection contains ('A', 'B', 'C', 'D', 'E')
$collection->tail(3)->join(' '); # "C D E"
$collection->tail(-3)->join(' '); # "D E"

=head2 tap

$collection = $collection->tap(sub {...});

Equivalent to L<Mojo::Base/"tap">.

=head2 to_array

my $array = $collection->to_array;

Turn collection into array reference.

=head2 uniq

my $new = $collection->uniq;
my $new = $collection->uniq(sub {...});
my $new = $collection->uniq($method);
my $new = $collection->uniq($method, @args);

Create a new L<collection|/"COLLECTION METHODS"> without duplicate elements,
using the string representation of either the elements or the return value of
the callback/method to decide uniqueness. Note that C<undef> and empty string
are treated the same.

# Longer version
my $new = $collection->uniq(sub { $_->$method(@args) });

# $collection contains ('foo', 'bar', 'bar', 'baz')
$collection->uniq->join(' '); # "foo bar baz"

# $collection contains ([1, 2], [2, 1], [3, 2])
$collection->uniq(sub{ $_->[1] })->to_array; # "[[1, 2], [2, 1]]"

=head2 with_roles

$collection = $collection->with_roles('Mojo::Collection::Role::One');

Equivalent to L<Mojo::Base/"with_roles">. Note that role support depends on
L<Role::Tiny> (2.000001+).

=head1 DEBUGGING

You can set the C<MOJO_DOM58_CSS_DEBUG> environment variable to get some advanced diagnostics information printed to
C<STDERR>.

MOJO_DOM58_CSS_DEBUG=1

=head1 BUGS

Report issues related to the format of this distribution or Perl 5.8 support to
the public bugtracker. Any other issues should be reported directly to the
upstream L<Mojolicious> issue tracker.

=head1 AUTHOR

Dan Book <[email protected]>

Code and tests adapted from L<Mojo::DOM>, a lightweight DOM parser by the L<Mojolicious> team.

=head1 CONTRIBUTORS

=over

=item Matt S Trout (mst)

=back

=head1 COPYRIGHT AND LICENSE

Copyright (c) 2008-2016 Sebastian Riedel and others.

Copyright (c) 2016 L</"AUTHOR"> and L</"CONTRIBUTORS"> for adaptation to standalone format.

This is free software, licensed under:

The Artistic License 2.0 (GPL Compatible)

=head1 SEE ALSO

L<Mojo::DOM>, L<HTML::TreeBuilder>, L<XML::LibXML>, L<XML::Twig>, L<XML::Smart>

=for Pod::Coverage TO_JSON

=cut