  use HTML::TreeBuilder::XPath;

  my $tree= HTML::TreeBuilder::XPath->new;
  $tree->parse_file( "mypage.html");
  my $nb=$tree->findvalue( '/html/body//p[@class="section_title"]/span[@class="nb"]');
  my $id=$tree->findvalue( '/html/body//p[@class="section_title"]/@id');

  my $p= $tree->findnodes( '//p[@id="toto"]')->[0];
  my $link_texts= $p->findvalue( './a'); # the texts of all a elements in $p

=head1 DESCRIPTION
This module adds typical XPath methods to HTML::TreeBuilder, to make it easy to query a document.
Extra methods added both to the tree object and to each element:
=head2 findnodes ($path)

Returns a list of nodes found by $path.
In scalar context returns an XML::XPathEngine::NodeSet object.
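
The difference between list and scalar context can be sketched as follows. This is a minimal illustration, not taken from the module's own test suite: the sample markup and all variable names are assumptions made up for the example.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup, used only for illustration.
my $html = '<html><body><p class="a">one</p><p class="a">two</p></body></html>';

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# List context: a plain Perl list of element objects.
my @paragraphs = $tree->findnodes('//p[@class="a"]');

# Scalar context: a node set object; size() reports how many nodes it holds.
my $nodeset = $tree->findnodes('//p[@class="a"]');

my $count_list   = scalar @paragraphs;
my $count_scalar = $nodeset->size;

# Each node is an HTML::Element, so the usual element methods apply.
my @texts = map { $_->as_text } @paragraphs;

$tree->delete;
```

Taking the result in list context is usually the simpler choice when you only want to loop over the matching elements.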
=head2 findnodes_as_string ($path)

Returns the text values of the nodes.
=head2 findvalue ($path)

Returns either an XML::XPathEngine::Literal, an XML::XPathEngine::Boolean
or an XML::XPathEngine::Number object. If the path returns a NodeSet,
$nodeset->xpath_to_literal is called automatically for you (and thus an
XML::XPathEngine::Literal is returned). Stringification is overloaded for
each of these objects, so you can print the value found directly, or
manipulate it as you would a normal Perl value (e.g. using regular expressions).
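
As a rough sketch of what treating the result as a normal Perl value looks like in practice, the example below interpolates a value and applies a regular expression to it. The markup, the C<s3> id and the variable names are assumptions invented for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup, used only for illustration.
my $html = '<html><body>'
         . '<p class="section_title" id="s3"><span class="nb">3</span> Results</p>'
         . '</body></html>';

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

my $nb = $tree->findvalue('//p[@class="section_title"]/span[@class="nb"]');
my $id = $tree->findvalue('//p[@class="section_title"]/@id');

# The values can be interpolated into strings...
my $label = "section $nb";

# ...and matched against regular expressions like any other scalar.
my $is_numeric = ( $nb =~ /^\d+$/ ) ? 1 : 0;

$tree->delete;
```

Because the example relies only on stringification, it does not depend on which class the returned value belongs to.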
=head2 exists ($path)

Returns true if the given path exists, i.e. if it matches at least one node.
=head2 matches ($path)

Returns true if the element matches the path.
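
The two boolean tests can be sketched side by side: exists asks whether a path selects anything, while matches asks whether one particular element is among the nodes a path selects. The markup and variable names below are assumptions made up for the example.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample markup, used only for illustration.
my $html = '<html><body><p id="toto"><a href="/x">x</a></p><p>plain</p></body></html>';

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# exists: does the path select at least one node in the document?
my $has_toto  = $tree->exists('//p[@id="toto"]') ? 1 : 0;
my $has_table = $tree->exists('//table')         ? 1 : 0;

# matches: is this specific element one of the nodes the path selects?
my ($p) = $tree->findnodes('//p[@id="toto"]');
my $p_is_toto = $p->matches('//p[@id="toto"]') ? 1 : 0;
my $p_is_div  = $p->matches('//div')           ? 1 : 0;

$tree->delete;
```

The results are normalised to 1 or 0 here because the false value returned by these methods is not guaranteed to be a defined scalar.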
=head2 find ($path)

The find function takes an XPath expression (a string) and returns either an
XML::XPathEngine::NodeSet object containing the nodes it found (empty if
no nodes matched the path), or one of XML::XPathEngine::Literal (a string),
XML::XPathEngine::Number, or XML::XPathEngine::Boolean. It should always
return something, and you can use ->isa() to find out what it returned. If
you need to check how many nodes it found, check $nodeset->size.
See the XML::XPathEngine::NodeSet manpage.
HTML::TreeBuilder's as_XML output is not really nice to look at, so
I added a new method, that can be used as a simple replacement for it.
It escapes only the '<', '>' and '&' (plus '"' in attribute values), and
wraps CDATA elements in CDATA sections.
The $optional_indent_level defaults to the level in the original HTML
document (i.e. you probably don't have to use it).
This method is currently in alpha state. Ping me if you want other options added to it (wrapping?).
Michel Rodriguez, <mirod@cpan.org>
Copyright (C) 2006 by Michel Rodriguez
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.