different interface than you'd find in a combined tokenizer and
tree-builder.  But most of the length of the source comes from the fact
that it's essentially a long list of special cases, with lots and lots
of sanity-checking, and sanity-recovery -- because, as Roseanne
Roseannadanna once said, "it's always I<something>".

Users looking to compare several HTML parsers should look at the
source for Raggett's Tidy
(C<E<lt>http://www.w3.org/People/Raggett/tidy/E<gt>>),
Mozilla
(C<E<lt>http://www.mozilla.org/E<gt>>),
and possibly root around the browsers section of Yahoo
to find the various open-source ones
(C<E<lt>http://dir.yahoo.com/Computers_and_Internet/Software/Internet/World_Wide_Web/Browsers/E<gt>>).

=head1 BUGS

* Framesets seem to work correctly now.  Email me if you get a strange
parse from a document with framesets.

* Really bad HTML code will, often as not, make for a somewhat
objectionable parse tree.  Regrettable, but unavoidably true.

* If you're running with implicit_tags off (God help you!), consider
that $tree->content_list probably contains the tree or grove from the
parse, and not $tree itself (which will, oddly enough, be an implicit
'html' element).  This seems counter-intuitive and problematic; but
seeing as how almost no HTML ever parses correctly with implicit_tags
off, this interface oddity seems the least of your problems.
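
To make the last point concrete, here is a minimal sketch of reading a parse made with implicit_tags off. The sample HTML string and the variable names are illustrative assumptions; only C<new>, C<implicit_tags>, C<parse>, C<eof>, C<content_list>, C<tag> and C<delete> come from the module. The exact shape of the resulting children depends on the input, so no output is shown.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->implicit_tags(0);    # set options before feeding any content

# Illustrative input only; substitute your own document.
$tree->parse('<p>Hello <b>world</b></p>');
$tree->eof;

# $tree itself is still an 'html' element, even with implicit_tags off.
my $root_tag = $tree->tag;

# The nodes that actually came from the document are its children.
my @nodes = $tree->content_list;
for my $node (@nodes) {
    if (ref $node) {
        print "element: ", $node->tag, "\n";
    }
    else {
        print "text: $node\n";
    }
}

$tree->delete;    # release the tree when finished
```

In other words, walk C<< $tree->content_list >> rather than treating C<$tree> as the document root when implicit_tags is off.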

=head1 BUG REPORTS

When a document parses in a way different from how you think it
should, I ask that you report this to me as a bug.  The first thing
you should do is copy the document, trim out as much of it as you can
while still producing the bug in question, and I<then> email me that
mini-document I<and> the code you're using to parse it, to the HTML::Tree
bug queue at C<bug-html-tree at rt.cpan.org>.

Include a note as to how it parses (presumably including its
$tree->dump output), and then a I<careful and clear> explanation of
where you think the parser is going astray, and how you would prefer
that it work instead.
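
A short script along these lines is one way to produce the material requested above. It is only a sketch: the mini-document in C<$mini_doc> is a hypothetical stand-in for your own trimmed-down input, and printing the module version is a suggestion rather than a stated requirement.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical mini-document; replace with the smallest input that
# still shows the parse you consider wrong.
my $mini_doc = '<table><tr><td>one<td>two</table>';

my $tree = HTML::TreeBuilder->new_from_content($mini_doc);

# Version information helps whoever reads the report reproduce it.
my $version = HTML::TreeBuilder->VERSION;
print "HTML::TreeBuilder version: $version\n";

print "Input:\n$mini_doc\n\nDump:\n";
$tree->dump;      # writes an indented view of the tree to STDOUT

$tree->delete;    # release the tree when finished
```

Send the script, the mini-document and the captured output together, followed by your explanation of what you expected the tree to look like.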

=head1 SEE ALSO

L<HTML::Tree>; L<HTML::Parser>, L<HTML::Element>, L<HTML::Tagset>

L<HTML::DOMbo>

=head1 COPYRIGHT

Copyright 1995-1998 Gisle Aas, 1999-2004 Sean M. Burke, 2005 Andy Lester,
2006 Pete Krawczyk.

This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.

This program is distributed in the hope that it will be useful, but
without any warranty; without even the implied warranty of
merchantability or fitness for a particular purpose.

=head1 AUTHOR

Currently maintained by Pete Krawczyk C<< <petek@cpan.org> >>

Original authors: Gisle Aas, Sean Burke and Andy Lester.

=cut