different interface than you'd find in a combined tokenizer and
tree-builder. But most of the length of the source comes from the fact
that it's essentially a long list of special cases, with lots and lots
of sanity-checking and sanity-recovery -- because, as Roseanne
Roseannadanna once said, "it's always I<something>".
Users looking to compare several HTML parsers should look at the
source for Raggett's Tidy
(C<E<lt>http://www.w3.org/People/Raggett/tidy/E<gt>>),
Mozilla
(C<E<lt>http://www.mozilla.org/E<gt>>),
and possibly root around the browsers section of Yahoo
to find the various open-source ones
(C<E<lt>http://dir.yahoo.com/Computers_and_Internet/Software/Internet/World_Wide_Web/Browsers/E<gt>>).
=head1 BUGS
* Framesets seem to work correctly now. Email me if you get a strange
parse from a document with framesets.
* Really bad HTML code will, often as not, make for a somewhat
objectionable parse tree. Regrettable, but unavoidably true.
* If you're running with implicit_tags off (God help you!), consider
that $tree->content_list probably contains the tree or grove from the
parse, and not $tree itself (which will, oddly enough, be an implicit
'html' element). This seems counter-intuitive and problematic; but
seeing as how almost no HTML ever parses correctly with implicit_tags
off, this interface oddity seems the least of your problems.
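To make that last point concrete, here is a minimal sketch of looking
at a parse made with implicit_tags off. The sample HTML string is made
up for illustration, and what ends up in the content list depends on
your document, so treat this as a way of inspecting the result rather
than as a promise about it.

    use strict;
    use warnings;
    use HTML::TreeBuilder;

    my $tree = HTML::TreeBuilder->new;
    $tree->implicit_tags(0);    # must be set before parsing starts
    $tree->parse('<ul><li>one</li><li>two</li></ul>');  # made-up sample
    $tree->eof;

    # $tree itself is the implicit 'html' wrapper; whatever was parsed
    # should be sitting in its content list.
    for my $node ($tree->content_list) {
        if (ref $node) {
            $node->dump;                # an HTML::Element object
        }
        else {
            print "text: $node\n";      # a bare text segment
        }
    }

    $tree->delete;    # free the tree when done

If the content list holds more than one item, you are looking at a
grove rather than a single tree, and your code will need to cope with
that.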
=head1 BUG REPORTS
When a document parses in a way different from how you think it
should, I ask that you report this to me as a bug. First, copy the
document and trim out as much of it as you can while still reproducing
the bug in question. I<Then> email that mini-document, I<and> the code
you're using to parse it, to the HTML::Tree bug queue at
C<bug-html-tree at rt.cpan.org>.
Include a note as to how it
parses (presumably including its $tree->dump output), and then a
I<careful and clear> explanation of where you think the parser is
going astray, and how you would prefer that it work instead.
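The code you send along can be very short. Something like the
following sketch is usually enough to reproduce a parse and capture
its dump; the file name C<mini.html> is only a placeholder for your
trimmed-down document.

    use strict;
    use warnings;
    use HTML::TreeBuilder;

    my $file = 'mini.html';    # placeholder: your trimmed-down document

    my $tree = HTML::TreeBuilder->new;
    # Set any non-default parser options here, before parsing.
    $tree->parse_file($file) or die "Can't parse $file: $!";

    $tree->dump;      # prints the parse tree to STDOUT
    $tree->delete;

Redirect the script's output to a file and attach it to your report
together with the mini-document.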
=head1 SEE ALSO
L<HTML::Tree>, L<HTML::Parser>, L<HTML::Element>, L<HTML::Tagset>,
L<HTML::DOMbo>
=head1 COPYRIGHT
Copyright 1995-1998 Gisle Aas, 1999-2004 Sean M. Burke, 2005 Andy Lester,
2006 Pete Krawczyk.
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
This program is distributed in the hope that it will be useful, but
without any warranty; without even the implied warranty of
merchantability or fitness for a particular purpose.
=head1 AUTHOR
Currently maintained by Pete Krawczyk C<< <petek@cpan.org> >>.
Original authors: Gisle Aas, Sean Burke and Andy Lester.
=cut