%isHeadOrBodyElement = map {; $_ => 1 }
qw(script isindex style object map area param noscript bgsound);
# i.e., if we find 'script' in the 'body' or the 'head', don't freak out.
=head2 hashset %HTML::Tagset::isKnown
This hashset lists all known HTML elements.
=cut
%isKnown = (%isHeadElement, %isBodyElement,
map{; $_=>1 }
qw( head body html
frame frameset noframes
~comment ~pi ~directive ~literal
));
# that should be all known tags ever ever
=head2 hashset %HTML::Tagset::canTighten
This hashset lists elements that might have ignorable whitespace as
children or siblings.
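As a rough illustration of what "ignorable whitespace" buys a tree builder, here is a minimal sketch. It assumes a toy tree in which a node is either a plain string (text) or an array ref C<[ $tag, @children ]>, and it uses a hypothetical hand-picked subset of this hashset. The rule it applies (drop whitespace-only text when the parent can be tightened) is a simplification assumed for the sketch, not C<HTML::TreeBuilder>'s exact logic. It reproduces the failure case described in the comments below.

```perl
use strict;
use warnings;

# Hypothetical subset of %HTML::Tagset::canTighten, for illustration only;
# phrasal elements (em, b, ...) and pre are deliberately absent.
my %canTighten = map { $_ => 1 } qw(html body div center p ul ol li table tr td hr br);

# A node is either a plain string (text) or an array ref [ $tag, @children ].
# Simplified rule, assumed for this sketch and not HTML::TreeBuilder's exact
# logic: a whitespace-only text child is dropped when its parent can be
# tightened.
sub tighten {
    my ($node) = @_;
    return $node unless ref $node;    # text nodes come back unchanged
    my ($tag, @children) = @$node;
    my @kept;
    for my $child (@children) {
        next if !ref($child) && $child =~ /\A\s*\z/ && $canTighten{$tag};
        push @kept, tighten($child);
    }
    return [ $tag, @kept ];
}

# The case from the comments: the space after <center> is dropped.
my $center = tighten([ 'center', ' ', [ 'em', 'baz quux' ] ]);
# $center is now [ 'center', [ 'em', 'baz quux' ] ]

# Whitespace inside pre survives, because pre cannot be tightened.
my $pre = tighten([ 'pre', '  ', 'x' ]);
```

Keeping C<pre> and the phrasal elements out of the set is what protects significant whitespace; the cost is that a space which really was meaningful next to a block element gets lost.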
=cut
%canTighten = %isKnown;
delete @canTighten{
keys(%isPhraseMarkup), 'input', 'select',
'xmp', 'listing', 'plaintext', 'pre',
};
# xmp, listing, plaintext, and pre are untightenable, and
# in a really special way.
@canTighten{'hr','br'} = (1,1);
# exceptional 'phrasal' things that ARE subject to tightening.
# The one case where I can think of my tightening rules failing is:
# <p>foo bar<center> <em>baz quux</em> ...
#                   ^-- that would get deleted.
# But that's pretty gruesome code anyhow. You gets what you pays for.
#==========================================================================
=head2 array @HTML::Tagset::p_closure_barriers
This array has a meaning that I have only seen a need for in
C<HTML::TreeBuilder>, but I include it here on the off chance that someone
might find it of use:
When we see a "E<lt>pE<gt>" token, we look up the lineage for a p
element we might have to minimize. At first sight, we might say that
if there's a p anywhere in the lineage of this new p, it should be
closed. But that's wrong. Consider this document:

  <html>
    <head>
      <title>foo</title>
    </head>
    <body>
      <p>foo
        <table>
          <tr>
            <td>
              foo
              <p>bar
            </td>
          </tr>
        </table>
      </p>
    </body>
  </html>

The second p is quite legally inside a much higher p.
My formalization of why this is legal, while this

  <p>foo<p>bar</p></p>

is not, is that something about the table constitutes a "barrier" to
the application of the rule about what a p must minimize.
So C<@HTML::Tagset::p_closure_barriers> is the list of all such
barrier-tags.
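A minimal sketch of the lineage walk described above, assuming the parser keeps its open elements as a simple list of tag names (outermost first). The helper C<p_to_close> is hypothetical and is not part of HTML::Tagset or C<HTML::TreeBuilder>; the barrier list is copied from the definition below.

```perl
use strict;
use warnings;

# Copied from @HTML::Tagset::p_closure_barriers as defined in this module.
my @p_closure_barriers = qw(
    li blockquote
    ul ol menu dir
    dl dt dd
    td th tr table caption
    div
);
my %is_barrier = map { $_ => 1 } @p_closure_barriers;

# p_to_close is a hypothetical helper.  Given the open elements (outermost
# first), it returns the index of the p that a new p should close, or undef
# if no p should be closed.
sub p_to_close {
    my @open = @_;
    for my $i (reverse 0 .. $#open) {
        return $i if $open[$i] eq 'p';        # nearest enclosing p: close it
        return if $is_barrier{ $open[$i] };   # barrier reached: leave outer p open
    }
    return;                                   # no p in the lineage at all
}

my $simple = p_to_close(qw(html body p));               # 2: <p>foo<p>bar
my $tabled = p_to_close(qw(html body p table tr td));   # undef: td is a barrier
```

The second call mirrors the table document above: the walk stops at the C<td> before it ever reaches the outer p, so that p stays open.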
=cut
@p_closure_barriers = qw(
li blockquote
ul ol menu dir
dl dt dd
td th tr table caption
div
);
# In an ideal world (i.e., XHTML) we wouldn't have to bother with any of this
# monkey business of barriers to minimization!
=head2 hashset %HTML::Tagset::isCDATA_Parent