= ROOT|Technical|Code_Examples|Perl|site_perl|HTML|Tagset.pm =

%isHeadOrBodyElement = map {; $_ => 1 }
  qw(script isindex style object map area param noscript bgsound);
  # i.e., if we find 'script' in the 'body' or the 'head', don't freak out.


=head2 hashset %HTML::Tagset::isKnown

This hashset lists all known HTML elements.
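
As a minimal usage sketch (not part of this module's code; the tag list
below is an arbitrary example, and C<frobozz> is a made-up name), membership
can be tested with a plain hash lookup.  Only the truth of the value matters:

  use strict;
  use warnings;
  use HTML::Tagset;

  # Report whether each tag name appears in the isKnown hashset.
  foreach my $tag (qw(frameset ~comment frobozz)) {
    my $known = $HTML::Tagset::isKnown{$tag} ? 'known' : 'unknown';
    print "$tag: $known\n";
  }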

=cut

%isKnown = (%isHeadElement, %isBodyElement,
  map{; $_=>1 }
   qw( head body html
       frame frameset noframes
       ~comment ~pi ~directive ~literal
));
 # that should be all known tags ever ever


=head2 hashset %HTML::Tagset::canTighten

This hashset lists elements that might have ignorable whitespace as
children or siblings.
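
For illustration only, here is a sketch that looks up two of the entries
adjusted by the code after this section: C<pre> is deleted from the set,
while C<hr> is explicitly added back.

  use strict;
  use warnings;
  use HTML::Tagset;

  # A true value means whitespace around the element may be ignorable.
  foreach my $tag (qw(pre hr)) {
    printf "%-3s %s\n", $tag,
      $HTML::Tagset::canTighten{$tag} ? 'can be tightened' : 'left alone';
  }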

=cut

%canTighten = %isKnown;
delete @canTighten{
  keys(%isPhraseMarkup), 'input', 'select',
  'xmp', 'listing', 'plaintext', 'pre',
};
  # xmp, listing, plaintext, and pre are untightenable, and
  #   in a really special way.
@canTighten{'hr','br'} = (1,1);
 # exceptional 'phrasal' things that ARE subject to tightening.

# The one case I can think of where my tightening rules fail is:
#  <p>foo bar<center> <em>baz quux</em> ...
#                    ^-- that would get deleted.
# But that's pretty gruesome code anyhow.  You gets what you pays for.

#==========================================================================

=head2 array @HTML::Tagset::p_closure_barriers

This array has a meaning that I have only seen a need for in
C<HTML::TreeBuilder>, but I include it here on the off chance that someone
might find it of use:

When we see a "E<lt>pE<gt>" token, we go looking up the lineage for a p
element we might have to minimize.  At first sight, we might say that
if there's a p anywhere in the lineage of this new p, it should be
closed.  But that's wrong.  Consider this document:

  <html>
    <head>
      <title>foo</title>
    </head>
    <body>
      <p>foo
        <table>
          <tr>
            <td>
               foo
               <p>bar
            </td>
          </tr>
        </table>
      </p>
    </body>
  </html>

The second p is quite legally inside a much higher p.

My formalization of the reason why this is legal, but this:

  <p>foo<p>bar</p></p>

isn't, is that something about the table constitutes a "barrier" to
the application of the rule about what p must minimize.

So C<@HTML::Tagset::p_closure_barriers> is the list of all such
barrier-tags.
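
The lineage walk described above might be sketched roughly as follows.
This is an illustration under stated assumptions, not the code that
C<HTML::TreeBuilder> actually uses: the helper name C<p_to_close> is
hypothetical, and the lineage is assumed to be a plain list of tag names
running from the innermost open element outward.

  use strict;
  use warnings;
  use HTML::Tagset;

  # Turn the barrier list into a lookup hash.
  my %is_barrier = map {; $_ => 1 } @HTML::Tagset::p_closure_barriers;

  # Hypothetical helper: return the index of the open p that a new
  # <p> token should close, or nothing if a barrier is met first
  # or no p is open at all.
  sub p_to_close {
    my @lineage = @_;
    foreach my $i (0 .. $#lineage) {
      return $i if $lineage[$i] eq 'p';
      return    if $is_barrier{ $lineage[$i] };
    }
    return;
  }

  # New <p> inside a table cell: td is a barrier, so nothing is closed.
  my $in_cell = p_to_close(qw(td tr table p body html));  # undef

  # New <p> directly inside another p (via b): the open p gets closed.
  my $nested  = p_to_close(qw(b p body html));             # 1

Stopping at the first barrier is what lets the table example above stay
legal while the directly nested p case is still minimized.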

=cut

@p_closure_barriers = qw(
  li blockquote
  ul ol menu dir
  dl dt dd
  td th tr table caption
  div
 );

# In an ideal world (i.e., XHTML) we wouldn't have to bother with any of this
# monkey business of barriers to minimization!

=head2 hashset %isCDATA_Parent