   represent a sequence of characters. A 'charset' defines this mapping.
   There are many charsets in use in Internet protocols. For example,
   UTF-8 [UTF-8] defines a mapping from sequences of octets to sequences
   of characters in the repertoire of ISO 10646.

   In the simplest case, the original character sequence contains only
   characters that are defined in US-ASCII, and the two levels of
   mapping are simple and easily invertible: each 'original character'
   is represented as the octet for the US-ASCII code for it, which is,
   in turn, represented as either the US-ASCII character, or else the
   "%" escape sequence for that octet.
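
   The two-level mapping for the US-ASCII case can be sketched as
   follows. This is an illustration only: the function names and the
   "keep" parameter (the set of characters emitted literally) are
   hypothetical and are not defined by this document.

```python
from urllib.parse import unquote_to_bytes


def ascii_to_uri_chars(text, keep):
    """Map US-ASCII text to URI characters (hypothetical helper).

    Level 1: each original character becomes its US-ASCII octet.
    Level 2: each octet becomes either the literal character (if it is
    in `keep`) or a "%XX" escape sequence.
    Raises UnicodeEncodeError if `text` contains non-ASCII characters.
    """
    octets = text.encode("ascii")
    parts = []
    for octet in octets:
        ch = chr(octet)
        parts.append(ch if ch in keep else "%%%02X" % octet)
    return "".join(parts)


def uri_chars_to_ascii(uri_chars):
    """Invert the mapping: URI characters -> octets -> US-ASCII text."""
    return unquote_to_bytes(uri_chars).decode("ascii")
```

   With keep = {"a", "b"}, the text "a b" maps to "a%20b", and the
   inverse function recovers "a b".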

   For original character sequences that contain non-ASCII characters,
   however, the situation is more difficult. Internet protocols that
   transmit octet sequences intended to represent character sequences
   are expected to provide some way of identifying the charset used, if
   there might be more than one [RFC2277].  However, there is currently
   no provision within the generic URI syntax to accomplish this
   identification. An individual URI scheme may require a single
   charset, define a default charset, or provide a way to indicate the
   charset used.
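
   The difficulty can be seen in a small sketch that uses Python's
   urllib.parse module (an assumption of this example, not something
   this document prescribes). The same original character yields
   different escape sequences under different charsets, and nothing in
   the escaped form records which charset was used.

```python
from urllib.parse import quote, unquote

original = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# The same character escaped under two different charsets.
utf8_form = quote(original, safe="", encoding="utf-8")      # two octets
latin1_form = quote(original, safe="", encoding="latin-1")  # one octet

# Decoding succeeds only when the receiver assumes the charset that
# the sender actually used.
round_trip = unquote(latin1_form, encoding="latin-1")
mismatch = unquote(latin1_form, encoding="utf-8", errors="replace")
```

   Here utf8_form is "%C3%A9" and latin1_form is "%E9"; round_trip
   equals the original character, while mismatch does not.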

   It is expected that a systematic treatment of character encoding
   within URI will be developed as a future modification of this
   specification.

2.2. Reserved Characters

   Many URI include components consisting of, or delimited by, certain
   special characters.  These characters are called "reserved", since
   their usage within the URI component is limited to their reserved
   purpose.  If the data for a URI component would conflict with the
   reserved purpose, then the conflicting data must be escaped before
   forming the URI.

      reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                    "$" | ","

   The "reserved" syntax class above refers to those characters that are
   allowed within a URI, but which may not be allowed within a
   particular component of the generic URI syntax; they are used as
   delimiters of the components described in Section 3.

RFC 2396                   URI Generic Syntax                August 1998

   Characters in the "reserved" set are not reserved in all contexts.
   The set of characters actually reserved within any given URI
   component is defined by that component. In general, a character is
   reserved if the semantics of the URI changes if the character is
   replaced with its escaped US-ASCII encoding.
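
   As a minimal sketch of escaping conflicting data before forming a
   URI, consider a query component in which "&" and "=" act as
   delimiters. That delimiter convention is an assumption borrowed
   from common scheme usage and is not defined by the generic syntax;
   build_query is a hypothetical helper.

```python
from urllib.parse import quote

# The "reserved" set as listed in Section 2.2.
RESERVED = set(";/?:@&=+$,")


def build_query(pairs):
    """Join name/value pairs with "&" and "=" (hypothetical helper).

    Each name and value is escaped first, so a reserved character that
    occurs in the data cannot be mistaken for a delimiter.
    """
    return "&".join(
        quote(name, safe="") + "=" + quote(value, safe="")
        for name, value in pairs
    )


query = build_query([("q", "a&b=c")])
```

   The value "a&b=c" is emitted as "a%26b%3Dc", so the only literal
   "&" and "=" characters left in the query are the delimiters.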

2.3. Unreserved Characters

   Data characters that are allowed in a URI but do not have a reserved
   purpose are called unreserved.  These include upper and lower case
   letters, decimal digits, and a limited set of punctuation marks and
   symbols.

      unreserved  = alphanum | mark

      mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

   Unreserved characters can be escaped without changing the semantics
   of the URI, but this should not be done unless the URI is being used
   in a context that does not allow the unescaped character to appear.
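
   The "unreserved" class can be written directly as a set. The
   sketch below is illustrative; the names are hypothetical.

```python
import string

# alphanum: upper and lower case letters plus decimal digits.
ALPHANUM = set(string.ascii_letters + string.digits)

# mark: the nine punctuation characters listed in the grammar above.
MARK = set("-_.!~*'()")

UNRESERVED = ALPHANUM | MARK


def is_unreserved(ch):
    """Return True if the single character `ch` is unreserved."""
    return ch in UNRESERVED
```

   For example, is_unreserved("~") is true, while is_unreserved("/")
   is false because "/" belongs to the reserved set.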

2.4. Escape Sequences

   Data must be escaped if it does not have a representation using an
   unreserved character; this includes data that does not correspond to
   a printable character of the US-ASCII coded character set, or that
   corresponds to any US-ASCII character that is disallowed, as
   explained below.

2.4.1. Escaped Encoding

   An escaped octet is encoded as a character triplet, consisting of the
   percent character "%" followed by the two hexadecimal digits
   representing the octet code. For example, "%20" is the escaped
   encoding for the US-ASCII space character.

      escaped     = "%" hex hex
      hex         = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                            "a" | "b" | "c" | "d" | "e" | "f"
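
   The "escaped" production translates into a short regular
   expression. The following sketch (helper names are hypothetical)
   encodes an octet as a triplet, checks a triplet against the
   grammar, and recovers the octet value.

```python
import re

# escaped = "%" hex hex, where hex accepts both upper and lower case.
ESCAPED = re.compile(r"%[0-9A-Fa-f]{2}")


def escape_octet(octet):
    """Encode one octet (0..255) as a "%XX" triplet."""
    if not 0 <= octet <= 255:
        raise ValueError("octet out of range: %r" % (octet,))
    return "%%%02X" % octet


def is_escaped(triplet):
    """Return True if `triplet` matches the "escaped" production exactly."""
    return ESCAPED.fullmatch(triplet) is not None


def unescape_triplet(triplet):
    """Return the octet value encoded by a valid "%XX" triplet."""
    if not is_escaped(triplet):
        raise ValueError("not an escaped triplet: %r" % (triplet,))
    return int(triplet[1:], 16)
```

   escape_octet(0x20) gives "%20", matching the space example above.
   Both "%2F" and "%2f" are accepted, since the grammar allows either
   case for hexadecimal digits.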

2.4.2. When to Escape and Unescape

   A URI is always in an "escaped" form, since escaping or unescaping a
   completed URI might change its semantics.  Normally, the only time
   escape encodings can safely be made is when the URI is being created
   from its component parts; each component may have its own set of
