
      left to right parsing is more common than right to left.  The
      choice of a colon as separator of the prefix from the rest of the
      URI was arbitrary.

      The decoding of the rest of the string is defined as a function of
      the prefix.  New prefixes are introduced for new schemes as
      necessary, in agreement with the registration authority.  The
      registration of a new scheme clearly requires the definition of
      the decoding of the URI into a given name space, and a definition
      of the properties and, where applicable, resolution protocols, for
      the name space.

      The completeness requirement is easily met by allowing
      particularly strange or plain binary names to be encoded in base
      16 or 64 using the acceptable characters.
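
      As an illustration of the completeness requirement, the sketch
      below encodes an arbitrary binary name in base 16 or base 64.
      The function names are hypothetical, and the URL-safe base 64
      alphabet used here is a later convention assumed for the
      example, not something this document specifies.  Note that base
      64 output may end in "=" padding, which a given scheme might
      need to escape.

```python
import base64


def encode_binary_name(raw: bytes, base: int = 16) -> str:
    """Encode arbitrary bytes using a restricted, printable alphabet."""
    if base == 16:
        # Two hexadecimal digits per byte, upper case.
        return base64.b16encode(raw).decode("ascii")
    if base == 64:
        # Assumption: the URL-safe alphabet ("-" and "_" instead of
        # "+" and "/") is used so the result avoids "/" in a path.
        return base64.urlsafe_b64encode(raw).decode("ascii")
    raise ValueError("base must be 16 or 64")


def decode_binary_name(text: str, base: int = 16) -> bytes:
    """Recover the original bytes from the encoded name."""
    if base == 16:
        return base64.b16decode(text)
    if base == 64:
        return base64.urlsafe_b64decode(text)
    raise ValueError("base must be 16 or 64")


name = bytes([0x00, 0xFF, 0x10, 0x20])
print(encode_binary_name(name, 16))  # 00FF1020
print(decode_binary_name(encode_binary_name(name, 64), 64) == name)  # True
```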

      The printability requirement could have been met by requiring all
      schemes to encode characters not part of a basic set.  This led to
      many discussions of what the basic set should be.  A difficult
      case, for example, is when an ISO Latin-1 string appears in a URL,
      and within an application with ISO Latin-1 capability, it can be
      handled intact.  However, for transport in general, the non-ASCII




 
RFC 1630                      URIs in WWW                      June 1994


      characters need to be escaped.

      The solution to this was to specify a safe set of characters, and
      a general escaping scheme which may be used for encoding "unsafe"
      characters.  This "safe" set is suitable, for example, for use in
      electronic mail.  This is the canonical form of a URI.

      The choice of escape character for introducing representations of
      non-allowed characters also tends to be a matter of taste.  An
      ANSI standard exists in the C language, using the back-slash
      character "\".  The use of this character on Unix command lines,
      however, can be a problem as it is interpreted by many shell
      programs, and would have itself to be escaped.  It is also a
      character which is not available on certain keyboards.  The equals
      sign is commonly used in the encoding of names having
      attribute=value pairs.  The percent sign was eventually chosen as
      a suitable escape character.

      There is a conflict between the need to be able to represent many
      characters including spaces within a URI directly, and the need to
      be able to use a URI in environments which have limited character
      sets or in which certain characters are prone to corruption.  This
      conflict has been resolved by use of a hexadecimal escaping
      method which may be applied to any characters forbidden in a given
      context.  When URLs are moved between contexts, the set of
      characters escaped may be enlarged or reduced unambiguously.
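
      The following sketch shows one way the escaped set can vary by
      context without changing the meaning of the name.  It uses
      Python's urllib.parse.quote and unquote; the function name, the
      sample string, and the choice of which characters each context
      allows are assumptions made for the example.

```python
from urllib.parse import quote, unquote


def escape_for_context(text: str, allowed: str = "") -> str:
    """Percent-escape every character except letters, digits,
    "_.-~" and those listed in `allowed` for this context."""
    return quote(text, safe=allowed)


name = "annual report (final).txt"

# A strict context: spaces and parentheses are all escaped.
strict = escape_for_context(name)
print(strict)   # annual%20report%20%28final%29.txt

# A more permissive context (hypothetical) that tolerates parentheses.
relaxed = escape_for_context(name, allowed="()")
print(relaxed)  # annual%20report%20(final).txt

# Both forms decode to the same name, so enlarging or reducing the
# escaped set does not introduce ambiguity.
print(unquote(strict) == unquote(relaxed) == name)  # True
```

      The percent sign itself is always escaped by this sketch, which
      is what keeps decoding unambiguous.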

      The use of white space characters is risky in URIs to be printed
      or sent by electronic mail, and the use of multiple white space
      characters is very risky.  This is because of the frequent
      introduction of extraneous white space when lines are wrapped by
      systems such as mail, or sheer necessity of narrow column width,
      and because of the inter-conversion of various forms of white
      space which occurs during character code conversion and the
      transfer of text between applications.  This is why the canonical
      form for URIs has all white spaces encoded.
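
      A minimal sketch of why encoding all white space helps: if the
      canonical form contains no literal white space, any white space
      found in a received URI can be treated as introduced in transit
      and dropped.  That recovery step is an assumption made for
      illustration, as are the function names; the sketch also assumes
      the input contains no "%" that would itself need escaping.

```python
import re


def escape_whitespace(text: str) -> str:
    """Replace each ASCII white space character with its %XX escape."""
    return re.sub(
        r"[ \t\r\n\f\v]",
        lambda m: "%{:02X}".format(ord(m.group())),
        text,
    )


def strip_extraneous_whitespace(received: str) -> str:
    """Drop white space added by line wrapping or conversion."""
    return re.sub(r"\s+", "", received)


canonical = escape_whitespace("my  file.txt")
print(canonical)  # my%20%20file.txt

# Simulate a mail system wrapping the line and indenting the rest.
wrapped = "my%20%20fi\n   le.txt"
print(strip_extraneous_whitespace(wrapped) == canonical)  # True
```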

Recommendations

   This section describes the syntax for URIs as used in the WorldWide
   Web initiative.  The generic syntax provides a framework for new
   schemes for names to be resolved using as yet undefined protocols.

URI syntax

   A complete URI consists of a naming scheme specifier followed by a
   string whose format is a function of the naming scheme.  For locators
   of information on the Internet, a common syntax is used for the IP

   address part. A BNF description of the URL syntax is given in a
   later section. The components are as follows.  Fragment identifiers
   and relative URIs are not involved in the basic URL definition.

   SCHEME

      Within the URI of an object, the first element is the name of the
      scheme, separated from the rest of the object by a colon.
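
      As a sketch of this rule, the scheme can be separated from the
      rest of the URI by splitting at the first colon, reading left to
      right.  The function name and the example URI are hypothetical,
      and no validation of the scheme name is attempted.

```python
from typing import Tuple


def split_scheme(uri: str) -> Tuple[str, str]:
    """Split a URI into (scheme, rest) at the first colon."""
    scheme, sep, rest = uri.partition(":")
    if not sep or not scheme:
        raise ValueError("no scheme prefix found")
    return scheme, rest


# Later colons (here, before a port number) stay in the rest of the
# string, whose format depends on the scheme.
print(split_scheme("http://example.org:80/x"))
# ('http', '//example.org:80/x')
```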

   PATH

      The rest of the URI follows the colon in a format depending on the
      scheme. The path is interpreted in a manner dependent on the
      protocol being used.  However, when it contains slashes, these