   A "name" is an identifier such as a person's name, a hostname, a
   domainname, a filename or an E-mail address; it is often treated as
   an identifier rather than as a piece of text, and is often used in
   protocols as an identifier for entities, without surrounding text.

3.1.  What charset to use

   All protocols MUST identify, for all character data, which charset is
   in use.

RFC 2277                     Charset Policy                 January 1998


   Protocols MUST be able to use the UTF-8 charset, which consists of
   the ISO 10646 coded character set combined with the UTF-8 character
   encoding scheme, as defined in [10646] Annex R (published in
   Amendment 2), for all text.
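
   As an illustration only, the two requirements above (identify the
   charset of all character data, and be able to use UTF-8) can be
   sketched in Python as follows. The LabeledText type and the helper
   names are hypothetical and are not defined by this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledText:
    """Hypothetical wire unit: encoded bytes plus an explicit charset label."""
    charset: str
    payload: bytes


def encode_labeled(text: str, charset: str = "utf-8") -> LabeledText:
    # Every payload carries its charset label; UTF-8 is the default here
    # (an assumption of this sketch) and is always available.
    return LabeledText(charset=charset, payload=text.encode(charset))


def decode_labeled(item: LabeledText) -> str:
    # The receiver decodes using the label instead of guessing.
    return item.payload.decode(item.charset)


message = encode_labeled("Blåbærsyltetøy")
assert decode_labeled(message) == "Blåbærsyltetøy"
```

   Other charsets can be passed explicitly, but the label always
   travels with the bytes.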

   Protocols MAY specify, in addition, how to use other charsets or
   other character encoding schemes for ISO 10646, such as UTF-16, but
   lack of an ability to use UTF-8 is a violation of this policy; such a
   violation would need a variance procedure ([BCP9] section 9) with
   clear and solid justification in the protocol specification document
   before being entered into or advanced upon the standards track.

   For existing protocols or protocols that move data from existing
   datastores, support of other charsets, or even using a default other
   than UTF-8, may be a requirement. This is acceptable, but UTF-8
   support MUST be possible.

   When charsets other than UTF-8 are used, they MUST be registered in
   the IANA charset registry, if necessary by registering them when the
   protocol is published.

   (Note: ISO 10646 calls the UTF-8 CES a "Transformation Format" rather
   than a "character encoding scheme", but it fits the charset workshop
   report definition of a character encoding scheme).

3.2.  How to decide a charset

   When the protocol allows a choice of multiple charsets, someone must
   make a decision on which charset to use.

   In some cases, like HTTP, there is direct or semi-direct
   communication between the producer and the consumer of data
   containing text. In such cases, it may make sense to negotiate a
   charset before sending data.
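
   A minimal sketch of such a negotiation, assuming an HTTP-style
   Accept-Charset value with optional q-values, is shown below. The
   function names are hypothetical, and the parsing is deliberately
   simplified; it is not a complete implementation of the HTTP rules.

```python
from typing import Dict, List, Optional


def parse_accept_charset(header: str) -> Dict[str, float]:
    """Parse an Accept-Charset style value into {charset: q}."""
    prefs: Dict[str, float] = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # a charset listed without a q-value is fully acceptable
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[name.strip().lower()] = q
    return prefs


def choose_charset(header: str, supported: List[str]) -> Optional[str]:
    """Pick the supported charset that the consumer prefers most.

    `supported` is in the producer's own preference order, which breaks
    ties. Returns None when no supported charset is acceptable.
    """
    prefs = parse_accept_charset(header)
    if not prefs:
        # No preference stated: assume anything is acceptable.
        return supported[0] if supported else None
    best, best_q = None, 0.0
    for charset in supported:
        q = prefs.get(charset.lower(), prefs.get("*", 0.0))
        if q > best_q:
            best, best_q = charset, q
    return best


assert choose_charset("iso-8859-5, utf-8;q=0.8", ["utf-8", "iso-8859-1"]) == "utf-8"
```

   Listing UTF-8 first in `supported` makes it the choice whenever the
   consumer rates it at least as highly as the alternatives.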

   In other cases, like E-mail or stored data, there is no such
   communication, and the best one can do is to make sure the charset is
   clearly identified with the stored data, and to choose a charset that
   is as widely known as possible.
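
   For the E-mail case, a sketch using Python's standard `email`
   package shows the charset label being stored with the text and read
   back by a consumer that never communicated with the producer. The
   subject line and body text are made-up sample data.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Producer: store the text together with an explicit charset label.
msg = EmailMessage()
msg["Subject"] = "Charset labelling example"
msg.set_content("Grüße aus Zürich", charset="utf-8")
raw = msg.as_bytes()

# Consumer: read the label from the stored message instead of guessing.
parsed = message_from_bytes(raw, policy=policy.default)
charset = parsed.get_content_charset()
text = parsed.get_content()

assert charset == "utf-8"
assert text.strip() == "Grüße aus Zürich"
```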

   Note that a charset is an absolute; text that is encoded in a charset
   cannot be rendered comprehensibly without supporting that charset.

   (This also applies to English texts; charsets like EBCDIC do NOT have
   ASCII as a proper subset.)
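
   The point about EBCDIC can be checked with a short Python sketch.
   It uses `cp037`, one EBCDIC code page shipped with Python, as a
   stand-in for EBCDIC charsets in general; the helper name is
   hypothetical.

```python
text = "Hello"
ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")  # one EBCDIC code page, chosen as an example


def decodes_as_ascii(data: bytes) -> bool:
    """Return True if `data` is valid ASCII."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


# The same English text yields different bytes in the two charsets,
# and the EBCDIC bytes are not even valid ASCII.
assert ascii_bytes != ebcdic_bytes
assert not decodes_as_ascii(ebcdic_bytes)

# Decoding with the correct charset label recovers the text.
assert ebcdic_bytes.decode("cp037") == text
```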

   Negotiating a charset may be regarded as an interim mechanism that is
   to be supported until support for interchange of UTF-8 is prevalent;
   however, the timeframe of "interim" may be at least 50 years, so
   there is every reason to think of it as permanent in practice.

4.  Languages

4.1.  The need for language information

   All human-readable text has a language.

   Many operations, including high quality formatting, text-to-speech
   synthesis, searching, hyphenation, spellchecking and so on, benefit
   greatly from access to information about the language of a piece of
   text [WC 3.1.1.4].

   Humans have some tolerance for foreign languages, but are generally
   very unhappy with being presented text in a language they do not
   understand; this is why negotiation of language is needed.

   In most cases, machines will not be able to deduce the language of a
   transmitted text by themselves; the protocol must specify how to
   transfer the language information if it is to be available at all.
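
   One way to picture this is a sketch in which every piece of text
   carries an explicit language label, and the consumer selects a
   variant it understands. The TaggedText type, the pick_variant
   helper and the short tags "en" and "no" are assumptions made for
   illustration; this section does not define a tag format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TaggedText:
    """Hypothetical unit of text that carries its own language label."""
    text: str
    language: str  # e.g. "en" or "no"; the tag syntax is an assumption


def pick_variant(variants: List[TaggedText],
                 preferred: List[str]) -> Optional[TaggedText]:
    """Return the first variant matching the reader's language preferences.

    `preferred` is ordered from most to least wanted. Returns None when
    no variant is in a language the reader asked for.
    """
    by_language = {v.language.lower(): v for v in variants}
    for language in preferred:
        match = by_language.get(language.lower())
        if match is not None:
            return match
    return None


variants = [
    TaggedText("Mailbox is full", "en"),
    TaggedText("Postkassen er full", "no"),
]
chosen = pick_variant(variants, ["fr", "no"])
assert chosen is not None and chosen.language == "no"
```

   Without the label, the consumer would have to guess the language
   from the text itself, which is what the paragraph above warns
   against.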

   The interaction between language and processing is complex; for
   instance, if I compare "name-of-thing(lang=en)" to "name-of-