PROXY  WHOIS  RQUOTE  TEXTS  SOFT  FOREX  BBOARD
 Music  Philosophy  Code  Literature  Russian

= ROOT|Technical|RFC|rfc2130.txt =

page 2 of 18



   MIME registry of character sets registers items that have great
   differences in semantics and applicability. This workshop provides
   guidance to the IAB and IETF about the use of character sets on the
   Internet and provides a common framework for interoperability between
   the many characters in use there.

   The framework consists of four components: an architecture model,
   which specifies components necessary for on-the-wire transmission of
   text; recommendations for tagging transmitted (and stored) text;
   recommended defaults for each level of the model; and a set of




 
RFC 2130             Character Set Workshop Report            April 1997


   recommendations to the IAB, IANA, and the IESG for furthering the
   integration of  this framework into text transmission protocols.

   The architectural model specifies 7 layers, of which only three are
   required for on-the-wire transmission. The Coded Character Set is a
   mapping from a set of abstract characters to a set of integers. The
   Character Encoding Scheme is a mapping from a Coded Character Set (or
   several) to a set of octets. The Transfer Encoding Syntax is a
   transformation applied to data which has been encoded using a
   Character Encoding Scheme to allow it to be transmitted. These layers
   should be specified in a transmitted text stream by using the MIME
   encoding mechanisms.

   This report recommends the use of ISO 10646 as the default Coded
   Character Set, and UTF-8 as the default Character Encoding Scheme in
   the creation of new protocols or new version of old protocols which
   transmit text. These defaults do not deprecate the use of other
   character sets when and where they are needed; they are simply
   intended to provide guidance and a specification for
   interoperability.

1:  Introduction

   This is the report of an IAB-sponsored invitational workshop on the
   use of Character Sets on the Internet, held 29 February - 1 March
   1996 at Information Sciences Institute (ISI) in Marina del Rey,
   California.  In addition, this report covers the discussion on the
   mailing list up to and slightly beyond the workshop itself.  The
   goals of this workshop were to provide guidance to the IAB and the
   IETF about the use of character sets on the Internet, and if possible
   a common framework for interoperability between the many character
   sets in use there.  Both goals were achieved.

2:  Character sets on the Internet - the problem

   The term 'character set' is typically applied to the contents of a
   wide variety of text transmission and display protocols used on the
   Internet.  Because the term is used to mean different things,
   confusion has arisen.  For example, the MIME registry of character
   sets [MIME] contains items that may differ greatly in their
   applicability and semantics in various Internet protocols.

   In addition, there is a vast profusion of different text encoding
   schemes in use on the Internet.  This per se is not a problem; each
   scheme has evolved to meet real needs.  However, information
   applications such as mail, directories, and the World Wide Web have
   each developed different techniques for dealing with the growing
   number of schemes.  A robust information architecture for the




 
RFC 2130             Character Set Workshop Report            April 1997


   Internet requires as much interoperability between these techniques
   as possible.

2.1:  Related topics deemed out of scope for this workshop

   Successful display of plain text transmitted over the Internet
   requires a lot of information about the text itself, such as the
   underlying character set, language, and so forth.  An additional set
   of formatting information is needed if the receiving application
   wishes to use local (cultural) conventions when it presents the data
   to the user.  This formatting includes information, that provides the
   data necessary to format certain  types of textual data (dates,
   times, numbers and monetary notation) into a form which is familiar
   to the user.  The POSIX [POSIX] notation of locale encompasses
   language, coded character set and cultural conventions.

   To avoid unfruitful discussion, and to make the best use of the time
   available for the workshop, we declared the following  issues out of
   scope for the purposes of this workshop:

   -  glyphs
   -  sorting
   -  culture (e.g. do we present the American or British spelling?)
   -  user interface issues
   -  internal representation of textual data
   -  included characters (why aren't certain characters available in
=2=

1| < PREV = PAGE 2 = NEXT > |3|4|5|6|7|8|9|10|11.18

UP TO ROOT | UP TO DIR | TO FIRST PAGE

Google
 


E-mail Facebook Google Digg del.icio.us BlinkList Fark Furl Ma.gnolia Netscape NewsVine Reddit Slashdot Spurl StumbleUpon Technorati YahooMyWeb LiveJournal Blogmarks TwitThis Live News2.ru BobrDobr.ru Memori.ru MoeMesto.ru

0.01051 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU)