PROXY  WHOIS  RQUOTE  TEXTS  SOFT  FOREX  BBOARD
 Music  Philosophy  Code  Literature  Russian

= ROOT|Technical|RFC|rfc2130.txt =

page 5 of 18



   While there may be other text attributes intimately associated with
   the language of the document, such as desired font or text direction,
   these should be specified with other identifiers rather than
   overloading the language tag.

3.2:  On the wire

   There are three segments of the model which are required for
   completely specifying the content of a transmitted text stream (with
   the occasional exception of the Language component, mentioned above).
   These components are:

   1)  Coded Character Set,
   2)  Character Encoding Scheme, and
   3)  Transfer Encoding Syntax.

   Each of these abstract components must be explicitly specified by the
   transmitter when the data is sent.  There may be instances of an
   implicit specification due to the protocol/standard being used (i.e.
   ANSI/NISO Z39.50).  Also, in MIME, the Coded Character Set and
   Character Encoding Scheme are specified by the Charset parameter to
   the Content-Type header field, and Transfer Encoding Syntax is
   specified by the Content-Transfer-Encoding header field.

3.2.1:  Coded Character Set

   A Coded Character Set (CCS) is a mapping from a set of abstract
   characters to a set of integers.  Examples of coded character sets
   are ISO 10646 [ISO-10646], US-ASCII [ASCII], and ISO-8859 series
   [ISO-8859].

3.2.2:  Character Encoding Scheme

   A Character Encoding Scheme (CES) is a mapping from a Coded Character
   Set or several coded character sets to a set of octets. Examples of
   Character Encoding Schemes are ISO 2022 [ISO-2022] and UTF-8 [UTF-8].
   A given CES is typically associated with a single CCS; for example,
   UTF-8 applies only to ISO 10646.












 
RFC 2130             Character Set Workshop Report            April 1997


3.2.3:  Transfer Encoding Syntax

   It is frequently necessary to transform encoded text into a format
   which is transmissible by specific protocols.  The Transfer Encoding
   Syntax (TES) is a transformation applied to character data encoded
   using a CCS and possibly a CES to allow it to be transmitted.
   Examples of Transfer Encoding Syntaxes are Base64 Encoding [Base64],
   gzip encoding, and so forth.

3.3:  Determining which values of CCS, CES, and TES are used

   To completely specify which CCS, CES, and TES are used in a specific
   text transmission, there needs to be a consistent set of labels for
   specifying which CCS, CES, and TES are used.  Once the appropriate
   mechanisms have been selected, there are six techniques for attaching
   these labels to the data.

   The labels themselves are named and registered, either with IANA
   [IANA] or with some other registry.  Ideally, their definitions are
   retrievable from some registration authority.

   Labels may be determined in one of the following ways:

   -  Determined by guessing, where the receiver of the text has to
      guess the values of the CCS, CES, and TES. For example: "I got
      this from Sweden so it's probably  ISO-8859-1."  This is
      obviously not a very foolproof way to decode text.
   -  Determined by the standard, where the protocol used to transmit
      the data has made documented choices of CCS, CES, and TES in the
      standard. Thus, the encodings used are known through the
      access protocol, for example HTTP [HTTP] uses (but is not
      limited to) ISO-8859-1, SMTP uses US-ASCII.
   -  Attached to the transfer envelope, where the descriptive labels are
      attached to the wrapper placed around the text for transport.
      MIME headers are a good example of this technique.
   -  Included in the data stream, where the data stream itself has
      been encoded in such a way as to signal the character set used.
      For example, ISO-2022 encodes the data with escape sequences to
      provide information on the character subset currently being used.
   -  Agreed by prior bilateral agreement, where some out-of-band
      negotiation has allowed the text transmitter and receiver to
      determine the CCS, CES, and  TES for the transmitted text.
   -  Agreed to by negotiation during some phase, typically
      initialization of the protocol.


=5=

1|2|3|4| < PREV = PAGE 5 = NEXT > |6|7|8|9|10|11|12|13|14.18

UP TO ROOT | UP TO DIR | TO FIRST PAGE

Google
 


E-mail Facebook Google Digg del.icio.us BlinkList Fark Furl Ma.gnolia Netscape NewsVine Reddit Slashdot Spurl StumbleUpon Technorati YahooMyWeb LiveJournal Blogmarks TwitThis Live News2.ru BobrDobr.ru Memori.ru MoeMesto.ru

0.0163372 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU)