Network Working Group                                       D. Goldsmith
Request for Comments: 2152                          Apple Computer, Inc.
Obsoletes: RFC 1642                                             M. Davis
Category: Informational                                   Taligent, Inc.
                                                                May 1997


                                 UTF-7

              A Mail-Safe Transformation Format of Unicode

Status of this Memo

   This memo provides information for the Internet community.  This memo
   does not specify an Internet standard of any kind.  Distribution of
   this memo is unlimited.

Abstract

   The Unicode Standard, version 2.0, and ISO/IEC 10646-1:1993(E) (as
   amended) jointly define a character set (hereafter referred to as
   Unicode) which encompasses most of the world's writing systems.
   However, Internet mail (STD 11, RFC 822) currently supports only 7-
   bit US ASCII as a character set. MIME (RFC 2045 through 2049) extends
   Internet mail to support different media types and character sets,
   and thus could support Unicode in mail messages. MIME neither defines
   Unicode as a permitted character set nor specifies how it would be
   encoded, although it does provide for the registration of additional
   character sets over time.

   This document describes a transformation format of Unicode that
   contains only 7-bit ASCII octets and is intended to be readable by
   humans in the limiting case that the document consists of characters
   from the US-ASCII repertoire. It also specifies how this
   transformation format is used in the context of MIME and RFC 1641,
   "Using Unicode with MIME".

Motivation

   Although other transformation formats of Unicode exist and could
   conceivably be used in this context (most notably UTF-8, also known
   as UTF-2 or UTF-FSS), they suffer the disadvantage that they use
   octets in the range decimal 128 through 255 to encode Unicode
   characters outside the US-ASCII range. Thus, in the context of mail,
   those octets must themselves be encoded. This requires putting text
   through two successive encoding processes, and leads to a significant
   expansion of characters outside the US-ASCII range, putting non-
   English speakers at a disadvantage. For example, using UTF-8 together
   with the Quoted-Printable content transfer encoding of MIME
   represents US-ASCII characters in one octet, but other characters may
   require up to nine octets.
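
   The nine-octet figure can be checked with a short sketch. It is an
   illustration only, not part of this memo: it assumes Python and its
   standard "quopri" module, and the helper name qp_utf8_octets is
   hypothetical. Each character is first encoded as UTF-8 and the
   resulting octets are then put through Quoted-Printable.

```python
import quopri


def qp_utf8_octets(ch: str) -> bytes:
    """Encode one character as UTF-8, then as Quoted-Printable.

    Hypothetical helper for illustration; it mimics the two successive
    encoding steps described in the Motivation section.
    """
    return quopri.encodestring(ch.encode("utf-8"))


# One US-ASCII character, one 2-octet UTF-8 character, and one
# 3-octet UTF-8 character (LATIN SMALL LETTER E WITH ACUTE, EURO SIGN).
for ch in ("A", "\u00e9", "\u20ac"):
    wire = qp_utf8_octets(ch)
    print(f"U+{ord(ch):04X}: {len(wire)} octets: {wire!r}")
```

   Every UTF-8 octet above decimal 127 becomes a three-octet "=XX"
   escape, so a character that needs three UTF-8 octets, such as
   U+20AC, occupies nine octets (=E2=82=AC) in the message body.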

Overview

   UTF-7 encodes Unicode characters as US-ASCII octets, together with
   shift sequences to encode characters outside that range. For this
   purpose, one of the characters in the US-ASCII repertoire is reserved
   for use as a shift character.
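
   As a rough illustration of this behavior, the sketch below uses
   Python's built-in "utf-7" codec. That codec is an assumption of the
   example, not something this memo defines, and the helper name
   to_utf7 is hypothetical. The sketch checks that the output contains
   only 7-bit octets, that text drawn from plain letters, digits,
   commas, and spaces passes through unchanged, and that decoding
   recovers the original text.

```python
def to_utf7(text: str) -> bytes:
    """Encode text with Python's built-in UTF-7 codec (illustrative helper)."""
    return text.encode("utf-7")


samples = [
    "Hello, World",          # US-ASCII only: stays readable as-is
    "\u263a smile",          # starts with a character outside US-ASCII
    "\u65e5\u672c\u8a9e",    # three characters, all outside US-ASCII
]

for text in samples:
    wire = to_utf7(text)
    # Every octet must fit in 7 bits.
    assert all(octet < 0x80 for octet in wire)
    # The transformation must be reversible.
    assert wire.decode("utf-7") == text
    print(f"{text!r} -> {wire!r}")
```

   In the second and third samples the encoded form begins with the
   reserved shift character, which introduces the run of characters
   outside the US-ASCII range.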

   Many mail gateways and systems cannot handle the entire US-ASCII
   character set (those based on EBCDIC, for example), and so UTF-7
   contains provisions for encoding characters within US-ASCII in a way
   that all mail systems can accommodate.

   UTF-7 should normally be used only in the context of 7-bit
   transports, such as mail. In other contexts, straight Unicode or
   UTF-8 is preferred.

   See RFC 1641, "Using Unicode with MIME" for the overall specification
   on usage of Unicode transformation formats with MIME.

Definitions

   First, the definition of Unicode:

      The 16 bit character set Unicode is defined by "The Unicode
      Standard, Version 2.0". This character set is identical with the
      character repertoire and coding of the international standard
      ISO/IEC 10646-1:1993(E); Coded Representation Form=UCS-2;
      Subset=300; Implementation Level=3, including the first 7
      amendments to 10646 plus editorial corrections.

      Note. Unicode 2.0 further specifies the use and interaction of
      these character codes beyond the ISO standard. However, any valid
      10646 sequence is a valid Unicode sequence, and vice versa;
      Unicode supplies interpretations of sequences on which the ISO