RFC 2049 MIME Conformance November 1996
(2) Conversion to canonical form.
The entire body, including "out-of-band" information
such as record lengths and possibly file attribute
information, is converted to a universal canonical
form. The specific media type of the body as well as
its associated attributes dictate the nature of the
canonical form that is used. Conversion to the proper
canonical form may involve character set conversion,
transformation of audio data, compression, or various
other operations specific to the various media types.
If character set conversion is involved, however, care
must be taken to understand the semantics of the media
type, which may have strong implications for any
character set conversion, e.g. with regard to
syntactically meaningful characters in a text subtype
other than "plain".
For example, in the case of text/plain data, the text
must be converted to a supported character set and
lines must be delimited with CRLF delimiters in
accordance with RFC 822. Note that the restriction on
line lengths implied by RFC 822 is eliminated if the
next step employs either quoted-printable or base64
encoding.
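As an illustration of the text/plain case, the following is a minimal Python sketch, not a normative procedure. The function name to_canonical_text and the default charset are assumptions made for this example. It also assumes the local text is already available as a decoded string.

```python
def to_canonical_text(local_text: str, charset: str = "us-ascii") -> bytes:
    """Convert locally stored text to a canonical text/plain body (sketch)."""
    # Collapse every local newline convention (CRLF, lone CR, lone LF) to LF
    # first, so that existing CRLF pairs are not doubled in the next step.
    normalized = local_text.replace("\r\n", "\n").replace("\r", "\n")
    # Delimit lines with CRLF, as RFC 822 requires.
    canonical = normalized.replace("\n", "\r\n")
    # Character set conversion: encode into the charset that will be labeled
    # in the Content-Type header. Raises UnicodeEncodeError if the text
    # cannot be represented in that charset.
    return canonical.encode(charset)
```

For instance, to_canonical_text("a\nb\r\nc\rd") yields the octets of "a", "b", "c", and "d" separated by CRLF pairs.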
(3) Apply transfer encoding.
A Content-Transfer-Encoding appropriate for this body
is applied. Note that there is no fixed relationship
between the media type and the transfer encoding. In
particular, it may be appropriate to base the choice of
base64 or quoted-printable on character frequency
counts which are specific to a given instance of a
body.
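One possible way to base the choice on character frequency counts is sketched below in Python. The heuristic and the function name choose_transfer_encoding are assumptions made for illustration; the size estimates ignore soft line breaks and other line-wrapping overhead, so they are approximate.

```python
def choose_transfer_encoding(body: bytes) -> str:
    """Pick quoted-printable or base64 from octet counts (rough heuristic)."""
    # Count octets that quoted-printable would have to write as "=XX":
    # the "=" sign itself, anything above "~", and control characters
    # other than TAB, LF, and CR.
    unsafe = sum(
        1 for b in body
        if b == 0x3D or b > 0x7E or (b < 0x20 and b not in (0x09, 0x0A, 0x0D))
    )
    # Each unsafe octet costs three characters, each safe octet costs one.
    qp_size = (len(body) - unsafe) + 3 * unsafe
    # Base64 emits four characters for every group of up to three octets.
    b64_size = 4 * ((len(body) + 2) // 3)
    return "quoted-printable" if qp_size <= b64_size else "base64"
```

Under this estimate, mostly-ASCII text tends to select quoted-printable, which keeps it largely readable, while a body dominated by 8-bit octets selects base64.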
(4) Insertion into entity.
The encoded body is inserted into a MIME entity with
appropriate headers. The entity is then inserted into
the body of a higher-level entity (message or
multipart) as needed.
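The insertion step might look as follows when written against the Python standard library email package. This is only a sketch: the helper name make_entity and the sample body are assumptions, and a real system may assemble entities quite differently.

```python
import base64
from email.message import Message
from email.mime.multipart import MIMEMultipart


def make_entity(encoded_body: str, content_type: str, transfer_encoding: str) -> Message:
    """Wrap an already-encoded body in a MIME entity with its headers."""
    entity = Message()
    entity["Content-Type"] = content_type
    entity["Content-Transfer-Encoding"] = transfer_encoding
    entity.set_payload(encoded_body)
    return entity


# Canonical form (step 2) and transfer encoding (step 3) are assumed done.
canonical = b"Hello, world\r\n"
encoded = base64.encodebytes(canonical).decode("ascii")
part = make_entity(encoded, 'text/plain; charset="us-ascii"', "base64")

# Insert the entity into the body of a higher-level multipart entity.
outer = MIMEMultipart("mixed")
outer.attach(part)
```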
Conversion from entity form to local form is accomplished by
reversing these steps. Note that reversal of these steps may produce
differing results since there is no guarantee that the original and
final local forms are the same.
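A small Python sketch of why the round trip is not guaranteed to be lossless: the sender and receiver below are assumed to use different local newline conventions, and the helper names to_entity_form and to_local_form are hypothetical.

```python
import base64


def to_entity_form(local_text: str, charset: str = "us-ascii") -> bytes:
    """Local form -> canonical CRLF text -> base64 (steps 2 and 3)."""
    canonical = (
        local_text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r\n")
    )
    return base64.encodebytes(canonical.encode(charset))


def to_local_form(encoded: bytes, local_newline: str, charset: str = "us-ascii") -> str:
    """Reverse the steps, ending in the receiver's newline convention."""
    canonical = base64.decodebytes(encoded).decode(charset)
    return canonical.replace("\r\n", local_newline)


# Sender stores lines with a bare CR; receiver stores lines with a bare LF.
sender_text = "line one\rline two\r"
received_text = to_local_form(to_entity_form(sender_text), "\n")
# The content is equivalent, but the two local forms are not identical.
```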
It is vital to note that these steps are only a model; they are
specifically NOT a blueprint for how an actual system would be built.
In particular, the model fails to account for two common designs:
(1) In many cases the conversion to a canonical form prior
to encoding will be subsumed into the encoder itself,
which understands local formats directly. For example,
the local newline convention for text bodies might be
carried through to the encoder itself along with
knowledge of what that format is.
(2) The output of the encoders may have to pass through one
or more additional steps prior to being transmitted as
a message. As such, the output of the encoder may not
be conformant with the formats specified by RFC 822.
In particular, once again it may be appropriate for the
converter's output to be expressed using local newline
conventions rather than using the standard RFC 822 CRLF
delimiters.
Other implementation variations are conceivable as well. The vital
aspect of this discussion is that, in spite of any optimizations,
collapsings of required steps, or insertion of additional processing,
the resulting messages must be consistent with those produced by the
model described here. For example, a message with the following
header fields:
Content-type: text/foo; charset=bar
Content-Transfer-Encoding: base64
must be first represented in the text/foo form, then (if necessary)
represented in the "bar" character set, and finally transformed via
the base64 algorithm into a mail-safe form.
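The consistency requirement can be checked mechanically. The Python sketch below compares a straightforward implementation of the model with a "fused" encoder that consumes lines in the local newline convention directly. Both function names are hypothetical. The sketch assumes LF as the local line delimiter and a stateless character set standing in for "bar", and it omits wrapping of the base64 output into lines of at most 76 characters.

```python
import base64


def model_encode(local_text: str, charset: str) -> bytes:
    """The model: canonical form first, then the base64 algorithm."""
    canonical = local_text.replace("\n", "\r\n").encode(charset)
    return base64.b64encode(canonical)


def fused_encode(local_lines, charset: str) -> bytes:
    """An encoder that understands the local newline convention directly."""
    out = bytearray()
    pending = b""
    for line in local_lines:
        if line.endswith("\n"):
            # Emit the canonical CRLF in place of the local LF.
            pending += line[:-1].encode(charset) + b"\r\n"
        else:
            pending += line.encode(charset)
        # Base64 works on groups of three octets; hold back the remainder
        # so that padding only ever appears at the very end.
        usable = len(pending) - len(pending) % 3
        out += base64.b64encode(pending[:usable])
        pending = pending[usable:]
    out += base64.b64encode(pending)
    return bytes(out)
```

Whatever shortcuts the fused encoder takes internally, its output for a given input is expected to equal that of model_encode.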