RFC 1521 MIME September 1993
Appendix G -- Canonical Encoding Model
There was some confusion, in earlier drafts of this memo, regarding
the model for when email data was to be converted to canonical form
and encoded, and in particular how this process would affect the
treatment of CRLFs, given that the representation of newlines varies
greatly from system to system. For this reason, a canonical model
for encoding is presented below.
The process of composing a MIME entity can be modeled as being done
in a number of steps. Note that these steps are roughly similar to
those steps used in RFC 1421 and are performed for each 'innermost
level' body:
Step 1. Creation of local form.
The body to be transmitted is created in the system's native format.
The native character set is used, and where appropriate local end of
line conventions are used as well. The body may be a UNIX-style text
file, or a Sun raster image, or a VMS indexed file, or audio data in
a system-dependent format stored only in memory, or anything else
that corresponds to the local model for the representation of some
form of information. Fundamentally, the data is created in the
"native" form specified by the type/subtype information.
Step 2. Conversion to canonical form.
The entire body, including "out-of-band" information such as record
lengths and possibly file attribute information, is converted to a
universal canonical form. The specific content type of the body as
well as its associated attributes dictate the nature of the canonical
form that is used. Conversion to the proper canonical form may
involve character set conversion, transformation of audio data,
compression, or various other operations specific to the various
content types. If character set conversion is involved, however,
care must be taken to understand the semantics of the content-type,
which may have strong implications for any character set conversion,
e.g. with regard to syntactically meaningful characters in a text
subtype other than "plain".
For example, in the case of text/plain data, the text must be
converted to a supported character set and lines must be delimited
with CRLF delimiters in accordance with RFC 822. Note that the
restriction on line lengths implied by RFC 822 is eliminated if the
next step employs either quoted-printable or base64 encoding.
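As an illustration of Step 2 for text/plain, the following minimal
sketch converts local-form text to canonical form. The function name
to_canonical_text and its charset parameter are hypothetical, and
treating a bare CR as a local line ending is an assumption about the
local convention, not something this memo requires.

```python
def to_canonical_text(local_text: str, charset: str = "us-ascii") -> bytes:
    """Return text/plain data in canonical form: CRLF line ends, named charset."""
    # Collapse the common local conventions (CRLF, bare CR, bare LF) to LF first,
    # so that an existing CRLF is not turned into CR CR LF by the next step.
    unified = local_text.replace("\r\n", "\n").replace("\r", "\n")
    # Delimit every line with CRLF, then convert to the chosen character set.
    return unified.replace("\n", "\r\n").encode(charset)
```

A UNIX-style body such as "a\nb\n" becomes the octets for "a", CRLF,
"b", CRLF. A character that the chosen character set cannot represent
raises UnicodeEncodeError rather than being silently altered.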
Step 3. Apply transfer encoding.
A Content-Transfer-Encoding appropriate for this body is applied.
Note that there is no fixed relationship between the content type and
the transfer encoding. In particular, it may be appropriate to base
the choice of base64 or quoted-printable on character frequency
counts which are specific to a given instance of a body.
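One possible way to base the choice on character frequency counts is
sketched below. The function name, the definition of an "unsafe"
octet, and the default threshold are all assumptions made for
illustration. The threshold of 1/6 is roughly the size break-even
point, ignoring line-break overhead: quoted-printable costs three
octets per unsafe octet, while base64 costs four octets per three
octets of input.

```python
def choose_base64_or_qp(canonical: bytes, threshold: float = 1 / 6) -> str:
    """Pick a transfer encoding from the share of octets that QP must escape."""
    if not canonical:
        return "quoted-printable"
    # Octets that quoted-printable would write as "=XX": anything above "~",
    # the "=" sign itself (61), and control characters other than TAB, LF, CR.
    unsafe = sum(
        1
        for b in canonical
        if b > 126 or b == 61 or (b < 32 and b not in (9, 10, 13))
    )
    return "base64" if unsafe / len(canonical) > threshold else "quoted-printable"
```

Mostly-ASCII text stays readable under quoted-printable, while a body
dominated by 8-bit or control octets falls through to base64. An
implementation that values readability over size might pick a higher
threshold.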
Step 4. Insertion into entity.
The encoded object is inserted into a MIME entity with appropriate
headers. The entity is then inserted into the body of a higher-level
entity (message or multipart) if needed.
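Step 4 can be sketched with the Python standard library's email
package, used here purely as an illustration and not as anything this
memo prescribes. The helper names build_entity and wrap_in_multipart
are hypothetical, and base64 is assumed as the transfer encoding
chosen in Step 3.

```python
import base64
from email.message import Message
from email.mime.multipart import MIMEMultipart


def build_entity(canonical: bytes, content_type: str) -> Message:
    """Insert an already-canonical body into an entity with matching headers."""
    entity = Message()
    entity["Content-Type"] = content_type
    entity["Content-Transfer-Encoding"] = "base64"
    # The payload is the encoded object from Step 3, stored as ASCII text.
    entity.set_payload(base64.encodebytes(canonical).decode("ascii"))
    return entity


def wrap_in_multipart(parts: list) -> MIMEMultipart:
    """Place entities inside a higher-level multipart/mixed entity."""
    outer = MIMEMultipart("mixed")
    for part in parts:
        outer.attach(part)
    return outer
```

Calling wrap_in_multipart([build_entity(body, "text/plain")]) yields a
multipart entity whose single part carries the base64-encoded body.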
It is vital to note that these steps are only a model; they are
specifically NOT a blueprint for how an actual system would be built.
In particular, the model fails to account for two common designs:
1. In many cases the conversion to a canonical form prior to
encoding will be subsumed into the encoder itself, which
understands local formats directly. For example, the local
newline convention for text bodies might be carried through to the
encoder itself along with knowledge of what that format is.
2. The output of the encoders may have to pass through one or
more additional steps prior to being transmitted as a message. As
such, the output of the encoder may not be conformant with the
formats specified by RFC 822. In particular, once again it may be
appropriate for the converter's output to be expressed using local
newline conventions rather than using the standard RFC 822 CRLF
delimiters.
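The first design can be sketched as an encoder that is told the local
newline convention and translates it while encoding, with its output
compared against the stepwise model. Both function names are
hypothetical, and base64 with a US-ASCII body is assumed only to keep
the example short.

```python
import base64


def model_encode(local_text: str, local_newline: str) -> bytes:
    """Stepwise model: canonicalize first (Step 2), then encode (Step 3)."""
    canonical = local_text.replace(local_newline, "\r\n").encode("us-ascii")
    return base64.b64encode(canonical)


def fused_encode(local_text: str, local_newline: str) -> bytes:
    """Encoder that understands the local newline convention directly."""
    out = bytearray()
    carry = b""  # octets held back so each encoded chunk is a multiple of 3
    pieces = local_text.split(local_newline)
    for i, piece in enumerate(pieces):
        chunk = carry + piece.encode("us-ascii")
        if i < len(pieces) - 1:
            chunk += b"\r\n"  # emit the canonical delimiter on the fly
        usable = len(chunk) - len(chunk) % 3
        out += base64.b64encode(chunk[:usable])
        carry = chunk[usable:]
    out += base64.b64encode(carry)  # final chunk, padded if needed
    return bytes(out)
```

For any input, fused_encode is expected to return exactly what
model_encode returns, which is the sense in which a collapsed
implementation stays consistent with the model.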
Other implementation variations are conceivable as well. The vital
aspect of this discussion is that, in spite of any optimizations,
collapsings of required steps, or insertion of additional processing,
the resulting messages must be consistent with those produced by the
model described here. For example, a message with the following
header fields: