terms used in the grammar for the header extensions defined here.
Successful implementation of this protocol extension requires
careful attention to the details of both STD 11, RFC 822 and RFC
1521.
When the term "ASCII" appears in this memo, it refers to the "7-
Bit American Standard Code for Information Interchange", ANSI
X3.4-1986. The MIME charset name for this character set is "US-
ASCII". When not specifically referring to the MIME charset name,
this document uses the term "ASCII", both for brevity and for
RFC 1522 MIME Part Two September 1993
consistency with STD 11, RFC 822. However, implementors are
warned that the character set name must be spelled "US-ASCII" in
MIME message and body part headers.
2. Syntax of encoded-words
An "encoded-word" is defined by the following ABNF grammar. The
notation of RFC 822 is used, with the exception that white space
characters MAY NOT appear between components of an encoded-word.
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
charset      = token    ; see section 3
encoding     = token    ; see section 4
token        = 1*<Any CHAR except SPACE, CTLs, and especials>
especials    = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "\" /
               <"> / "/" / "[" / "]" / "?" / "." / "="
encoded-text = 1*<Any printable ASCII character other
               than "?" or SPACE>
               ; (but see "Use of encoded-words in message
               ; headers", section 5)
Both "encoding" and "charset" names are case-independent. Thus the
charset name "ISO-8859-1" is equivalent to "iso-8859-1", and the
encoding named "Q" may be spelled either "Q" or "q".
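The grammar above can be sketched as a small recognizer. This is an illustrative sketch only, not part of this memo: the names `parse_encoded_word` and `ENCODED_WORD` are hypothetical, the backslash is treated as one of the especials on the assumption that especials mirrors the RFC 1521 tspecials plus ".", and the length limit and the section 5 restrictions are not checked here.

```python
import re

# Characters that may not appear in a "token": the especials from the grammar.
# Assumption: the backslash belongs to this set, as in the RFC 1521 tspecials.
ESPECIALS = set('()<>@,;:"/[]?.=') | {"\\"}

# A token character is any printable ASCII character (no SPACE, no CTLs)
# that is not one of the especials.
TOKEN_CHARS = "".join(
    chr(code) for code in range(0x21, 0x7F) if chr(code) not in ESPECIALS
)
TOKEN = "[" + re.escape(TOKEN_CHARS) + "]+"

# encoded-text: one or more printable ASCII characters other than "?" or SPACE.
ENCODED_TEXT = r"[\x21-\x3e\x40-\x7e]+"

# No white space is permitted between the components.
ENCODED_WORD = re.compile(
    r"=\?(" + TOKEN + r")\?(" + TOKEN + r")\?(" + ENCODED_TEXT + r")\?="
)


def parse_encoded_word(word):
    """Return (charset, encoding, encoded_text), or None if word does not match.

    Charset and encoding names are case-independent, so they are normalized
    here (charset to lower case, encoding to upper case) for easy comparison.
    """
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        return None
    charset, encoding, encoded_text = match.groups()
    return charset.lower(), encoding.upper(), encoded_text
```

For example, `parse_encoded_word("=?iso-8859-1?q?Andr=E9?=")` and `parse_encoded_word("=?ISO-8859-1?Q?Andr=E9?=")` yield the same result, while a candidate containing a SPACE inside any component is rejected.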
An encoded-word may not be more than 75 characters long, including
charset, encoding, encoded-text, and delimiters. If it is desirable
to encode more text than will fit in an encoded-word of 75
characters, multiple encoded-words (separated by CRLF SPACE) may be
used.
While there is no limit to the length of a multiple-line header
field, each line of a header field that contains one or more
encoded-words is limited to 76 characters.
The length restrictions are included not only to ease
interoperability through internetwork mail gateways, but also to
impose a limit on the amount of lookahead a header parser must employ
(while looking for a final ?= delimiter) before it can decide whether
a token is an encoded-word or something else.
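The two length limits above can be checked mechanically. The following is a minimal sketch, not part of this memo: `length_problems` and `CANDIDATE` are hypothetical names, the candidate pattern is deliberately looser than the full grammar, and the 76-character line limit is assumed to exclude the terminating CRLF.

```python
import re

MAX_ENCODED_WORD = 75  # including charset, encoding, encoded-text, delimiters
MAX_LINE = 76          # assumption: counted without the trailing CRLF

# Loose pattern for anything shaped like an encoded-word.
CANDIDATE = re.compile(r"=\?[^?\s]+\?[^?\s]+\?[^?\s]+\?=")


def length_problems(lines):
    """Return (line_number, message) pairs for each violated length limit.

    `lines` holds the physical lines of one header field, without CRLF.
    Lines that contain no encoded-word are not subject to the 76 limit.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        words = CANDIDATE.findall(line)
        for word in words:
            if len(word) > MAX_ENCODED_WORD:
                problems.append(
                    (number, "encoded-word longer than 75 characters")
                )
        if words and len(line) > MAX_LINE:
            problems.append((number, "line longer than 76 characters"))
    return problems
```

A header field whose text is split across two continuation lines, each carrying one short encoded-word, produces an empty list; a single over-long encoded-word is reported on the line where it appears.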
The characters which may appear in encoded-text are further
restricted by the rules in section 5.
3. Character sets
The "charset" portion of an encoded-word specifies the character set
associated with the unencoded text. A charset can be any of the
character set names allowed in an RFC 1521 "charset" parameter of a
"text/plain" body part, or any character set name registered with
IANA for use with the MIME text/plain content-type [3]. (See section
7.1.1 of RFC 1521 for a list of charsets defined in that document.)
Some character sets use code-switching techniques to switch between
"ASCII mode" and other modes. If unencoded text in an encoded-word
contains control codes to switch out of ASCII mode, it must also
contain additional control codes such that ASCII mode is again
selected at the end of the encoded-word. (This rule applies
separately to each encoded-word, including adjacent encoded-words
within a single header field.)
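The code-switching rule above can be illustrated with a short check. This is a sketch under stated assumptions, not part of this memo: ISO-2022-JP is chosen here only as a familiar example of a code-switching character set, `ends_in_ascii_mode` is a hypothetical helper, and only the four basic ISO-2022-JP escape sequences are recognized.

```python
import re

# Escape sequences assumed for this sketch (basic ISO-2022-JP only):
#   ESC ( B  selects ASCII             ESC ( J  selects JIS X 0201 Roman
#   ESC $ @  selects JIS X 0208-1978   ESC $ B  selects JIS X 0208-1983
ESCAPE = re.compile(rb"\x1b(?:\([BJ]|\$[@B])")
SELECT_ASCII = b"\x1b(B"


def ends_in_ascii_mode(unencoded):
    """Return True if the unencoded bytes leave the decoder in ASCII mode.

    The check applies to the unencoded text of one encoded-word at a time,
    since the rule holds separately for each encoded-word.
    """
    last = None
    for match in ESCAPE.finditer(unencoded):
        last = match.group(0)
    # No escape sequence at all means ASCII mode was never left.
    return last is None or last == SELECT_ASCII
```

Python's built-in `iso2022_jp` codec appends the return-to-ASCII sequence when encoding a complete string, so text that is split into pieces before encoding, with each piece encoded on its own, satisfies the rule for every resulting encoded-word.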
When there is a possibility of using more than one character set to
represent the text in an encoded-word, and in the absence of private
agreements between sender and recipients of a message, it is
recommended that members of the ISO-8859-* series be used in
preference to other character sets.
4. Encodings
Initially, the legal values for "encoding" are "Q" and "B". These