Network Working Group N. Freed
Request for Comments: 2278 Innosoft
BCP: 19 J. Postel
Category: Best Current Practice ISI
January 1998
IANA Charset Registration Procedures
Status of this Memo
This document specifies an Internet Best Current Practices for the
Internet Community, and requests discussion and suggestions for
improvements. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (1998). All Rights Reserved.
1. Abstract
MIME [RFC-2045, RFC-2046, RFC-2047, RFC-2184] and various other
modern Internet protocols are capable of using many different
charsets. This in turn means that the ability to label different
charsets is essential. This registration procedure exists solely to
associate a specific name or names with a given charset and to give
an indication of whether or not a given charset can be used in MIME
text objects. In particular, the general applicability and
appropriateness of a given registered charset is a protocol issue,
not a registration issue, and is not dealt with by this registration
procedure.
2. Definitions and Notation
The following sections define various terms used in this document.
2.1. Requirements Notation
This document occasionally uses terms that appear in capital letters.
When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY"
appear capitalized, they are being used to indicate particular
requirements of this specification. A discussion of the meanings of
these terms appears in [RFC-2119].
2.2. Character
A member of a set of elements used for the organisation, control, or
representation of data.
2.3. Charset
The term "charset" (see historical note below) is used here to refer
to a method of converting a sequence of octets into a sequence of
characters. This conversion may also optionally produce additional
control information such as directionality indicators.
Note that unconditional and unambiguous conversion in the other
direction is not required, in that not all characters may be
representable by a given charset and a charset may provide more than
one sequence of octets to represent a particular sequence of
characters.
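The asymmetry described above can be sketched in a few lines of Python. This is an illustration only, not part of the registration procedure: it assumes Python's built-in "iso2022_jp" and "ascii" codecs as stand-ins for the corresponding registered charsets, and the variable names are invented for the example.

```python
# Illustration only: Python codec names stand in for registered charsets.

# Octets -> characters: the direction a charset is required to define.
# These ISO-2022-JP octets encode U+3042 (HIRAGANA LETTER A).
octets = b"\x1b$B$\"\x1b(B"
text = octets.decode("iso2022_jp")

# A second, longer octet sequence yields the same characters. The extra
# ESC ( B prefix selects ASCII, which is already the initial state.
redundant = b"\x1b(B" + octets
same_text = redundant.decode("iso2022_jp")

# Characters -> octets can fail outright: US-ASCII has no way to
# represent U+3042.
try:
    text.encode("ascii")
    representable = True
except UnicodeEncodeError:
    representable = False
```

Two distinct octet sequences decode to the same character sequence, and one charset cannot represent a character that another can, so conversion from characters back to octets is neither unique nor always possible.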
This definition is intended to allow various kinds of character
encodings, from simple single-table mappings such as US-ASCII to
complex table switching methods such as those that use ISO 2022's
techniques, to be used as charsets. However, the
definition associated with a charset name must fully specify the
mapping to be performed. In particular, use of external profiling
information to determine the exact mapping is not permitted.
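As a minimal sketch of the "simple single-table mapping" case, a charset can be modeled as a complete table from octets to characters. The names US_ASCII_TABLE and decode_single_table are hypothetical and exist only for this example. The point is that the table alone determines the result, with no additional profile or parameter consulted.

```python
# Hypothetical sketch: a charset modeled as a complete octet-to-character
# table. The table below covers the 128 octets defined by US-ASCII.
US_ASCII_TABLE = {octet: chr(octet) for octet in range(128)}


def decode_single_table(octets, table):
    """Convert octets to characters using only the supplied table."""
    chars = []
    for octet in octets:
        if octet not in table:
            # The charset gives this octet no meaning, so conversion fails
            # rather than consulting any information outside the table.
            raise ValueError("octet 0x%02X is not defined by this charset" % octet)
        chars.append(table[octet])
    return "".join(chars)


decoded = decode_single_table(b"MIME", US_ASCII_TABLE)
```

Because the function takes nothing but the octets and the table, two parties that agree on the charset name, and therefore on the table, necessarily agree on the decoded text.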
HISTORICAL NOTE: The term "character set" was originally used in MIME
to describe such straightforward schemes as US-ASCII and ISO-8859-1
which consist of a small set of characters and a simple one-to-one
mapping from single octets to single characters. Multi-octet
character encoding schemes and switching techniques make the
situation much more complex. As such, the definition of this term was
revised to emphasize the conversion aspect of the process, and the
term itself has been changed to "charset" to emphasize that it is
not, after all, just a set of characters. A discussion of these
issues as well as specification of standard terminology for use in
the IETF appears in RFC 2130.