Network Working Group H. Alvestrand
Request for Comments: 2277 UNINETT
BCP: 18 January 1998
Category: Best Current Practice
IETF Policy on Character Sets and Languages
Status of this Memo
This document specifies an Internet Best Current Practices for the
Internet Community, and requests discussion and suggestions for
improvements. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (1998). All Rights Reserved.
1. Introduction
The Internet is international.
An international Internet brings with it an absolute requirement to
interchange data in a multiplicity of languages, which in turn
utilize a bewildering number of characters.
This document describes the current policies being applied by the
Internet Engineering Steering Group (IESG) to the standardization
efforts in the Internet Engineering Task Force (IETF) in order to help
Internet protocols fulfill these requirements.
The document is very much based upon the recommendations of the IAB
Character Set Workshop of February 29-March 1, 1996, which is
documented in RFC 2130 [WR]. This document attempts to be concise,
explicit and clear; people wanting more background are encouraged to
read RFC 2130.
The document uses the terms 'MUST', 'SHOULD' and 'MAY', and their
negatives, in the way described in [RFC 2119]. In this case, 'the
specification' as used by RFC 2119 refers to the processing of
protocols being submitted to the IETF standards process.
2. Where to do internationalization
Internationalization is for humans. This means that protocols are not
subject to internationalization; text strings are. Where protocol
elements look like text tokens, such as in many IETF application
layer protocols, protocols MUST specify which parts are protocol and
which are text. [WR 2.2.1.1]
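As an illustration of the protocol/text split, the following is a
minimal sketch in Python. The reply format (a numeric code, one space,
then free text), the function name split_reply, and the text_charset
parameter are all assumptions made for this example; they are not
taken from any particular IETF protocol.

```python
from typing import Tuple


def split_reply(line: bytes, text_charset: str) -> Tuple[int, str]:
    """Split a hypothetical reply line into its protocol and text parts.

    Assumed format (for illustration only): "<code> <text>", where the
    code is a protocol element and the text is meant for a human reader.
    """
    code_octets, _, text_octets = line.partition(b" ")
    # Protocol element: not subject to internationalization, so it is
    # parsed strictly as US-ASCII digits.
    code = int(code_octets.decode("us-ascii"))
    # Text string: intended for humans, decoded with whatever charset
    # the (hypothetical) protocol specifies for its text parts.
    text = text_octets.decode(text_charset)
    return code, text


# Example reply whose text part contains non-ASCII characters.
reply = "250 Gr\u00fc\u00dfe".encode("utf-8")
code, text = split_reply(reply, "utf-8")
```

Software acts only on the code; the text may carry any language
without changing how the reply is processed.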
Names are a problem, because people feel strongly about them, many of
them are mostly for local usage, and all of them tend to leak out of
the local context at times. RFC 1958 [RFC 1958] recommends US-ASCII
for all globally visible names.
This document does not mandate a policy on name internationalization,
but requires that all protocols describe whether names are
internationalized or US-ASCII.
NOTE: In the protocol stack for any given application, there is
usually one or a few layers that need to address these problems.
It would, for instance, not be appropriate to define language tags
for Ethernet frames. But it is the responsibility of the WGs to
ensure that whenever responsibility for internationalization is left
to "another layer", those responsible for that layer are in fact
aware that they HAVE that responsibility.
3. Definition of Terms
This document uses the term "charset" to mean a set of rules for
mapping from a sequence of octets to a sequence of characters, such
as the combination of a coded character set and a character encoding
scheme; this is also what is used as an identifier in MIME "charset="
parameters, and registered in the IANA charset registry [REG]. (Note
that this is NOT a term used by other standards bodies, such as ISO.)
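The definition can be illustrated with a short Python sketch, in which
the same two octets yield different character sequences depending on
the charset applied. The codec names are those accepted by Python's
standard library; the octet values are chosen only for illustration.

```python
# One sequence of octets, interpreted under three different charsets.
octets = bytes([0xC3, 0xA9])

# Under UTF-8 the two octets map to a single character, U+00E9.
as_utf8 = octets.decode("utf-8")

# Under ISO-8859-1 each octet maps to one character: U+00C3, U+00A9.
as_latin1 = octets.decode("iso-8859-1")

# US-ASCII defines no mapping for octets above 0x7F, so decoding fails.
try:
    octets.decode("us-ascii")
    ascii_decodable = True
except UnicodeDecodeError:
    ascii_decodable = False
```

Without a charset label, a receiver has no way to tell which of these
interpretations the sender intended.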
For a definition of the term "coded character set", refer to the
workshop report, RFC 2130 [WR].