protocol-specific operations, such as charset-based content
negotiation in HTTP.
"UTF-8" [RFC-2279] and "UTF-16" (Appendix C.3 of [UNICODE] and
Amendment 1 of [ISO-10646]) are the recommended values,
representing the UTF-8 and UTF-16 charsets, respectively. These
charsets are preferred since they are supported by all conforming
XML processors [REC-XML].
If an application/xml entity is received with the charset
parameter omitted, the MIME Content-Type header provides no
information about the charset. Conforming XML processors
MUST follow the requirements in section 4.3.3 of [REC-XML] which
directly address this contingency. However, MIME processors which
are not XML processors should not assume a default charset if the
charset parameter is omitted from an application/xml entity.
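As a minimal sketch of the rule above, a generic MIME processor might read the charset parameter and report its absence rather than substitute a default. The function name charset_from_content_type is hypothetical; the parsing relies on Python's standard email.message module.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter in lowercase, or None if it is omitted."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_param returns None when the parameter is absent; no default
    # charset is substituted here, in line with the text above.
    charset = msg.get_param("charset")
    return charset.lower() if isinstance(charset, str) else None
```

A None result means the header says nothing about the charset; an XML processor would then apply section 4.3.3 of [REC-XML] to the entity itself.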
Since the charset parameter is authoritative, the charset is not
always declared within an XML encoding declaration. Thus, special
care is needed when the recipient strips the MIME header and
provides persistent storage of the received XML entity (e.g., in a
file system). Unless the charset is UTF-8 or UTF-16, the
recipient SHOULD also persistently store information about the
charset, perhaps by embedding a correct XML encoding declaration
within the XML entity.
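One way a recipient might embed the charset before storing an entity is sketched below. It is an illustration under stated assumptions, not a complete implementation: the function names are hypothetical, the charset name is assumed to be one Python's codecs recognize, and the entity is assumed to start with an XML declaration carrying a version, or with no declaration at all.

```python
import re

# Matches a leading XML declaration that carries a version attribute.
_XML_DECL = re.compile(r'^<\?xml\s+version\s*=\s*(["\'])([^"\']+)\1([^?]*)\?>')
_STANDALONE = re.compile(r'standalone\s*=\s*(["\'])(yes|no)\1')


def embed_encoding_declaration(entity, charset):
    """Rewrite or add the XML declaration so that it names `charset`."""
    text = entity.decode(charset)
    match = _XML_DECL.match(text)
    version = match.group(2) if match else "1.0"
    body = text[match.end():] if match else text
    decl = f'<?xml version="{version}" encoding="{charset}"'
    # Preserve an existing standalone declaration, if there was one.
    standalone = _STANDALONE.search(match.group(3)) if match else None
    if standalone:
        decl += f' standalone="{standalone.group(2)}"'
    return (decl + "?>" + body).encode(charset)


def prepare_for_storage(entity, charset):
    """Return the bytes to store once the MIME header has been stripped."""
    # UTF-8 and UTF-16 entities are stored unchanged; so are entities
    # for which the MIME header supplied no charset.
    if charset is None or charset.lower() in ("utf-8", "utf-16"):
        return entity
    return embed_encoding_declaration(entity, charset)
```

Storing the charset alongside the file, for example in file system metadata, would be an alternative that leaves the entity bytes unchanged.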
RFC 2376 XML Media Types July 1998
Encoding considerations:
This media type MAY be encoded as appropriate for the charset and
the capabilities of the underlying MIME transport. For 7-bit
transports, data in both UTF-8 and UTF-16 is encoded in quoted-
printable or base64. For 8-bit clean transport (e.g., ESMTP,
8BITMIME, or NNTP), UTF-8 is not encoded, but UTF-16 is base64
encoded. For binary clean transport (e.g., HTTP), no content-
transfer-encoding is necessary.
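The transport rules above can be summarized as a small lookup. The sketch below is illustrative only: the function name and the transport labels "7bit", "8bit", and "binary" are assumptions chosen for this example, and only the UTF-8 and UTF-16 cases described in the text are covered.

```python
def permitted_transfer_encodings(transport, charset):
    """Return the content-transfer-encodings the text allows.

    An empty tuple means no content-transfer-encoding is needed.
    """
    charset = charset.lower()
    if charset not in ("utf-8", "utf-16"):
        raise ValueError("only UTF-8 and UTF-16 are covered by this sketch")
    if transport == "7bit":
        # Both charsets need quoted-printable or base64 on 7-bit transports.
        return ("quoted-printable", "base64")
    if transport == "8bit":
        # UTF-8 passes through unencoded; UTF-16 is base64 encoded.
        return () if charset == "utf-8" else ("base64",)
    if transport == "binary":
        return ()
    raise ValueError(f"unknown transport class: {transport!r}")
```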
Security considerations:
See section 4 below.
Interoperability considerations:
XML has proven to be interoperable for import and export from
multiple XML authoring tools.
Published specification: see [REC-XML]
Applications which use this media type:
XML is device-, platform-, and vendor-neutral and is supported by
a wide range of Web user agents and XML authoring tools.
Additional information:
Magic number(s): none
Although no byte sequences can be counted on to always be present,
XML entities in ASCII-compatible charsets (including UTF-8) often
begin with hexadecimal 3C 3F 78 6D 6C ("<?xml"), and those in
UTF-16 often begin with hexadecimal FE FF 00 3C 00 3F 00 78 00 6D
or FF FE 3C 00 3F 00 78 00 6D 00 (the Byte Order Mark (BOM)
followed by "<?xml"). For more information, see Annex F of
[REC-XML].
File extension(s): .xml, .dtd
Macintosh File Type Code(s): "TEXT"
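The byte sequences listed under "Magic number(s)" can be checked with a simple prefix test, sketched below. The names XML_PREFIXES and sniff_xml_prefix are hypothetical. Because none of these sequences is guaranteed to be present, a failed match does not show that the data is not XML.

```python
# Prefixes taken from the "Magic number(s)" text; labels are descriptive only.
XML_PREFIXES = {
    bytes.fromhex("3C3F786D6C"): "ASCII-compatible charset",
    bytes.fromhex("FEFF003C003F0078006D"): "UTF-16, big-endian, with BOM",
    bytes.fromhex("FFFE3C003F0078006D00"): "UTF-16, little-endian, with BOM",
}


def sniff_xml_prefix(data):
    """Return a label for a recognized leading byte sequence, else None."""
    for prefix, label in XML_PREFIXES.items():
        if data.startswith(prefix):
            return label
    return None
```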
Person & email address for further information:
Dan Connolly <connolly@w3.org>
Murata Makoto (Family Given) <murata@fxis.fujixerox.co.jp>
Intended usage: COMMON
Author/Change controller: