To avoid confusion with the "urn:" identifier, the NID "urn" is
reserved and MUST NOT be used.
RFC 2141 URN Syntax May 1997
2.2 Namespace Specific String Syntax
As required by RFC 1737, there is a single canonical representation
of the NSS portion of an URN. The format of this single canonical
form follows:
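<NSS>         ::= 1*<URN chars>
<URN chars>   ::= <trans> | "%" <hex> <hex>
<trans>       ::= <upper> | <lower> | <number> | <other> | <reserved>
<hex>         ::= <number> | "A" | "B" | "C" | "D" | "E" | "F" |
                  "a" | "b" | "c" | "d" | "e" | "f"
<other>       ::= "(" | ")" | "+" | "," | "-" | "." |
                  ":" | "=" | "@" | ";" | "$" |
                  "_" | "!" | "*" | "'"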
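The grammar above can be checked mechanically. The following Python sketch is illustrative only and is not part of the specification; the names OTHER, RESERVED_NO_PERCENT and is_canonical_nss are hypothetical. It assumes that "%" is accepted only when it introduces a two-digit hexadecimal escape, and that the reserved characters "/", "?" and "#" are accepted because the grammar admits them, even though their unencoded use is discouraged.

```python
import re

# Punctuation characters permitted by the grammar above.
OTHER = "()+,-.:=@;$_!*'"
# Reserved characters other than "%"; accepting them is an assumption of this sketch.
RESERVED_NO_PERCENT = "/?#"

# One or more URN characters: a permitted literal, or "%" plus two hex digits.
_NSS_RE = re.compile(
    r"(?:[A-Za-z0-9"
    + re.escape(OTHER + RESERVED_NO_PERCENT)
    + r"]|%[0-9A-Fa-f]{2})+"
)


def is_canonical_nss(nss: str) -> bool:
    """Return True if nss consists only of URN characters and valid escapes."""
    return _NSS_RE.fullmatch(nss) is not None
```

For example, is_canonical_nss("foo%2Fbar") is true, while is_canonical_nss("bad%zz") and is_canonical_nss("a b") are false.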
Depending on the rules governing a namespace, valid identifiers in a
namespace might contain characters that are not members of the URN
character set above (<URN chars>). Such strings MUST be translated
into canonical NSS format before using them as protocol elements or
otherwise passing them on to other applications. Translation is done
by encoding each character outside the URN character set as a
sequence of one to six octets using UTF-8 encoding [5], and then
encoding each of those octets as "%" followed by two characters
from the <hex> character set above. The two characters give the
hexadecimal representation of that octet.
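The translation described above can be sketched in Python as follows. This is an illustrative sketch, not a normative algorithm; the names ALLOWED and to_canonical_nss are hypothetical. It assumes the input is a raw identifier containing no existing escapes, so a literal "%" is itself encoded as "%25", and it also encodes the reserved characters "/", "?" and "#". The choice of uppercase hexadecimal digits is arbitrary. Note that Python's UTF-8 encoder emits at most four octets per character.

```python
# Characters that may appear unencoded: letters, digits and the
# punctuation listed in the grammar above.
ALLOWED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "()+,-.:=@;$_!*'"
)


def to_canonical_nss(raw: str) -> str:
    """Percent-encode every character of raw that is not in ALLOWED."""
    out = []
    for ch in raw:
        if ch in ALLOWED:
            out.append(ch)
        else:
            # Encode the character as UTF-8, then each octet as "%" + two hex digits.
            out.extend("%{:02X}".format(octet) for octet in ch.encode("utf-8"))
    return "".join(out)
```

Under these assumptions, to_canonical_nss("a b") yields "a%20b" and to_canonical_nss("100%") yields "100%25".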
2.3 Reserved characters
The one character set referenced above but not yet discussed is the
reserved character set, whose members are reserved from normal use.
The set is given below, followed by a discussion of why each
character is reserved.
The reserved character set is:
<reserved> ::= "%" | "/" | "?" | "#"
2.3.1 The "%" character
The "%" character is reserved in the URN syntax for introducing the
escape sequence for an octet. Literal use of the "%" character in a
namespace must be encoded using "%25" in URNs for that namespace.
Any "%" character in an URN MUST be followed by two characters from
the <hex> character set.
Namespaces MAY designate one or more characters from the URN
character set as having special meaning for that namespace. If the
namespace also uses that character in a literal sense, each literal
occurrence of the character MUST be encoded with "%" followed
by the hexadecimal representation of that octet. Further, a
character MUST NOT be "%"-encoded if the character is not a reserved
character. Therefore, the process of registering a namespace
identifier shall include publication of a definition of which
characters have a special meaning to that namespace.
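As an illustration of the rule above, consider a hypothetical namespace that designates a single ASCII character as a delimiter between components. The Python sketch below is an assumption-laden example, not something defined by this document: the function join_components and the choice of ":" as the default delimiter are invented for illustration, and the components are assumed to be otherwise canonical already.

```python
def join_components(components, delimiter=":"):
    """Join components, %-encoding literal occurrences of the delimiter.

    The delimiter stands in for a character that a hypothetical namespace
    has given special meaning; it must be a single ASCII character.
    """
    if len(delimiter) != 1 or ord(delimiter) > 0x7F:
        raise ValueError("delimiter must be a single ASCII character")
    # A literal use of the special character becomes "%" + its hex octet.
    escaped = "%{:02X}".format(ord(delimiter))
    return delimiter.join(part.replace(delimiter, escaped) for part in components)
```

With these assumptions, join_components(["a:b", "c"]) produces "a%3Ab:c", so the only unencoded ":" is the one carrying the special meaning.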
2.3.2 The other reserved characters
RFC 1630 [2] reserves the characters "/", "?", and "#" for particular
purposes. The URN-WG has not yet debated the applicability and
precise semantics of those purposes as applied to URNs. Therefore,
these characters are RESERVED for future developments. Namespace
developers SHOULD NOT use these characters in unencoded form, but
rather use the appropriate %-encoding for each character.
2.4 Excluded characters
The following list is included only for the sake of completeness.
Any octets/characters on this list are explicitly NOT part of the URN
character set, and if used in an URN, MUST be %-encoded:
<excluded> ::= octets 1-32 (1-20 hex) | "\" | """ | "&" | "<"
=2= |