4. URI References
The term "URI-reference" is used here to denote the common usage of a
resource identifier. A URI reference may be absolute or relative,
and may have additional information attached in the form of a
fragment identifier. However, "the URI" that results from such a
reference includes only the absolute URI after the fragment
identifier (if any) is removed and after any relative URI is resolved
to its absolute form. Although it is possible to limit the
discussion of URI syntax and semantics to that of the absolute
result, most usage of URI is within general URI references, and it is
impossible to obtain the URI from such a reference without also
parsing the fragment and resolving the relative form.
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
The syntax for relative URI is a shortened form of that for absolute
URI, where some prefix of the URI is missing and certain path
components ("." and "..") have a special meaning when, and only when,
interpreting a relative path. The relative URI syntax is defined in
4.1. Fragment Identifier
When a URI reference is used to perform a retrieval action on the
identified resource, the optional fragment identifier, separated from
the URI by a crosshatch ("#") character, consists of additional
reference information to be interpreted by the user agent after the
retrieval action has been successfully completed. As such, it is not
part of a URI, but is often used in conjunction with a URI.
fragment = *uric
The semantics of a fragment identifier is a property of the data
resulting from a retrieval action, regardless of the type of URI used
in the reference. Therefore, the format and interpretation of
fragment identifiers is dependent on the media type [RFC2046] of the
retrieval result. The character restrictions described in Section 2
RFC 2396 URI Generic Syntax August 1998
for URI also apply to the fragment in a URI-reference. Individual
media types may define additional restrictions or structure within
the fragment for specifying different types of "partial views" that
can be identified within that media type.
A fragment identifier is only meaningful when a URI reference is
intended for retrieval and the result of that retrieval is a document
for which the identified fragment is consistently defined.
4.2. Same-document References
A URI reference that does not contain a URI is a reference to the
current document. In other words, an empty URI reference within a
document is interpreted as a reference to the start of that document,
and a reference containing only a fragment identifier is a reference
to the identified fragment of that document. Traversal of such a
reference should not result in an additional retrieval action.
However, if the URI reference occurs in a context that is always
intended to result in a new request, as in the case of HTML's FORM
element, then an empty URI reference represents the base URI of the
current document and should be replaced by that URI when transformed
into a request.
4.3. Parsing a URI Reference
A URI reference is typically parsed according to the four main
components and fragment identifier in order to determine what
components are present and whether the reference is relative or
absolute. The individual components are then parsed for their
subparts and, if not opaque, to verify their validity.
Although the BNF defines what is allowed in each component, it is
ambiguous in terms of differentiating between an authority component
and a path component that begins with two slash characters. The
greedy algorithm is used for disambiguation: the left-most matching
rule soaks up as much of the URI reference string as it is capable of
matching. In other words, the authority component wins.
Readers familiar with regular expressions should see Appendix B for a
concrete parsing example and test oracle.
5. Relative URI References
It is often the case that a group or "tree" of documents has been
constructed to serve a common purpose; the vast majority of URI in
these documents point to resources within the tree rather than
RFC 2396 URI Generic Syntax August 1998