      if query is defined then
          append "?" to result
          append query to result

      if fragment is defined then
          append "#" to result
          append fragment to result

      return result

Note that we must be careful to preserve the distinction between a
component that is undefined, meaning that its separator was not
present in the reference, and a component that is empty, meaning
that the separator was present and was immediately followed by the
next component separator or the end of the reference.
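
The recombination pseudocode and the undefined-versus-empty distinction can be sketched in Python as follows. This is only an illustration: the function name `recompose` is hypothetical, `None` is assumed to stand for an undefined component and `""` for an empty one, and the scheme and authority handling (which precedes this excerpt) is assumed to follow the same append pattern as the query and fragment.

```python
from typing import Optional


def recompose(scheme: Optional[str], authority: Optional[str], path: str,
              query: Optional[str], fragment: Optional[str]) -> str:
    """Recombine URI components; None means undefined, "" means empty."""
    result = ""
    # Assumed to mirror the earlier part of the pseudocode (not shown here).
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path
    # An empty query still emits its "?" separator; an undefined one does not.
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result
```

With this convention, `recompose("http", "a", "/b", "", None)` keeps the trailing "?" while `recompose("http", "a", "/b", None, None)` does not, so the two references remain distinguishable after a round trip.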
The above algorithm is intended to provide an example by which the
output of implementations can be tested -- implementation of the
algorithm itself is not required. For example, some systems may find
it more efficient to implement step 6 as a pair of segment stacks
being merged, rather than as a series of string pattern replacements.
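
As an illustration of the stack-based alternative, the sketch below merges a base path and a relative reference path using a single segment stack instead of the pair of stacks mentioned above. Step 6 itself lies outside this excerpt, so the behaviour is an assumption based on its usual description: the last base segment is dropped, "." segments are removed, and each ".." cancels the preceding segment. The name `merge_paths` is hypothetical, and keeping unresolvable leading ".." segments is one choice among several an implementation might make.

```python
def merge_paths(base_path: str, ref_path: str) -> str:
    """Merge a relative reference path onto a base path with a segment stack."""
    # Keep everything up to and including the last "/" of the base path.
    merged = base_path[: base_path.rfind("/") + 1] + ref_path
    absolute = merged.startswith("/")
    segments = (merged[1:] if absolute else merged).split("/")

    stack = []
    last = len(segments) - 1
    for i, seg in enumerate(segments):
        if seg == ".":
            # A trailing "." leaves a trailing slash behind.
            if i == last:
                stack.append("")
        elif seg == ".." and stack and stack[-1] != "..":
            # ".." cancels the preceding segment.
            stack.pop()
            if i == last:
                stack.append("")
        else:
            # Ordinary segments, and ".." with nothing left to cancel.
            stack.append(seg)
    return ("/" if absolute else "") + "/".join(stack)
```

For the base path "/b/c/d;p", this yields "/b/g" for "../g", "/b/c/" for ".", and "/../g" for "../../../g", where the surplus ".." is retained rather than discarded.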
Note: Some WWW client applications will fail to separate the
reference's query component from its path component before merging
the base and reference paths in step 6 above. This may result in
a loss of information if the query component contains the strings
"/../" or "/./".
Resolution examples are provided in Appendix C.
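
The pitfall described in the note above, merging paths before the query has been split off, can be avoided by separating the two first. The following is a minimal sketch; the name `split_reference` is hypothetical, and the reference is assumed to have had any fragment removed already.

```python
from typing import Optional, Tuple


def split_reference(reference: str) -> Tuple[str, Optional[str]]:
    """Split a fragment-free reference into (path, query).

    The query is None when no "?" is present and "" when the "?" is
    present but nothing follows it.
    """
    path, sep, query = reference.partition("?")
    return path, (query if sep else None)
```

Only the returned path would then be handed to the path merge; the query is reattached unchanged, so a reference such as "g?y/../x" keeps its "/../" intact.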
RFC 2396 URI Generic Syntax August 1998
6. URI Normalization and Equivalence
In many cases, different URI strings may actually identify the
same resource. For example, the host names used in URLs are
actually case insensitive, and the URL <http://www.XEROX.com> is
equivalent to <http://www.xerox.com>. In general, the rules for
equivalence and definition of a normal form, if any, are scheme
dependent. When a scheme uses elements of the common syntax, it will
also use the common syntax equivalence rules, namely that the scheme
and hostname are case insensitive and a URL with an explicit ":port",
where the port is the default for the scheme, is equivalent to one
where the port is elided.
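
The common syntax equivalence rules above might be applied as in the following sketch. It is not a complete normalizer: the name `normalize` and the `DEFAULT_PORTS` table are assumptions made for illustration, only "scheme://authority" URLs are handled, and scheme-specific rules are ignored.

```python
import re

# Illustrative table only; a real implementation would cover more schemes.
DEFAULT_PORTS = {"http": "80", "ftp": "21", "gopher": "70"}

# scheme "://" authority, then everything else left untouched.
_URL = re.compile(r"^([A-Za-z][A-Za-z0-9+.-]*)://([^/?#]*)(.*)$", re.S)


def normalize(url: str) -> str:
    """Lowercase the scheme and host and drop an explicit default port."""
    m = _URL.match(url)
    if m is None:
        return url  # not in scheme://authority form; leave it alone
    scheme, authority, rest = m.group(1).lower(), m.group(2), m.group(3)
    userinfo, at, hostport = authority.rpartition("@")
    host, colon, port = hostport.partition(":")
    host = host.lower()
    if colon and port == DEFAULT_PORTS.get(scheme):
        colon = port = ""  # elide the default port
    return f"{scheme}://{userinfo}{at}{host}{colon}{port}{rest}"
```

Two URLs can then be compared by normalizing both sides, for example `normalize("http://www.XEROX.com") == normalize("http://www.xerox.com:80")`. The path, query, and fragment are passed through untouched because their equivalence is scheme dependent.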
7. Security Considerations
A URI does not in itself pose a security threat. Users should beware
that there is no general guarantee that a URL, which at one time
located a given resource, will continue to do so. Nor is there any
guarantee that a URL will not locate a different resource at some
later point in time, due to the lack of any constraint on how a given
authority apportions its namespace. Such a guarantee can only be
obtained from the person(s) controlling that namespace and the
resource in question. A specific URI scheme may include additional
semantics, such as name persistence, if those semantics are required
of all naming authorities for that scheme.
It is sometimes possible to construct a URL such that an attempt to
perform a seemingly harmless, idempotent operation, such as the
retrieval of an entity associated with the resource, will in fact
cause a possibly damaging remote operation to occur. The unsafe URL
is typically constructed by specifying a port number other than that
reserved for the network protocol in question. The client
unwittingly contacts a site that is in fact running a different
protocol. The content of the URL contains instructions that, when
interpreted according to this other protocol, cause an unexpected
operation. An example has been the use of a gopher URL to cause an
unintended or impersonating message to be sent via an SMTP server.
Caution should be used when using any URL that specifies a port
number other than the default for the protocol, especially when it is
a number within the reserved space.
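
A client could flag such URLs before dereferencing them. The sketch below is one possible check, under stated assumptions: the name `port_risk`, the small `DEFAULT_PORTS` table, and the reading of "reserved space" as ports 0 through 1023 are all illustrative choices, not something this document defines.

```python
from urllib.parse import urlsplit

# Illustrative table only; extend it for the schemes a client supports.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70}
RESERVED_MAX = 1023  # assumed upper bound of the reserved port range


def port_risk(url: str) -> str:
    """Classify a URL by whether it names a non-default port."""
    parts = urlsplit(url)
    port = parts.port  # None when the URL carries no explicit port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return "default"
    if port <= RESERVED_MAX:
        return "non-default-reserved"
    return "non-default"
```

A gopher URL aimed at port 25, as in the SMTP example above, would be classified as "non-default-reserved" and could be refused or shown to the user for confirmation.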
Care should be taken when a URL contains escaped delimiters for a
given protocol (for example, CR and LF characters for telnet
protocols) that these are not unescaped before transmission. This
might violate the protocol, but avoids the potential for such
characters to be used to simulate an extra operation or parameter in
that protocol, which might cause an unexpected and possibly harmful
remote operation to be performed.
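
One way to act on this advice is to detect escaped CR and LF octets while the URL is still in its escaped form, so they are never unescaped and sent as raw line breaks. This is a minimal sketch; the name `has_escaped_line_break` is hypothetical, and only CR (%0D) and LF (%0A) are checked, although other protocols may have further delimiters.

```python
import re

# Matches %0A (LF) and %0D (CR) in either hex case.
_ESCAPED_CRLF = re.compile(r"%0[ad]", re.IGNORECASE)


def has_escaped_line_break(url: str) -> bool:
    """Return True if the still-escaped URL contains an escaped CR or LF."""
    return _ESCAPED_CRLF.search(url) is not None
```

A client might refuse to unescape, or reject outright, any URL for which this check returns True when the target protocol is line oriented.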
It is clearly unwise to use a URL that contains a password which is
intended to be secret. In particular, the use of a password within