listed in the order in which they would be applied by the parser.
2.4.1. Parsing the Fragment Identifier
If the parse string contains a crosshatch "#" character, then the
substring after the first (left-most) crosshatch "#" and up to the
end of the parse string is the <fragment> identifier. If the
crosshatch is the last character, or no crosshatch is present, then
the fragment identifier is empty. The matched substring, including
the crosshatch character, is removed from the parse string before
continuing.
Note that the fragment identifier is not considered part of the URL.
However, since it is often attached to the URL, parsers must be able
to recognize and set aside fragment identifiers as part of the
process.
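The fragment step above can be sketched in Python as follows. This is a minimal illustration, not a normative implementation; the function name split_fragment and its return order are assumptions made for this example.

```python
def split_fragment(parse_string):
    """Split off the fragment identifier at the first (left-most) "#".

    Returns (remaining_parse_string, fragment). The fragment is the empty
    string when no "#" is present or when "#" is the last character.
    """
    # str.partition splits at the first occurrence of the separator and
    # returns empty strings for the separator and tail when it is absent.
    remaining, _, fragment = parse_string.partition("#")
    return remaining, fragment
```

Because only the first crosshatch is significant, any later "#" characters stay inside the fragment.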
2.4.2. Parsing the Scheme
If the parse string contains a colon ":" after the first character
and before any characters not allowed as part of a scheme name (i.e.,
any not an alphanumeric, plus "+", period ".", or hyphen "-"), the
<scheme> of the URL is the substring of characters up to but not
including the first colon. These characters and the colon are then
removed from the parse string before continuing.
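A possible sketch of the scheme step, in Python. The names split_scheme and SCHEME_CHARS are hypothetical, and two choices here are assumptions: "alphanumeric" is read as ASCII letters and digits, and a missing scheme is reported as the empty string.

```python
import string

# Assumption: "alphanumeric" means ASCII letters and digits.
SCHEME_CHARS = frozenset(string.ascii_letters + string.digits + "+.-")


def split_scheme(parse_string):
    """Return (scheme, remaining_parse_string).

    A scheme is recognized only if a colon appears after the first
    character and every character before it is allowed in a scheme name.
    """
    colon = parse_string.find(":")
    if colon > 0 and all(ch in SCHEME_CHARS for ch in parse_string[:colon]):
        # Remove the scheme characters and the colon from the parse string.
        return parse_string[:colon], parse_string[colon + 1:]
    return "", parse_string
```

A colon that follows a slash, as in "a/b:c", does not introduce a scheme, so the parse string is returned unchanged in that case.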
RFC 1808 Relative Uniform Resource Locators June 1995
2.4.3. Parsing the Network Location/Login
If the parse string begins with a double-slash "//", then the
substring of characters after the double-slash and up to, but not
including, the next slash "/" character is the network location/login
(<net_loc>) of the URL. If no trailing slash "/" is present, the
entire remaining parse string is assigned to <net_loc>. The
double-slash and <net_loc> are removed from the parse string before
continuing.
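The network location step might be sketched as shown below. The function name split_net_loc is an assumption for this example, as is the use of an empty string when no network location is present.

```python
def split_net_loc(parse_string):
    """Return (net_loc, remaining_parse_string).

    The network location is recognized only when the parse string begins
    with "//". It runs up to, but does not include, the next "/".
    """
    if not parse_string.startswith("//"):
        return "", parse_string
    end = parse_string.find("/", 2)  # search after the leading "//"
    if end == -1:
        # No trailing slash: everything that remains is the net_loc.
        return parse_string[2:], ""
    # The slash that ends the net_loc stays with the remaining string.
    return parse_string[2:end], parse_string[end:]
```

The terminating slash is left in the parse string so that the later path step can still tell an absolute path from a relative one.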
2.4.4. Parsing the Query Information
If the parse string contains a question mark "?" character, then the
substring after the first (left-most) question mark "?" and up to the
end of the parse string is the <query> information. If the question
mark is the last character, or no question mark is present, then the
query information is empty. The matched substring, including the
question mark character, is removed from the parse string before
continuing.
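As a brief sketch, the query step can be written with a single split at the first question mark. The name split_query is hypothetical, and the input is assumed to have already had its fragment, scheme, and network location removed.

```python
def split_query(parse_string):
    """Return (remaining_parse_string, query).

    The query is everything after the first "?"; it is empty when no "?"
    is present or when "?" is the last character.
    """
    remaining, _, query = parse_string.partition("?")
    return remaining, query
```

Any further question marks are kept as part of the query information.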
2.4.5. Parsing the Parameters
If the parse string contains a semicolon ";" character, then the
substring after the first (left-most) semicolon ";" and up to the end
of the parse string is the parameters (<params>). If the semicolon
is the last character, or no semicolon is present, then <params> is
empty. The matched substring, including the semicolon character, is
removed from the parse string before continuing.
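The parameters step follows the same pattern, splitting at the first semicolon. The sketch below uses the hypothetical name split_params and assumes the query has already been removed from its input.

```python
def split_params(parse_string):
    """Return (remaining_parse_string, params).

    Only the first ";" is significant; later semicolons remain inside
    the params string.
    """
    remaining, _, params = parse_string.partition(";")
    return remaining, params
```

For example, with the input "/b/c;p1;p2" the remaining parse string is "/b/c" and the parameters are "p1;p2".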
2.4.6. Parsing the Path
After the above steps, all that is left of the parse string is the
URL <path> and the slash "/" that may precede it. Even though the
initial slash is not part of the URL path, the parser must remember
whether or not it was present so that later processes can
differentiate between relative and absolute paths. Often this is
done by simply storing the preceding slash along with the path.
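Taken together, the steps of Section 2.4 can be sketched as a single Python function that applies them in the order given. This is an illustrative sketch only: the names parse_url, ParsedURL, and SCHEME_CHARS are assumptions, absent components are represented as empty strings, and the preceding slash is stored along with the path as described above.

```python
import string
from collections import namedtuple

# Hypothetical container for the six components; not defined by the text.
ParsedURL = namedtuple(
    "ParsedURL", "scheme net_loc path params query fragment"
)

# Assumption: "alphanumeric" means ASCII letters and digits.
SCHEME_CHARS = frozenset(string.ascii_letters + string.digits + "+.-")


def parse_url(url):
    """Split a URL string into components, following steps 2.4.1 to 2.4.6."""
    # 2.4.1: fragment identifier, after the first "#".
    rest, _, fragment = url.partition("#")

    # 2.4.2: scheme, up to the first ":" if only scheme characters precede it.
    scheme = ""
    colon = rest.find(":")
    if colon > 0 and all(ch in SCHEME_CHARS for ch in rest[:colon]):
        scheme, rest = rest[:colon], rest[colon + 1:]

    # 2.4.3: network location, between a leading "//" and the next "/".
    net_loc = ""
    if rest.startswith("//"):
        end = rest.find("/", 2)
        if end == -1:
            end = len(rest)
        net_loc, rest = rest[2:end], rest[end:]

    # 2.4.4: query information, after the first "?".
    rest, _, query = rest.partition("?")

    # 2.4.5: parameters, after the first ";".
    rest, _, params = rest.partition(";")

    # 2.4.6: what remains is the path, stored with any preceding slash.
    return ParsedURL(scheme, net_loc, rest, params, query, fragment)
```

A caller can test whether the path is absolute with parse_url(url).path.startswith("/"), since the initial slash is retained in the stored path.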
3. Establishing a Base URL
The term "relative URL" implies that there exists some absolute "base
URL" against which the relative reference is applied. Indeed, the
base URL is necessary to define the semantics of any embedded
relative URLs; without it, a relative reference is meaningless. In
order for relative URLs to be usable within a document, the base URL
of that document must be known to the parser.
The base URL of a document can be established in one of four ways,
listed below in order of precedence. The order of precedence can be