Consider a URN namespace based on MIME Content-Ids. The URN might
look like this:
urn:cid:199606121851.1@mordred.gatech.edu
(Note that this example is chosen for pedagogical purposes, and does
not conform to the recently-approved CID URL scheme.)
The first step in the resolution process is to find out about the CID
namespace. The namespace identifier, cid, is extracted from the URN,
prepended to urn.net, and the NAPTR for cid.urn.net looked up. It
might return records of the form:
cid.urn.net
;; order pref flags service regexp replacement
IN NAPTR 100 10 "" "" "/urn:cid:.+@([^\.]+\.)(.*)$/\2/i" .
We have only one NAPTR response, so ordering the responses is not a
problem. The replacement field is empty, so we check the regexp
field and use the pattern provided there. We apply that regexp to the
entire URN to see if it matches, which it does. The \2 part of the
substitution expression returns the string "gatech.edu". Since the
flags field does not contain "s" or "a", the lookup is not terminal
and our next probe to DNS is for more NAPTR records:
lookup(query=NAPTR, "gatech.edu").
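The steps so far can be sketched in Python. This is only an illustration under stated assumptions: the function names first_lookup_key and apply_naptr_regexp are hypothetical, Python's re module stands in for the POSIX extended regular expressions used by NAPTR, and splitting the field on unescaped delimiters is a simplification.

```python
import re

def first_lookup_key(urn, suffix="urn.net"):
    """Build the first DNS key from the namespace identifier of a URN."""
    scheme, nid, _rest = urn.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError("not a URN: %r" % urn)
    return "%s.%s" % (nid.lower(), suffix)

def apply_naptr_regexp(field, urn):
    """Apply a NAPTR regexp field of the form /ere/replacement/flags to a URN."""
    delim = field[0]
    # Split on delimiters that are not preceded by a backslash.
    parts = re.split(r"(?<!\\)" + re.escape(delim), field[1:])
    if len(parts) != 3:
        raise ValueError("malformed regexp field: %r" % field)
    pattern, replacement, flags = parts
    match = re.search(pattern, urn, re.IGNORECASE if "i" in flags else 0)
    if match is None:
        return None  # the rule does not apply to this URN
    # expand() substitutes backreferences such as \2 with the captured text.
    return match.expand(replacement)

urn = "urn:cid:199606121851.1@mordred.gatech.edu"
rule = r"/urn:cid:.+@([^\.]+\.)(.*)$/\2/i"
print(first_lookup_key(urn))          # cid.urn.net
print(apply_naptr_regexp(rule, urn))  # gatech.edu
```

Because the flags field of the record is empty, the returned string is used as the key for another NAPTR query rather than as a final result.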
Note that the rule does not extract the full domain name from the
CID; instead, it assumes the CID comes from a host and extracts that
host's domain. While every host, such as mordred, could have its own
NAPTR record, maintaining those records for all the machines at a
site as large as Georgia Tech would be an intolerable burden.
Wildcards are not appropriate here, since they only return results
when there is no exactly matching name already in the system.
The record returned from the query on "gatech.edu" might look like:
gatech.edu
;; order pref flags service regexp replacement
IN NAPTR 100 50 "s" "z3950+N2L+N2C" "" z3950.tcp.gatech.edu
IN NAPTR 100 50 "s" "rcds+N2C" "" rcds.udp.gatech.edu
IN NAPTR 100 50 "s" "http+N2L+N2C+N2R" "" http.tcp.gatech.edu
RFC 2168 Resolution of URIs Using the DNS June 1997
Continuing with our example, we note that the values of the order and
preference fields are equal in all records, so the client is free to
pick any record. The flags field tells us that these are the last
NAPTR patterns we should see, and after the rewrite (a simple
replacement in this case) we should look up SRV records to get
information on the hosts that can provide the necessary service.
Assuming we prefer the Z39.50 protocol, our lookup might return:
;; Pref Weight Port Target
z3950.tcp.gatech.edu IN SRV 0 0 1000 z3950.gatech.edu
IN SRV 0 0 1000 z3950.cc.gatech.edu
IN SRV 0 0 1000 z3950.uga.edu
telling us three hosts that could actually do the resolution, and
giving us the port we should use to talk to their Z39.50 server.
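The selection of a NAPTR record and the reading of the SRV answer can be sketched as follows. This is a rough illustration, not a definitive client: the Naptr and Srv tuples and both helper functions are hypothetical names, the records are typed in by hand rather than fetched from DNS, and the policy (filter by the client's preferred protocol, then take the lowest order and preference) is a simplification.

```python
from collections import namedtuple

Naptr = namedtuple("Naptr", "order pref flags service regexp replacement")
Srv = namedtuple("Srv", "pref weight port target")

def choose_naptr(records, protocol):
    """Pick the lowest (order, pref) record whose service starts with protocol."""
    usable = [r for r in records
              if r.service.split("+")[0].lower() == protocol.lower()]
    return min(usable, key=lambda r: (r.order, r.pref)) if usable else None

def srv_candidates(records):
    """Return (target, port) pairs for the SRV records with the lowest Pref."""
    if not records:
        return []
    best = min(r.pref for r in records)
    return [(r.target, r.port) for r in records if r.pref == best]

# The gatech.edu records from the example, entered by hand.
naptrs = [
    Naptr(100, 50, "s", "z3950+N2L+N2C", "", "z3950.tcp.gatech.edu"),
    Naptr(100, 50, "s", "rcds+N2C", "", "rcds.udp.gatech.edu"),
    Naptr(100, 50, "s", "http+N2L+N2C+N2R", "", "http.tcp.gatech.edu"),
]
chosen = choose_naptr(naptrs, "z3950")
# The "s" flag marks a terminal rule whose replacement is an SRV key.
assert chosen.flags == "s"
print(chosen.replacement)  # z3950.tcp.gatech.edu

srvs = [
    Srv(0, 0, 1000, "z3950.gatech.edu"),
    Srv(0, 0, 1000, "z3950.cc.gatech.edu"),
    Srv(0, 0, 1000, "z3950.uga.edu"),
]
for target, port in srv_candidates(srvs):
    print(target, port)
```

All three SRV records share the same Pref value here, so the sketch returns every host and leaves the final choice to the caller.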
Recall that the regular expression used \2 to extract a domain name
from the CID, and \. for matching the literal '.' characters
separating the domain name components. Since '\' is the escape
character, literal occurrences of a backslash must be escaped by
another backslash. For the case of the cid.urn.net record above, the
regular expression entered into the zone file should be
"/urn:cid:.+@([^\\.]+\\.)(.*)$/\\2/i". When the client code actually
receives the record, the pattern will have been converted to
"/urn:cid:.+@([^\.]+\.)(.*)$/\2/i".
Example 3
---------
Even if URN systems were in place now, there would still be a
tremendous number of URLs. It should be possible to develop a URN
resolution system that can also provide location independence for
those URLs. This is related to the requirement in [1] to be able to
grandfather in names from other naming systems, such as ISO Formal
Public Identifiers, Library of Congress Call Numbers, ISBNs, ISSNs,
etc.
The NAPTR RR could also be used for URLs that have already been
assigned. Assume we have the URL for a very popular piece of
software that the publisher wishes to mirror at multiple sites around
the world:
http://www.foo.com/software/latest-beta.exe