The existing (HTTP/1.0) "cache-busting" mechanisms for counting
distinct users will certainly overestimate the number of users behind
a proxy, since they provide no reliable way to distinguish between a
user's initial request and subsequent repeat requests that might have
been conditional GETs, had cache-busting not been employed. The
RFC 2227 Hit-Metering and Usage-Limiting October 1997
"Cache-control: s-maxage=0" feature of HTTP/1.1 does allow the
separation of use-counts and reuse-counts, provided that no HTTP/1.0
proxy caches intervene.
Note that if there is doubt about the validity of the results of
hit-metering a given set of resources, the server can employ cache-
busting techniques for short periods, to establish a baseline for
validating the hit-metering results. Various approaches to this
problem are discussed in a paper by James Pitkow [9].
4.2 What about "Network Computers"?
The analysis in section 4.1 assumed that "almost all Web users" have
client caches. If the Network Computers (NC) model becomes popular,
however, then this assumption may be faulty: most proposed NCs have
no disk storage, and relatively little RAM. Many Personal Digital
Assistants (PDAs), which sometimes have network access, have similar
constraints. Such client systems may do little or no caching of HTTP
responses. This means that a single user might well generate many
unconditional GETs that yield the same response from a proxy cache.
First, note that the hit-metering design in this document, even with
such clients, provides an approximation no worse than that available
with unmodified HTTP/1.1: the counts that a proxy would return to an
origin server would represent exactly the number of requests that the
proxy would forward to the server, if the server simply specifies
"Cache-control: s-maxage=0".
However, it may be possible to improve the accuracy of these hit-
counts by use of some heuristics at the proxy. For example, the
proxy might note the IP address of the client, and count only one GET
per client address per response. This is not perfect: for example,
it fails to distinguish between NCs and certain other kinds of hosts.
The proxy might also use the heuristic that only those clients that
never send a conditional GET should be treated this way, although we
are not at all certain that NCs will never send conditional GETs.
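
The per-address heuristic described above can be illustrated with a
minimal sketch. It is not part of this specification: the class name
DedupHitCounter, its methods, and the use of a string key per cached
response are all assumptions made for illustration, and a real proxy
would also need to bound the memory used by the per-response sets.

```python
from collections import defaultdict


class DedupHitCounter:
    """Hypothetical proxy-side counter (illustration only).

    Counts at most one GET per client address per cached response,
    as in the heuristic sketched in the text.
    """

    def __init__(self):
        # Maps a cached-response key to the set of client addresses
        # that have already been counted for that response.
        self._seen = defaultdict(set)

    def record_get(self, response_key, client_addr):
        """Return True if this GET is counted, False if it is a repeat."""
        clients = self._seen[response_key]
        if client_addr in clients:
            return False
        clients.add(client_addr)
        return True

    def count(self, response_key):
        """Number of distinct client addresses seen for this response."""
        return len(self._seen.get(response_key, ()))

    def evict(self, response_key):
        """Drop the entry and return its final count for reporting."""
        return len(self._seen.pop(response_key, ()))


counter = DedupHitCounter()
counter.record_get("/index.html", "192.0.2.10")   # counted
counter.record_get("/index.html", "192.0.2.10")   # repeat, not counted
counter.record_get("/index.html", "192.0.2.11")   # counted
```

As the text notes, this is not perfect: several users sharing one
address are counted once, and the sketch makes no attempt to decide
whether a given client is actually cacheless.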
Since the solution to this problem appears to require heuristics
based on the actual behavior of NCs (or perhaps a new HTTP protocol
feature that allows unambiguous detection of cacheless clients), it
appears to be premature to specify a solution.
4.3 Critical-path delay analysis
In systems (such as the Web) where latency is at issue, there is
usually a tree of steps which depend on one another, in such a way
that the final result cannot be accomplished until all of its
predecessors have been. Since the tree structure admits some
parallelism, it is not necessary to add up the timings for each step
to discover the latency for the entire process. But any single path
through this dependency tree cannot be parallelized, and the longest
such path is the one whose length (in units of seconds) determines
the overall latency. This is the "critical path", because no matter
how much shorter one makes any other path, that cannot change the
overall latency for the final result.
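
The notion of a critical path can be made concrete with a short
sketch. The step names and the timings (in milliseconds) below are
invented for illustration and are not measurements; the function
critical_path is likewise hypothetical and assumes that the dependency
structure contains no cycles.

```python
def critical_path(durations, depends_on):
    """Return (latency, path) for the longest dependency chain.

    durations:  step name -> time taken by that step alone
    depends_on: step name -> list of steps that must finish first
    Assumes the dependencies are acyclic.
    """
    memo = {}

    def finish(step):
        # Earliest finish time of `step`, and the chain that forces it.
        if step not in memo:
            best_time, best_path = 0, []
            for pred in depends_on.get(step, ()):
                t, p = finish(pred)
                if t > best_time:
                    best_time, best_path = t, p
            memo[step] = (best_time + durations[step], best_path + [step])
        return memo[step]

    return max((finish(s) for s in durations), key=lambda r: r[0])


# Hypothetical timings in milliseconds.
durations = {"dns": 50, "syn": 100, "request": 200, "render": 30,
             "background_report": 100}
depends_on = {"syn": ["dns"], "request": ["syn"], "render": ["request"]}

latency, path = critical_path(durations, depends_on)
# latency == 380, path == ["dns", "syn", "request", "render"]
```

In this example the step "background_report" depends on nothing and
nothing depends on it, so shortening or lengthening it (up to the
length of the longest chain) leaves the overall latency unchanged.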
If one views the final result, for a Web request, as rendering a page
at a browser, or otherwise acting on the result of a request, clearly
some network round trips (e.g., exchanging TCP SYN packets if the
connection doesn't already exist) are on the critical path. This
hit-metering design does add some round-trips for reporting non-zero
counts when a cache entry is removed, but, by design, these are off
any critical path: they may be done in parallel with any other
operation, and require only "best efforts", so a proxy does not have
to serialize other operations with their success or failure.
Clearly, anything that changes network utilization (either increasing
or decreasing it) can indirectly affect user-perceived latency. Our
expectation is that hit-metering, on average, will reduce network
load, so even its indirect effects should not add network round-trips
to any critical path. But there might be a few specific instances where
the added non-critical-path operations (specifically, usage reports
upon cache-entry removal) delay an operation on a critical path.
This is an unavoidable problem in datagram networks.
5 Specification
5.1 Specification of Meter header and directives