   |     LOCAL        |/--------HITS-------|     SIBLING      |
   |     CACHE        |\------RESOLVED-----|     CACHE        |
   |                  |                    |                  |
   +------------------+                    +------------------+
      |  |  |  |  |
      |  |  |  |  |
      |  |  |  |  |
      V  V  V  V  V
   ===================
      CACHE CLIENTS
FIGURE 1: A Simple Web cache hierarchy. The local cache can retrieve
hits from sibling caches, hits and misses from parent caches, and
some requests directly from origin servers.
to provide "transit" for the request if necessary, and accordingly
parent caches are ideally located within or on the way to a transit
Internet service provider (ISP).
Squid and Harvest allow for complex hierarchical configurations. For
example, one could specify that a given neighbor be used for only a
certain class of requests, such as URLs from a specific DNS domain.
RFC 2187 ICP September 1997
Additionally, it is possible to treat a neighbor as a sibling for
some requests and as a parent for others.
The cache hierarchy model described here includes a number of
features to prevent top-level caches from becoming choke points. One
is the ability to restrict parents by domain, as just described.
Another is that the cache forwards only cachable requests to its
neighbors. A large class of Web requests is inherently uncachable,
including requests requiring certain types of authentication,
session-encrypted data, highly personalized responses, and certain
types of database queries. Lower-level caches should handle these
requests directly rather than burdening parent caches.
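As an illustration of the two restrictions above, the following is a
minimal Python sketch, not Squid's or Harvest's actual logic. The
names `Neighbor', `is_cachable', and `eligible_neighbors' are
hypothetical, and the cachability test is a deliberately crude
assumption (GET requests without an Authorization header).

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Neighbor:
    host: str
    kind: str              # "parent" or "sibling"
    domains: tuple = ()    # empty tuple: neighbor may be used for any domain

    def serves(self, hostname: str) -> bool:
        """True if this neighbor is allowed to handle the given hostname."""
        if not self.domains:
            return True
        return any(hostname == d or hostname.endswith("." + d)
                   for d in self.domains)


def is_cachable(method: str, headers: dict) -> bool:
    """Crude stand-in (an assumption): only unauthenticated GETs are cachable."""
    if method.upper() != "GET":
        return False
    return "authorization" not in {name.lower() for name in headers}


def eligible_neighbors(url: str, method: str, headers: dict, neighbors: list) -> list:
    """Neighbors worth querying; an empty list means fetch directly."""
    if not is_cachable(method, headers):
        return []  # uncachable requests bypass the hierarchy entirely
    hostname = urlsplit(url).hostname or ""
    return [n for n in neighbors if n.serves(hostname)]
```

With a parent restricted to the "edu" domain and an unrestricted
sibling, a GET for a ".com" URL would be offered only to the sibling,
and an authenticated request would be offered to no neighbor at all.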
3. What is the Added Value of ICP?
Although it is possible to maintain cache hierarchies without using
ICP, the lack of ICP or something similar rules out sibling
relationships, which rely on a mechanism to query nearby caches
about a given document.
One concern over the use of ICP is the additional delay that an ICP
query/reply exchange contributes to an HTTP transaction. However, if
the ICP query can locate the object in a nearby neighbor cache, then
the ICP delay may be more than offset by the faster delivery of the
data from the neighbor. In order to minimize ICP delays, the caches
(as well as the protocol itself) are designed to answer ICP requests
quickly. Indeed, the application does minimal processing of the ICP
request; most ICP-related delay is due to transmission on the
network.
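The trade-off described above can be made concrete with a simple
expected-time model. This is a sketch under stated assumptions, not
something taken from the protocol: every request pays one ICP
round trip, a fraction `hit_rate' of requests is then served by a
neighbor, and the rest go to the origin server. All function and
parameter names are hypothetical.

```python
def expected_time_with_icp(icp_rtt: float, neighbor_fetch: float,
                           origin_fetch: float, hit_rate: float) -> float:
    """Expected seconds per request when every request is preceded by an ICP query."""
    return icp_rtt + hit_rate * neighbor_fetch + (1.0 - hit_rate) * origin_fetch


def icp_pays_off(icp_rtt: float, neighbor_fetch: float,
                 origin_fetch: float, hit_rate: float) -> bool:
    """True when querying neighbors is faster on average than always going direct."""
    with_icp = expected_time_with_icp(icp_rtt, neighbor_fetch, origin_fetch, hit_rate)
    return with_icp < origin_fetch
```

Under this model ICP is worthwhile whenever hit_rate * (origin_fetch
- neighbor_fetch) exceeds icp_rtt, which is why keeping the ICP round
trip short matters so much.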
ICP also serves to provide an indication of neighbor reachability.
If ICP replies from a neighbor fail to arrive, then either the
network path is congested (or down), or the cache application is not
running on the ICP-queried neighbor machine. In either case, the
cache should not use this neighbor at this time. Additionally,
because an idle cache can turn around the replies faster than a busy
one, all other things being equal, ICP provides some form of load
balancing.
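Both effects can be sketched in a few lines of Python. This is an
illustrative sketch only: the class `NeighborHealth', the function
`first_hit', the threshold of three missed replies, and the
"HIT"/"MISS" labels are assumptions made for the example, not
values taken from ICP or from Squid.

```python
class NeighborHealth:
    """Tracks consecutive unanswered ICP queries for one neighbor."""

    def __init__(self, max_missed: int = 3):  # threshold is an assumed value
        self.max_missed = max_missed
        self.missed = 0

    def record_reply(self) -> None:
        self.missed = 0        # any reply shows the neighbor is reachable

    def record_timeout(self) -> None:
        self.missed += 1       # congested path, dead path, or cache not running

    @property
    def usable(self) -> bool:
        return self.missed < self.max_missed


def first_hit(replies):
    """Pick the neighbor whose HIT reply arrived first.

    replies: iterable of (arrival_seconds, neighbor_name, result) tuples,
    where result is "HIT" or "MISS". Returns None if no neighbor had a hit.
    """
    hits = [r for r in replies if r[2] == "HIT"]
    if not hits:
        return None
    return min(hits, key=lambda r: r[0])[1]
```

Because an idle cache tends to answer sooner than a busy one, choosing
the earliest hit steers requests toward less loaded neighbors without
any explicit load measurement.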
4. Example Configuration of ICP Hierarchy
Configuring caches within a hierarchy requires establishing peering
relationships, which currently involves manual configuration at both
peering endpoints. One cache must indicate that the other is a
parent or sibling. The other cache will most likely have to add the
first cache to its access control lists.
Below we show some sample configuration lines for a hypothetical
situation. We have two caches, one operated by an ISP, and another
operated by a customer. First we describe how the customer would
configure his cache to peer with the ISP. Second, we describe how
the ISP would allow the customer access to its cache.
4.1. Configuring the `proxy.customer.org' cache
In Squid, to configure parents and siblings in a hierarchy, a
`cache_host' directive is entered into the configuration file. The
format is:
    cache_host hostname type http-port icp-port [options]