Understanding Web Caching
Quite often we see Web pages that include images and other files loading faster than we expect. If you are wondering how that happens, Web caching could be one of the ways. The following steps will help you to easily understand the concept of caching:
Clear your Internet Explorer browser history and then visit any site (say www.facebook.com).
A cache is basically a collection of data that duplicates original values stored elsewhere, kept in a place where it can be accessed faster than the original. A CPU cache, for example, is memory located very close to the CPU to allow faster access. Apart from Web caching, there are different types of caching, as explained below:
CPU cache: a small area of fast memory used by the central processing unit
Disk buffer: the small amount of buffer memory present on a hard drive
Page cache: the cache of disk pages kept by the operating systems, stored in unused main memory
Web cache: a mechanism for the temporary storage of Web objects like Web pages, images, etc. to improve the performance
DNS cache: a server in the domain name system which stores queried results for a period of time
P2P caching: a technique used to reduce bandwidth costs for content on peer-to-peer networks
Database caching: a mechanism used to cache database content in multi-tier applications
In the sections below, we shall try to understand Web caching in detail and how the major browsers deal with it.
Web Caching
Web caching is a mechanism for the temporary storage of Web objects, such as HTML pages, images or other files requested from the Internet. These Web objects can be stored either on the local machine or on some server on the Web. After an original request for data has been successfully fulfilled, and that data has been stored in the cache, further requests for those files are fulfilled by retrieving information from the cache rather than from the original location.
The goal of caching is to eliminate the need to send requests in many cases, and to eliminate the need to send full responses in many other cases. The former reduces the number of network round-trips required for many operations, and the latter reduces network bandwidth requirements. There are two types of Web caches:
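The store-then-reuse behaviour described above can be illustrated with a minimal Python sketch. The names `SimpleWebCache` and `fake_origin` are hypothetical and exist only for this illustration; a real browser or proxy cache also tracks freshness, size limits and cache control headers, none of which is modelled here.

```python
class SimpleWebCache:
    """Toy cache: keeps a copy of each fetched object, keyed by URL."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # function that contacts the origin
        self._store = {}                  # URL -> cached body
        self.origin_requests = 0          # how often the origin was contacted

    def get(self, url):
        # Serve from the cache when a copy already exists.
        if url in self._store:
            return self._store[url]
        # Otherwise go to the origin once and keep a copy for later requests.
        self.origin_requests += 1
        body = self._fetch(url)
        self._store[url] = body
        return body


def fake_origin(url):
    # Stand-in for a real HTTP request (an assumption for this sketch).
    return f"<html>content of {url}</html>"


cache = SimpleWebCache(fake_origin)
first = cache.get("http://example.com/index.html")
second = cache.get("http://example.com/index.html")
```

In this sketch the second call returns the stored copy, so `cache.origin_requests` stays at 1 even though the page was requested twice.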
Browser cache: A browser cache is part of all popular Web browsers. The browser keeps a local copy of all recently displayed pages on the user’s machine, and when the user returns to one of these pages, the local copy is reused.
Proxy cache: By contrast, a proxy cache is a shared network device that can undertake Web transactions on behalf of a client, and, like the browser, the proxy cache stores the content.
Internet browsers use caching to store HTML Web pages by storing a copy of visited pages and then using that copy to render the page when you revisit it. Let us now look at the caching mechanism in some of the commonly used Web browsers.
Internet Explorer
On a user’s machine, Internet Explorer stores the cached Web objects at the following location or within the folders present in it:
C:\Users\[username]\AppData\Local\Microsoft\Windows\Temporary Internet Files
Another way to reach the above location is to navigate through the following in the Internet Explorer browser:
Tools –> Internet Options –> General –> Browsing history –> Settings –> View Files
Google Chrome
One can find the cached Web objects for Chrome at the following location:
Mozilla Firefox
Mozilla Firefox caches Web objects at the following location:
A shortcut from the browser to view the cache in Firefox is: Open the browser and type “about:cache” in the address bar and hit enter. This lists the memory cache and also the disk cache where the Web objects are stored.
Security Risk of Caching
Caching is good for performance and convenience, but there is a flip side: security. Web caching is a typical example of “security = 1/convenience”. That is, user convenience comes at a security cost: it exposes your Web application to potential security threats. For example, since cached information can contain sensitive data, it has to be protected from unauthorized access. In the case of Web applications, you need to avoid caching confidential information in the user’s browser, so that the data cannot be accessed outside the control of the Web application. Web caching of login pages also exposes the application to specific threats, such as the theft of user credentials through a Web proxy.
Remediation
Application developers can prevent caching of Web objects by explicitly sending a few cache control headers with directives such as no-cache and no-store.
Apart from the cache control headers, caching can also be avoided by using META tags.
Preventing caching using HTTP cache control headers
Cache control response headers can be set as follows:
For HTTP/1.1: Cache-Control: no-cache
For HTTP/1.0: Pragma: no-cache
Pragma and Cache-Control implement the same concept in HTTP/1.0 and HTTP/1.1 respectively.
The “no-cache” directive instructs the browser, proxy or gateway to submit the request to the origin server for validation each time before responding with a cached copy. The “no-store” directive, on the other hand, instructs the browser or the Web proxy not to store anything in the cache.
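The difference between the two directives can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names `parse_cache_control` and `cache_decision` are hypothetical, and a real cache considers many more directives and headers than the two discussed here.

```python
def parse_cache_control(header_value):
    """Split a Cache-Control value into a set of lower-cased directive names."""
    directives = set()
    for part in header_value.split(","):
        # Drop any "=value" argument, e.g. "max-age=60" -> "max-age".
        name = part.strip().split("=", 1)[0].lower()
        if name:
            directives.add(name)
    return directives


def cache_decision(header_value):
    """Describe what a cache may do with a response (simplified)."""
    directives = parse_cache_control(header_value)
    if "no-store" in directives:
        # Nothing may be written to the cache at all.
        return "do not store"
    if "no-cache" in directives:
        # A copy may be kept, but it must be validated with the origin first.
        return "store, but revalidate before reuse"
    return "store and reuse while fresh"
```

For example, `cache_decision("no-cache")` yields the revalidation case, while `cache_decision("no-cache, no-store")` yields "do not store", because no-store is the stricter of the two.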
Now, these directives may work differently in different browsers. For example, older versions of Internet Explorer do not always respect the “no-cache” directive. This means that even if the “no-cache” directive is used, the page may still get cached. Therefore, in order to be safe with all browsers and versions of HTTP, one has to send the “no-store” directive and an “Expires” header in addition to the “no-cache” directive.
The best practice to avoid caching is to set the cache control header as follows: Cache-Control: no-cache, no-store
“Expires” is an important cache-related response header (a separate header rather than a Cache-Control directive), as it lets a developer decide when a cached Web object becomes stale. By setting it to “-1”, one can ensure that the browser always serves the user a fresh response. This is because “-1” is not a valid date, so caches treat the Web object as already expired.
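As a rough sketch of how a server might send these headers together, the following Python example uses the standard library `http.server` module. The names `NO_CACHE_HEADERS`, `NoCacheHandler` and `run`, the port and the page body are all assumptions made for this illustration; the header values are the ones discussed in the text.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Header values taken from the text; the dictionary name is hypothetical.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-cache, no-store",  # HTTP/1.1 caches
    "Pragma": "no-cache",                   # HTTP/1.0 caches
    "Expires": "-1",                        # invalid date, treated as already expired
}


class NoCacheHandler(BaseHTTPRequestHandler):
    """Serves one placeholder page and asks caches not to keep it."""

    def do_GET(self):
        body = b"<html><body>Sensitive page</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        # Attach every anti-caching header to the response.
        for name, value in NO_CACHE_HEADERS.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    # Blocks until interrupted; the port number is an arbitrary choice.
    HTTPServer(("127.0.0.1", port), NoCacheHandler).serve_forever()
```

Calling `run()` starts the server locally; the response headers can then be inspected in the browser's developer tools to confirm that they are present.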
Preventing caching using HTML meta tags
HTML meta tags can also be used to prevent caching of Web pages. The meta tags for preventing caching can be set as follows:
<META HTTP-EQUIV="CACHE-CONTROL" CONTENT="NO-CACHE">
<META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE">
<META HTTP-EQUIV="CACHE-CONTROL" CONTENT="NO-STORE">
Apart from using the above remediations, for a few Web objects it is suggested to explicitly ask the user to delete the cache. Such Web objects may include files with extensions like .pdf, .txt, .xls, .docx, etc. Let us look at how to delete the cache in various browsers.
Internet Explorer: Tools -> Internet Options -> General -> Browsing history -> Delete
Google Chrome: Menu option -> More tools -> Clear browsing data -> Check all the boxes -> Clear browsing data
Mozilla Firefox: Menu -> History -> Clear recent history -> Check the options -> Clear now
A shortcut to delete the history in all the browsers is: Ctrl + Shift + Delete
In conclusion, although caching is a very effective way to reduce bandwidth usage and the load on the server, it can be equally dangerous, leaking highly sensitive information in some cases. It is therefore advisable to analyze carefully which Web objects in your application need to be cached and which should not be, in order to maintain the security of the application while still improving its performance.