What is Caching, and Why Does it Matter?
Cache
For some reason, every time I hear the word, I think of squirrels piling up nuts for winter. It is not a good example, but it is one of the many uses of the word cache. A better way of thinking of caching website content would be a photo of a menu used to show what’s available at a restaurant. We all understand that it was a single moment in time and that the menu has likely changed. Sometimes we check the date of the picture to see how likely.
This is the most important fact to get out of a discussion of caching. The content in a cache was 100% accurate at one moment in the past. But the age of that cache can make it obsolete.
Where is Caching Used
Everywhere.
In fact, I don’t know of any web interaction that does not leverage caching. But crucial to our discussion, we will touch on these three areas:
- CMS
- Host
- Browser
Why Cache
So why do we use caching across the World Wide Web? Speed is the first and most important reason. Every site wants to produce a response in the shortest possible time. A close second is limiting resource consumption, which can directly affect the cost of hosting a website. Finally, caching can help a server withstand attacks and sudden surges in traffic. Imagine a tweet attracts thousands of new visitors to your site. Without caching, it is easy to find your server overwhelmed by the crush of new visitors. In general, there are multiple layers of caching going on with every website you interact with, each attempting to make your site faster, leaner and stronger.
In the world of website construction, you will often hear the word “cache” thrown around in many ways. Often, new content is hidden due to a problem with “cache invalidation”. Another common suggestion is to clear your “browser cache”. This is often complicated by the “front-side caching” provided by CDNs such as Fastly or Cloudflare. As a developer, I often resolve problems with “cache tags” or “cache timing”.
CMS
Typically, a website is built on top of a Content Management System (CMS). Examples include WordPress, Drupal, Squarespace, Wix, and Shopify, to name a few. Each request to a CMS requires the server to spend valuable time building out the content for the page. For our conversation, let’s just focus on the homepage of a business. The server works through many lines of code to determine what “current information” belongs on the homepage. When this work is done, it sends the document to your browser. The browser then renders the document with all the images, colours, and styles. Over the next 24 hours, many people will ask for this exact same content. Without caching, the server will have to spend valuable time rebuilding the exact same content for every one of them.
So in the world of web development, this represents a perfectly good example of content that can be cached for a time period. This is the soul of cached content. To make this work, we need a few key elements:
- Resource Identifier - This is the address of the page (the URL shown in the browser’s address bar). It tells the cache which content is being requested.
- Max-Age - This is the time limit the CMS thinks the cache data will be good to use. Think of it as an expiration date.
- Age - This is how old the cache data is.
This is a simplified list of elements, but they represent the core of caching requirements. With all of these elements, we can now begin to build a caching layer to help our website’s performance.
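As a rough sketch of those three elements, here is how a single cached page could be represented in Python. The class name, field names, and example URL are assumptions made for illustration. They are not taken from any particular CMS.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    """One cached page. Names here are illustrative, not from a real CMS."""
    resource_id: str   # the resource identifier, e.g. the page URL
    content: str       # the finished page the server built
    max_age: int       # seconds the content is considered good to use
    stored_at: float   # clock reading (seconds) when the content was cached

    def age(self, now: float) -> float:
        """How old the cached copy is, in seconds."""
        return now - self.stored_at

    def is_fresh(self, now: float) -> bool:
        """True while the copy is younger than its expiration limit."""
        return self.age(now) < self.max_age


DAY = 24 * 60 * 60
entry = CacheEntry("https://example.com/", "<html>home</html>", max_age=DAY, stored_at=0)
print(entry.is_fresh(now=60 * 60))   # checked when the copy is one hour old
print(entry.is_fresh(now=DAY + 1))   # checked just after a full day has passed
```

The age is not stored directly. It is worked out from the moment the content was cached, so it keeps growing without anyone updating it.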
How It Works
So with these few elements, we can start caching requests. Let’s say the max-age is a day and the current age is 1 hour. When a new request for the homepage is made, the server looks up the resource, finds that the current cache is well within the max-age, and then returns the content without any further calculations.
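The lookup described above can be sketched in a few lines of Python. This is a minimal in-memory example under stated assumptions: `PageCache`, `build_homepage`, and the fake clock are hypothetical names invented for the demonstration, and a real CMS cache is far more involved.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Tiny in-memory page cache keyed by URL (hypothetical, for illustration)."""

    def __init__(self, max_age: float, clock: Callable[[], float] = time.time):
        self.max_age = max_age
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # url -> (stored_at, content)

    def get(self, url: str, build: Callable[[str], str]) -> str:
        now = self.clock()
        cached = self._store.get(url)
        if cached is not None:
            stored_at, content = cached
            if now - stored_at < self.max_age:
                return content           # fresh: skip the expensive build
        content = build(url)             # missing or expired: rebuild the page
        self._store[url] = (now, content)
        return content


build_count = 0


def build_homepage(url: str) -> str:
    """Stand-in for the CMS work of assembling a page."""
    global build_count
    build_count += 1
    return f"<html>homepage build #{build_count}</html>"


fake_now = [0.0]                         # controllable clock for the demo
cache = PageCache(max_age=24 * 3600, clock=lambda: fake_now[0])

first = cache.get("/", build_homepage)   # cache is empty, so the page is built
fake_now[0] = 3600                       # one hour later
second = cache.get("/", build_homepage)  # within max-age, served from cache
fake_now[0] = 25 * 3600                  # 25 hours after the first request
third = cache.get("/", build_homepage)   # past max-age, so the page is rebuilt
print(build_count)
```

Three requests come in, but the page is only built twice: the request at the one-hour mark reuses the stored copy.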
This prevents server processing and speeds content delivery. A big win for the site owner, website host and the visitor. Seems like this is an obvious improvement in everyone’s experience. What could go wrong?
Caching Problems
So in our last example, let’s pretend that there is a section of the homepage that lists the three most recent blog stories. It is very important for a website to promote its new content, and helpful for the visitor to know of recent news. Unfortunately, the server just cached a new version of the front page and will not make another for 24 hours. During this time, any new blog post will not be promoted on the homepage, and visitors will not know it exists. Far from ideal.
There are lots of solutions to this problem, but I want to stop here and tell you that every solution is a partial solution. There are many steps in the World Wide Web where caching occurs. The server we are discussing is only one of them.
Services such as Fastly, Cloudflare, and Amazon CloudFront are Content Delivery Networks (CDNs) that many companies hire to help them with load balancing and front-side caching. These can cache data for long periods of time, which can cause problems when caches need to be cleared.
Lastly, the most likely source of caching problems is the browser a visitor uses. Each of them tries very hard to limit the bandwidth used and speed up the user experience. To do this, all of them will cache content you have already seen. If you revisit a site within the max-age of the page, it is very likely the browser will display the same content you saw previously without ever asking the server for the content.
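To make the stale homepage concrete, here is a small sketch of one such partial solution: dropping the cached page whenever a post is published. Everything in it (`posts`, `page_cache`, `publish`, the `invalidate` flag) is a hypothetical stand-in and not how any specific CMS does it. Note that it only clears the server’s own copy.

```python
from typing import Dict, List

posts: List[str] = ["Post A", "Post B", "Post C"]   # oldest first, newest last
page_cache: Dict[str, str] = {}                      # url -> rendered page


def render_homepage() -> str:
    """Stand-in for the CMS building the homepage with the three newest posts."""
    latest = ", ".join(reversed(posts[-3:]))
    return f"Latest posts: {latest}"


def get_homepage() -> str:
    """Serve the cached homepage, building it only when no copy exists."""
    if "/" not in page_cache:
        page_cache["/"] = render_homepage()
    return page_cache["/"]


def publish(title: str, invalidate: bool) -> None:
    """Add a post; optionally drop the cached homepage so it gets rebuilt."""
    posts.append(title)
    if invalidate:
        page_cache.pop("/", None)


before = get_homepage()
publish("Post D", invalidate=False)
stale = get_homepage()    # same as before: the new post is hidden
publish("Post E", invalidate=True)
fresh = get_homepage()    # rebuilt, so the newest posts appear
print(stale)
print(fresh)
```

Even with the invalidation in place, any copy already held further along the chain is untouched, which is why this counts as a partial solution.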
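On the web, the max-age and age values usually travel with the page as the HTTP response headers `Cache-Control: max-age=...` and `Age`. The sketch below shows a heavily simplified version of the reuse decision. The function names are made up for illustration, and real browsers apply many more rules (other `Cache-Control` directives, validation requests, heuristics) than this.

```python
from typing import Dict, Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Pull the max-age value (in seconds) out of a Cache-Control header."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.strip().isdigit():
            return int(value)
    return None


def can_reuse(headers: Dict[str, str], seconds_since_received: int) -> bool:
    """Simplified freshness check: reuse while the total age is under max-age."""
    max_age = parse_max_age(headers.get("Cache-Control", ""))
    if max_age is None:
        return False   # this sketch plays it safe when no max-age is given
    age_on_arrival = int(headers.get("Age", "0"))
    return age_on_arrival + seconds_since_received < max_age


headers = {"Cache-Control": "public, max-age=86400", "Age": "3600"}
print(can_reuse(headers, seconds_since_received=600))     # ten minutes later
print(can_reuse(headers, seconds_since_received=90000))   # 25 hours later
```

In the first call the copy is well inside its one-day limit, so it would be shown without contacting the server. In the second it has expired and the content would be requested again.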
So What Do I Need to Know About Caching
The first thing is to be aware that it exists. The vast majority of the time, it will work in the background flawlessly, and you’ll never have a problem.
If you do feel that the content is "stale", you can perform a hard refresh in the browser. This forces it to skip its cached copy and ask again for the current content. Every browser is different, but a common shortcut is Ctrl-Shift-R (or Cmd-Shift-R on a Mac). Beyond that, there is little you can do to force the host or CMS to clear their cache.
If you have more questions about caching, feel free to reach out to our team. We're happy to help (in real time!).
Article written by David Smallwood, Senior Developer