One does not simply go into on-demand revalidation at scale! Well, at least not without learning a few tips and tricks first.
#The gist
On-demand revalidation is the holy grail of page caching, promising both great performance and fresh data: any given page is built only when needed, and then cached for exactly as long as its content stays valid. This avoids the constant page rebuilding you get with SSR and time-based ISR.
However, this isn’t an all-or-nothing choice.
You can combine these patterns on the same site, investing in on-demand revalidation where it matters most. This guide covers not only how to best implement on-demand revalidation, but also when. It has three parts:
- Overview (this part): basic requirements, typical solutions with pros/cons, and a suggested optimization.
- Database and authentication: adding a products database and basic authentication, enabling a logged-in user to update products.
- Cache tags and revalidation: adding cache tags and the code to detect changes and purge relevant tags.
#Example website and demo
The guide is based on a fork of the Astro Storefront template. The source code is available on GitHub, and a live demo is available on Netlify.
#Video preview
Here’s a video that provides a preview of the guide and an overview of how it all works together.
#Essential considerations
No matter what your tech stack is, two basic requirements must be met to enable on-demand revalidation:
- You need a mechanism to detect data changes and to know which entities changed.
- When rendering a page, you need to know the entities included in the page.
By knowing which entities are included in a page, you can set the appropriate cache tags in the response headers.
Then, when a change occurs, you’ll be able to call Netlify’s Purge API for the relevant cache tags, and any cached content marked with any of these tags will go up in smoke, like magic! Netlify takes care of purging the right pages for you based on the tags.
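To make this concrete, here is a minimal sketch of the page side, assuming each product gets a tag of the form `product-<id>` (a naming convention chosen here for illustration; `productTag` and `cacheTagHeader` are hypothetical helpers). `Netlify-Cache-Tag` is a response header Netlify reads cache tags from, and `purgeCache({ tags })` from `@netlify/functions` purges cached content by tag; both appear only in comments, so the block itself has no dependencies.

```typescript
// Hypothetical naming convention: one cache tag per product entity.
export function productTag(productId: string): string {
  return `product-${productId}`;
}

// Builds the value of the Netlify-Cache-Tag response header:
// a comma-separated list of unique tags for every product in the page.
export function cacheTagHeader(productIds: string[]): string {
  const tags = new Set(productIds.map(productTag));
  return Array.from(tags).join(",");
}

// In an Astro page's frontmatter, assuming `productIds` lists every
// product rendered into the page markup:
//   Astro.response.headers.set("Netlify-Cache-Tag", cacheTagHeader(productIds));
//
// In a Netlify Function that has detected changes:
//   import { purgeCache } from "@netlify/functions";
//   await purgeCache({ tags: changedIds.map(productTag) });
```

Using the same tag-building helper on both sides keeps the tags set on pages and the tags passed to the purge call from drifting apart.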
#Not all changes are worth revalidation
There are many kinds of changes that may occur in an e-commerce website, for example:
- Updating product prices or descriptions
- Adding or removing products
- Adding, updating, or removing collections of products
These changes vary in frequency and impact. For example, creating a new collection is relatively rare, but it may affect several (or all) pages on your site: the header menu, for instance, may need to be updated to include the new collection.
Infrequent and/or complex changes are often best handled by purging the full cache for the site; some may even require a fresh build and deploy anyway. The more frequent, granular changes are the good candidates for targeted on-demand revalidation.
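One way to encode this distinction is a small planning step that turns a batch of detected changes into either a full-site purge or a list of targeted tags. This is a hypothetical sketch: the `Change` and `PurgePlan` types, the `product-<id>` tag convention, and the rule that any new collection triggers a full purge are all assumptions for illustration.

```typescript
// Hypothetical change events, mirroring the kinds of changes listed in the text.
type Change =
  | { kind: "product-updated"; productId: string }
  | { kind: "product-deleted"; productId: string }
  | { kind: "collection-created"; collectionId: string };

// Either purge the whole site, or only the listed cache tags.
type PurgePlan = { scope: "site" } | { scope: "tags"; tags: string[] };

export function planPurge(changes: Change[]): PurgePlan {
  const tags = new Set<string>();
  for (const change of changes) {
    switch (change.kind) {
      case "collection-created":
        // Rare and wide-reaching (e.g. the header menu changes):
        // targeted tags aren't worth it, so purge everything.
        return { scope: "site" };
      case "product-updated":
      case "product-deleted":
        // Frequent and granular: purge only pages tagged with this product.
        tags.add(`product-${change.productId}`);
        break;
    }
  }
  return { scope: "tags", tags: Array.from(tags) };
}
```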
#Detecting changes in data sources
Data source systems (databases, CMSs, e-commerce backends) typically support subscribing to webhooks and/or polling for updates. Let’s take a look at each of these approaches.
#Subscribing to webhooks
Subscribing to webhooks is a relatively simple and near-real-time approach, though not all data sources support it (CMSs and e-commerce systems usually do, databases usually don’t). Netlify and similar platforms make it simple to add an API endpoint to your site to start receiving updates.
However, if you miss an update, or fail to process it for any reason (site was down, faulty webhook logic, etc.), your site may be out of sync and you may not even know which pages are affected!
#Mitigating webhook weaknesses
There are various ways to mitigate these weaknesses. Improving logging, error handling and monitoring is a very good start.
But you can also simplify the webhook logic by having it delegate all the actual work to an asynchronous work queue on your end. This makes it easier to retry processing failures, monitor, pause and resume, etc. Netlify provides Async Workloads for such purposes.
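A minimal sketch of such a thin webhook endpoint, written in the Web-standard `Request`/`Response` style that Netlify Functions support. The payload shape, the `x-webhook-secret` header, and the `enqueue` function are all hypothetical; in production, `enqueue` could be backed by Async Workloads or another durable queue.

```typescript
// Hypothetical payload shape sent by the source system.
interface ProductWebhookPayload {
  productIds: string[];
}

// Hypothetical queue hand-off, e.g. backed by Async Workloads or another
// durable queue that handles retries, pausing, and monitoring.
type Enqueue = (event: string, data: unknown) => Promise<void>;

export function createWebhookHandler(enqueue: Enqueue, secret: string) {
  return async (req: Request): Promise<Response> => {
    // Reject callers without the shared secret (header name is an assumption).
    if (req.headers.get("x-webhook-secret") !== secret) {
      return new Response("Unauthorized", { status: 401 });
    }
    let payload: ProductWebhookPayload;
    try {
      payload = (await req.json()) as ProductWebhookPayload;
    } catch {
      return new Response("Invalid JSON", { status: 400 });
    }
    if (!Array.isArray(payload?.productIds)) {
      return new Response("Missing productIds", { status: 400 });
    }
    // Do no real work here: just hand off and acknowledge quickly.
    await enqueue("products-changed", { productIds: payload.productIds });
    return new Response("Accepted", { status: 202 });
  };
}
```

Because the endpoint only validates and enqueues, it can respond quickly, and a failed processing attempt can be retried from the queue instead of being lost.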
#Polling for updates
Polling for updates is a more traditional approach, which works well when you don’t have a webhook mechanism available. In this approach, you periodically query the source system(s) for any changes that occurred since the previous successful poll.
This requires storing the timestamp of the previous successful poll in a reliable store (not the local filesystem, which isn’t guaranteed to persist between serverless function invocations).
If temporary errors occur on either your side or in the source system, subsequent polls keep asking for changes since the last successful one, so no changes are skipped. In production, you can add monitoring that checks the timestamp of the last successful poll and ensures it is recent enough.
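A minimal sketch of this cursor-based polling, assuming a hypothetical `CursorStore` backed by something durable (such as a database table) and a hypothetical `fetchChangesSince` query against the source system. The key point is that the cursor only advances after the changes were fetched and handled successfully.

```typescript
// Hypothetical persistent store for task state, e.g. a database table.
// It must survive across function invocations (so not the local filesystem).
interface CursorStore {
  getLastSuccess(task: string): Promise<Date | null>;
  setLastSuccess(task: string, at: Date): Promise<void>;
}

// Hypothetical query against the source system: IDs changed in (since, until].
type FetchChangesSince = (since: Date, until: Date) => Promise<string[]>;

const TASK = "poll-products"; // hypothetical task name

export async function pollOnce(
  store: CursorStore,
  fetchChangesSince: FetchChangesSince,
  onChanges: (ids: string[]) => Promise<void>,
  now: Date = new Date(),
): Promise<string[]> {
  const since = (await store.getLastSuccess(TASK)) ?? new Date(0);
  // If either call below throws, the cursor is not advanced, so the next
  // poll asks again for everything since the last successful one.
  const ids = await fetchChangesSince(since, now);
  if (ids.length > 0) await onChanges(ids);
  await store.setLastSuccess(TASK, now);
  return ids;
}

// Monitoring check: is the last successful poll recent enough?
export async function isPollingHealthy(
  store: CursorStore,
  maxAgeMs: number,
  now: Date = new Date(),
): Promise<boolean> {
  const last = await store.getLastSuccess(TASK);
  return last !== null && now.getTime() - last.getTime() <= maxAgeMs;
}
```

Using the same `now` value both as the query’s upper bound and as the new cursor keeps consecutive polling windows contiguous, so no interval between two successful polls is skipped.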
Most source systems, including e-commerce backends and CMSs, have a form of query language that you can use for such a query, since these are all essentially databases.
There are two things to note, though:
- To detect changes quickly after they occurred, you need to set a short time interval between polls. This may incur costs on your source data system, which may enforce a quota on API calls, or charge by the meter.
- If you also need to know about deleted entities, make sure the source system exposes this information via its query engine. Otherwise, you’ll only know about creations and updates.
#Hybrid detection approach
The example site adopts a hybrid approach, meant to capture changes both robustly and in real-time.
#Example detection logic
The logic for detecting changes queries the product catalog (stored in the database) for any changes since the last query. This works regardless of where the changes originated: the UI, or any other backend or external source.
Since we’re already using a database (Turso) for the catalog, we’ve also added a table that stores the state of periodically running tasks, including their status, timestamps, and custom data payloads. This provides a convenient basis for the revalidation logic and for further enhancements.
For detecting deleted entities, we’ve adopted the common and useful pattern of soft deletes: a deleted entity remains in the database, but its deletedAt field is set to the time of deletion. Among other benefits, this makes it possible to tell which entities were deleted, and when.
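With soft deletes in place, a single query can return both updated and deleted products. The sketch below assumes hypothetical `products` table and column names (`updated_at`, `deleted_at`) and a `ProductRow` shape with epoch-millisecond timestamps; it is not the example site’s actual schema.

```typescript
// Hypothetical row shape; timestamps are epoch milliseconds.
interface ProductRow {
  id: string;
  updatedAt: number;
  deletedAt: number | null; // set on soft delete instead of removing the row
}

// Against a SQLite-compatible database such as Turso, the rows could come
// from a query like this (table and column names are assumptions):
export const CHANGED_PRODUCTS_SQL =
  "SELECT id, updated_at, deleted_at FROM products WHERE updated_at > ?1 OR deleted_at > ?1";

// Splits rows changed after `since` into updated and soft-deleted IDs.
export function classifyChanges(rows: ProductRow[], since: number) {
  const updated: string[] = [];
  const deleted: string[] = [];
  for (const row of rows) {
    if (row.deletedAt !== null) {
      // Only report deletions that happened within this window.
      if (row.deletedAt > since) deleted.push(row.id);
    } else if (row.updatedAt > since) {
      updated.push(row.id);
    }
  }
  return { updated, deleted };
}
```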
#Example triggers
The above logic can be triggered in multiple ways:
- Right after an authorized user makes a change in the UI.
- Via a webhook, if the source system supports it.
- Via an API call. In the example site, a scheduled function is invoked every 15 minutes and triggers the polling logic via the API.
In this setup, scheduled polling acts as a fallback, handling any case where change detection was not otherwise triggered. Therefore, it does not need to run every few seconds.
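As a sketch, a scheduled function in the Netlify Functions v2 style can export a `config` with a cron `schedule` and simply call the site’s own change-detection endpoint. The `/api/revalidate` path, the bearer-token check, and the `REVALIDATE_SECRET` variable are hypothetical; the site address is assumed to come from Netlify’s `URL` environment variable.

```typescript
// Minimal fetch signature, so tests can inject a stub.
type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

// Calls the site's own (hypothetical) change-detection endpoint.
export async function triggerChangeDetection(
  siteUrl: string,
  secret: string,
  fetchImpl: FetchLike = fetch,
): Promise<number> {
  const res = await fetchImpl(`${siteUrl}/api/revalidate`, {
    method: "POST",
    headers: { authorization: `Bearer ${secret}` },
  });
  if (!res.ok) {
    throw new Error(`Change detection failed with status ${res.status}`);
  }
  return res.status;
}

// Scheduled function entry point: runs every 15 minutes as a fallback.
export default async () => {
  await triggerChangeDetection(
    process.env.URL ?? "",
    process.env.REVALIDATE_SECRET ?? "",
  );
};

export const config = { schedule: "*/15 * * * *" };
```

Routing the scheduled run through the same endpoint as the other triggers keeps a single code path for change detection.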
Regardless of what triggers change detection, the logic itself stays the same. Over the next two parts, you’ll learn in detail how this is all implemented.
#Adding cache tags
You also need to make sure that your pages have all the right cache tags in their response headers, so that when your code detects changes and calls the purge API for one or more tags, all the relevant pages are invalidated.
While seemingly simple, this can get a bit tricky, since web pages often include data from multiple entities.
For example, a PDP (product detail page) typically includes not just data about a specific product ID, but also one or more additional widgets showing related products.
This is no problem for widgets whose content is loaded asynchronously (as is often the case with product recommendations and similar features), since the cached markup of the page itself does not contain the products in the widget, only the client-side code needed to load them.
(Asynchronous loading has its own known cons, though Astro’s Server Islands nicely mitigate some of them.)
However, for any content that’s rendered in the cached page itself, you’d need to make sure that all the appropriate cache tags are included in the page response headers. In some cases, this can be pretty straightforward. In others, it would require some work.
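For instance, if a PDP renders its related products directly into the cached HTML, the page’s tags should cover those products as well as the main one. A minimal sketch, with hypothetical `Product` and `ProductDetailPageData` types and a `product-<id>` tag convention:

```typescript
interface Product {
  id: string;
  name: string;
}

// Hypothetical page data for a PDP whose related-products widget is
// rendered into the cached HTML (not loaded asynchronously).
interface ProductDetailPageData {
  product: Product;
  relatedProducts: Product[];
}

// Every product whose data appears in the cached markup gets a tag,
// using a hypothetical `product-<id>` naming convention.
export function pdpCacheTags(data: ProductDetailPageData): string[] {
  const ids = [data.product.id, ...data.relatedProducts.map((p) => p.id)];
  return Array.from(new Set(ids)).map((id) => `product-${id}`);
}
```

An update to any related product then also purges this PDP, which is exactly what keeps the inline widget fresh.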
#Case in point: composable websites
For example, composable websites often have a page structure that is not fully fixed, but rather defined per page in the CMS. In such cases, the rendering code is structured as a top-down recursive process that resolves the appropriate component at each level, based on the content types in the CMS data. You could call this a content-driven architecture.
This architecture gives marketers and other content editors the freedom to add, remove, and nest various components, limited only by the schema rules defined in the CMS. This is exactly the pattern that Netlify’s visual editor supports.
This means that the full list of entities a given page renders is often known only after pulling the page data from the CMS. To find all relevant cache tags, you may then need to add logic that traverses the nested content object and collects all relevant entity IDs.
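A generic traversal can handle arbitrarily nested content without knowing each component type in advance. This sketch assumes a hypothetical convention where product references in the CMS data look like `{ type: "product", id: "..." }`; real CMS schemas will differ, so the matching condition would need adapting.

```typescript
// Recursively walks arbitrary nested CMS content and collects the IDs of
// referenced products. Assumed convention (hypothetical): a product
// reference is any object shaped like { type: "product", id: "<id>" }.
export function collectProductIds(
  node: unknown,
  found: Set<string> = new Set<string>(),
): Set<string> {
  if (Array.isArray(node)) {
    for (const child of node) collectProductIds(child, found);
  } else if (node !== null && typeof node === "object") {
    const obj = node as Record<string, unknown>;
    if (obj.type === "product" && typeof obj.id === "string") {
      found.add(obj.id);
    }
    // Keep descending: components may nest other components.
    for (const value of Object.values(obj)) collectProductIds(value, found);
  }
  return found;
}
```

Since the traversal only looks for a reference shape, new component types added by editors are covered automatically, as long as they reference products in the same way.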
At this point, and especially if your code utilizes this architectural pattern, creating a robust solution might seem daunting. Finding the right balance is key.
#Finding the right balance
On-demand revalidation can indeed be considered an advanced pattern for non-trivial websites; cache invalidation is famously hard.
However, you may have already established that neither fully-static nor fully-server-rendered sites fit your needs. For an effective solution, you need to find the right balance — don’t overdo it, and have fallbacks in place.
#Don’t over-engineer your solution
Focus on the entities and types of changes that matter most: those that are frequent and don’t require invalidating a big chunk of your site anyway.
If your strategy for cache tags seems too complex, simplify it to reduce granularity, even if that means some unnecessarily invalidated pages here and there. Opt for simplicity and clarity over complexity and fragility.
#Have fallbacks in place
In the example site, product updates made via the UI result in immediate triggering of the revalidation logic.
However, there is also a periodic trigger that acts as a fallback for any database updates that did not result in an immediate trigger (e.g. due to an oversight made by a developer in some code path). This technique is covered in more detail in a later part of this guide.
As an additional safeguard, instead of setting a very long TTL (time-to-live, or expiry time) for pages, you can settle on a middle ground (e.g., one hour).
In this way, you’d know that any cached page more than an hour old would be regenerated. Assuming that you also utilize stale-while-revalidate (SWR), your users won’t notice any performance hit when pages get regenerated due to expiry.
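For example, the one-hour middle ground with SWR could be expressed through response headers along these lines. `Netlify-CDN-Cache-Control` targets Netlify’s CDN only, and the `durable` directive opts into Netlify’s durable cache; the helper name and the specific values are assumptions to adjust for your site.

```typescript
// Hypothetical helper producing caching headers for a page response.
export function pageCacheHeaders(
  ttlSeconds = 3600, // one-hour safety net, as discussed in the text
  staleSeconds = 86400, // assumed window for serving stale while regenerating
): Record<string, string> {
  return {
    // Browsers always revalidate, so a purge reaches returning visitors too.
    "Cache-Control": "public, max-age=0, must-revalidate",
    // Netlify's CDN caches for ttlSeconds, then may serve the stale page
    // while regenerating it in the background.
    "Netlify-CDN-Cache-Control": `public, durable, s-maxage=${ttlSeconds}, stale-while-revalidate=${staleSeconds}`,
  };
}
```

In an Astro page, these could be applied with `for (const [key, value] of Object.entries(pageCacheHeaders())) Astro.response.headers.set(key, value);`.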
#Next steps
Now that we’ve covered the theory, let’s dive into the example code. In part two, we’ll add a database and authentication, enabling a logged-in user to update products.
Next: Database and authentication