In this third and final part, you’ll learn how cache tags are used in the improved storefront example from part two: which tags are added and how, and how revalidation works. Lastly, you’ll learn how to run the code yourself.
#The tag taxonomy
Cache tags are simple strings that you add to the `Cache-Tag` response header. These strings could be anything you want: a simple constant value, or a dynamic value you generate in code, e.g. `product-${id}`.
For consistency, we need a common definition of what strings to use in a site, and their exact format. In fancier terms, you might call this the cache tag taxonomy.
#Reviewing page content
To figure out the tags, let’s review the content included for each type of page in the storefront website — homepage, collection pages, and product pages.
#The homepage
The homepage (source) has two content elements:
- A list of recommended products.
- Links to all product collections.
#Collection pages
Collection pages (source) show all products that are associated with that collection. This type of page also supports pagination and sorting.
#Product pages
Product pages (source) show a single product in detail - but much like the homepage, these pages also include:
- A list of related products.
- Links to all collections, same as in the homepage.
#Mapping to cache tags
Based on the above, we’ve settled on three types of tags for the example: product tags, a collections metadata tag, and per-collection tags.
#1. Product tags
Pages that show one or more products should have a cache tag for each product ID they show. That tag takes the form `pid_<product-id>`.
For the homepage, that’s a tag for each of the products in the recommendations widget. For a product page, it’s one cache tag for the product in detail, plus a tag for each product in the recommendations widget.
#2. The collections metadata tag
Pages that show links to all collections (the homepage and product pages) need to be invalidated if there was a change to the name of a collection, or a collection was added or deleted. We’ll use a single constant tag for all such cases: `collections_metadata`.
Fortunately, if the contents of a collection were modified - typically a much more frequent change - pages merely showing links to collections are not affected.
**Making it simpler:** Changes to the list of collections are probably infrequent enough, and may impact so many pages when they do occur, that you may decide to forgo this type of tag and require a new build instead.
#3. Per-collection tags
Lastly, let’s tackle the needs of collection pages.
These may need to be rebuilt if a product in that collection changes, or if a product is removed from or added to the collection.
Tracking how these changes might affect pagination and sorting for a collection can quickly get out of hand. Hence, for collection pages, we’ll add a simple `cid_<collection_id>` tag, regardless of pagination, sorting, etc.
When any product in a given collection changes, or the product is moved between collections, we’d need to purge the tags of relevant collections.
#Implementing tags
Now that we know what cache tags we need where, it’s time to add these into the code.
#The helper
The module `src/lib/headers.ts` (source) provides a helper function to assist in adding cache tags. Here is an abbreviated version of it:
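The actual code lives in the repo; the following is a rough sketch reconstructed from the description below. The option names and the exact TTL values are assumptions for illustration:

```typescript
// Sketch of the helper in src/lib/headers.ts (abbreviated; option names
// and TTL values here are assumptions, not the repo's exact code).
export interface CacheTagOptions {
  productIds?: string[];        // products shown on the page
  collectionId?: string;        // set for collection pages
  collectionsMetadata?: boolean; // page links to all collections
}

export function applyCacheHeaders(headers: Headers, options: CacheTagOptions): void {
  const tags: string[] = [];
  for (const id of options.productIds ?? []) tags.push(`pid_${id}`);
  if (options.collectionId) tags.push(`cid_${options.collectionId}`);
  if (options.collectionsMetadata) tags.push("collections_metadata");

  // Browsers must always revalidate with the CDN.
  headers.set("cache-control", "public,max-age=0,must-revalidate");
  // The CDN caches durably for up to a day, serving stale while revalidating.
  headers.set("cdn-cache-control", "public,durable,s-maxage=86400,stale-while-revalidate=60");
  if (tags.length > 0) headers.set("cache-tag", tags.join(","));
}
```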
Let’s break this down:
Let’s break this down:

- `CacheTagOptions` is a type used by specific pages to declare what content changes should make a page’s cached version invalid.
- The helper function `applyCacheHeaders` receives a `Headers` object and an instance of `CacheTagOptions` for a page, and adds the following response headers:
  - `"cache-control": "public,max-age=0,must-revalidate"` instructs the browser to always validate content freshness with the CDN, since the browser doesn’t know when content is invalidated for any reason.
  - `"cdn-cache-control"`: this header is used by the CDN only. The value dictates that content should be cached for up to a full day and no more. This ensures that pages are “refreshed” daily, no matter what.
  - `"cache-tag"` is set to a comma-separated list of all relevant cache tags.

Note the use of the `durable` keyword to enable Durable Cache, and of `stale-while-revalidate`. Both provide latency optimizations for users.
#Page routes
With the helper in place, here is how cache tags are set for a specific page type, in this case product pages (source):
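Abstracting away the Astro specifics, the tag selection for a product page boils down to something like the following sketch. The function and field names here are illustrative, not the repo’s actual code:

```typescript
// Illustrative sketch: which cache tags a product page needs.
// Type and function names are assumptions, not the repo's actual code.
interface Product {
  id: string;
}

function productPageCacheTags(product: Product, related: Product[]): string[] {
  return [
    `pid_${product.id}`,                    // the product shown in detail
    ...related.map((p) => `pid_${p.id}`),   // the related-products widget
    "collections_metadata",                  // the page links to all collections
  ];
}
```

In the real route, the equivalent list ends up in the `cache-tag` response header via the helper.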
#Viewing cache tags for a page
You can view all these response headers via the Network tab in your browser’s developer console. But as an added service, you’ll find a handy little link in the page footer which shows a popup with cache tags on hover.
Next to it, you’ll also find the exact time when the page you’re seeing was generated, making it easy to tell whether that page was freshly (re-)built.
#Implementing revalidation
Content change detection in the storefront may be triggered by various sources (after edits in the UI, or by an external service via API, etc.) - but regardless of the trigger, the logic always looks at the DB to find changes since the last time it was triggered.
Keeping track of state is a typical need for periodic background jobs, so the example codebase includes basic facilities for this.
Since we’re already using Turso, the needed code for this is pretty straightforward (see `src/lib/jobs/index.ts`) and based on a simple table schema:
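The exact schema is in the repo; conceptually, it needs little more than a job name, the last run time, and the last status. A sketch (actual column names may differ):

```sql
-- Illustrative sketch of the job-state table (actual column names may differ).
CREATE TABLE IF NOT EXISTS job_state (
  job_name     TEXT PRIMARY KEY,  -- e.g. 'revalidate'
  last_run_at  INTEGER NOT NULL,  -- unix timestamp of the last successful run
  last_status  TEXT               -- JSON blob describing the last outcome
);
```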
#The revalidation job
With all the supporting pieces now in place, here is the revalidation logic (source):
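In outline, the job maps the changes it finds in the DB to tags and purges them. In this sketch, `purgeCache` is Netlify’s actual purge API from `@netlify/functions`; all other names are illustrative:

```typescript
// Illustrative outline of the revalidation job; apart from purgeCache
// (from "@netlify/functions"), all names here are assumptions.
interface ChangedProduct {
  id: string;
  collectionIds: string[];
}

// Pure part: map changed products found in the DB to the tags to purge.
export function tagsToPurge(changed: ChangedProduct[]): string[] {
  const tags = new Set<string>();
  for (const product of changed) {
    tags.add(`pid_${product.id}`);                                   // pages showing this product
    for (const cid of product.collectionIds) tags.add(`cid_${cid}`); // its collection pages
  }
  return [...tags];
}

// Impure part (sketch): find changes since the last recorded run, then purge.
// async function revalidateJob() {
//   const since = await getLastRunTime("revalidate");
//   const changed = await findProductsChangedSince(since);
//   const tags = tagsToPurge(changed);
//   if (tags.length > 0) await purgeCache({ tags }); // from "@netlify/functions"
//   await setLastRunTime("revalidate", Date.now());
// }
```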
(If you already see the elephant in the code, hold on a minute, we’ll get to it.)
`revalidateJob` is invoked in two places:

- When a product name is updated (see `src/lib/client.mock.ts:updateProductName()`).
- Via the `/api/revalidate` endpoint implemented in `src/pages/api/revalidate.ts`. When the site is deployed to Netlify, the scheduled function is invoked periodically and calls this API endpoint (see `netlify/functions/scheduled-revalidate-check.mts`).
#The API
The API endpoint code is very straightforward. Here is the whole file:
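The real file is in the repo; structurally, an Astro endpoint for this looks roughly like the sketch below. The secret header name, the status shape, and the stand-in job are assumptions:

```typescript
// Rough sketch of src/pages/api/revalidate.ts; the secret header name,
// status shape, and stand-in job below are assumptions, not the repo's code.

// Stand-in for the real revalidation job (implemented elsewhere in the repo).
async function revalidateJob(): Promise<{ ok: boolean; checkedAt: string }> {
  return { ok: true, checkedAt: new Date().toISOString() };
}

let lastStatus: { ok: boolean; checkedAt: string } | null = null;

export async function POST({ request }: { request: Request }) {
  // A shared secret (assumption) keeps the endpoint from being triggered by anyone.
  if (request.headers.get("x-api-secret") !== process.env.REVALIDATE_SECRET) {
    return new Response("Unauthorized", { status: 401 });
  }
  lastStatus = await revalidateJob();
  return new Response(JSON.stringify(lastStatus), { status: 200 });
}

export async function GET() {
  // Returns the last recorded status so a monitor (or a human) can check on the job.
  return new Response(JSON.stringify(lastStatus ?? { ok: false, checkedAt: null }), {
    status: 200,
  });
}
```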
As you can see, the endpoint also has a `GET` form in which it returns the last status of the change detection code, allowing an external monitoring service to alert a developer if necessary (or you could simply call it from the browser if you’re logged in!).
#Tackling complexity
The code above is mostly straightforward, but there’s a sneaky issue in the change detection.
If any product’s `collectionIds` field is modified, that’s easy to detect. But if the product was moved from collection A to B, there would be no record of collection A in the updated field value, so how would we know to invalidate the cache tag for collection A?
We’ve left it up to you to consider how it should best be handled, but did leave you with a tip on how we’d do it.
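For a flavor of the general idea (this is our illustration, not necessarily the repo’s tip): diffing the previous and current `collectionIds` values yields every collection whose page is affected by the move, including the one the product left:

```typescript
// Illustration only: compute the collection tags affected when a product's
// collection membership changes, including collections it was removed from.
function affectedCollectionTags(previousIds: string[], currentIds: string[]): string[] {
  const affected = new Set<string>();
  for (const id of previousIds) if (!currentIds.includes(id)) affected.add(`cid_${id}`); // removed from
  for (const id of currentIds) if (!previousIds.includes(id)) affected.add(`cid_${id}`); // added to
  return [...affected];
}
```

The catch is that this requires keeping (or fetching) the previous value before the update is applied, which is precisely the extra bookkeeping the change detection would need.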
If this starts getting too tricky to reason about for your taste, please do remember that on-demand revalidation doesn’t have to be all or nothing. The largest benefits come from handling the most common types of changes (product pricing, stock levels, etc.) in an efficient way.
If you can detect such changes in a robust way, great! If not, or for the less common cases, you may opt for a more brute-force approach, such as a fresh deploy. This boils down to a choice of where to focus your efforts.
#Run it yourself
Running the example is pretty simple, both locally and on Netlify.
#Run locally
- Ensure you have the prerequisites:
  - Node 20+
  - pnpm (preferably)
- Clone the repository.
- Copy `.env.example` to `.env` and set your own basic password and secret (see instructions in the file).
- Run `pnpm i`.
- Run `pnpm run dev` and chill at http://localhost:4321/.
Locally, there is no CDN in the loop and hence no actual caching. However, as cache response headers are set by the website anyway, you can see these in the browser’s developer console.
To test the scheduled function:
- Ensure you have the Netlify CLI installed, and run `netlify dev`. The CLI starts a server on port 8888 which wraps around Astro’s dev server.
- Navigate to http://localhost:8888/.netlify/functions/scheduled-revalidate-check and check the site’s logs!
#Set up a remote database before deploying
Before deploying to Netlify, you should set up a hosted database:
- Create an account and a database in Turso (there’s a free tier).
  - For best performance, create the database on AWS in the North Virginia region (`us-east-1`), which is closest to the default Netlify region for functions, located in Ohio (`us-east-2`). If you’re a Pro+ Netlify customer, you can later set your functions region to exactly match your database’s region.
- Create a token for your new database, and grab the token value and database URL (the one starting with `libsql://...`).
- Locally, set these values to `ASTRO_DB_APP_TOKEN` and `ASTRO_DB_REMOTE_URL` in your `.env` file.
- Locally, run `pnpm astro db push --remote` to create the database schema.
- Then, run `pnpm astro db execute db/seed.ts --remote` to fill the database with data. This takes a minute.
- To verify that the remote database is all set up before moving forward, you can have your local dev server connect to it by running `pnpm run dev --remote`.
#Deploying to Netlify
To create a new site, you can use the button above, which will automatically create a copy of the example repository in a GitHub account that you choose. Alternatively, you can fork the example yourself, and then create a new site for it in the Netlify UI.
You also have the option of manual deploys from the command-line, without needing your own repository at all.
Whichever method fits you, make sure to add the required environment variables (the same keys as in your `.env` file) with the appropriate values.
And that’s it!