Modern e-commerce caching with Astro + Turso: Using cache tags (part 3)

by Elad Rosenheim

In this third and final part, you’ll learn how cache tags are used in the improved storefront example from part two: which tags are added and how, and how revalidation works. Lastly, you’ll learn how to run the code yourself.

#The tag taxonomy

Cache tags are simple strings that you add to the Cache-Tag response header. These strings could be anything you want: a simple constant value, or a dynamic value you generate in code, e.g. product-${id}.

For consistency, we need a common definition of what strings to use in a site, and their exact format. In fancier terms, you might call this the cache tag taxonomy.

#Reviewing page content

To figure out the tags, let’s review the content included for each type of page in the storefront website — homepage, collection pages, and product pages.

#The homepage

The homepage (source) has two content elements:

  • A list of recommended products.
  • Links to all product collections.

#Collection pages

Collection pages (source) show all products associated with the collection. This type of page also supports pagination and sorting.

#Product pages

Product pages (source) show a single product in detail - but much like the homepage, these pages also include:

  • A list of related products.
  • Links to all collections, same as in the homepage.

#Mapping to cache tags

Based on the above, we’ve settled on three types of tags for the example — product tags, a collections-metadata tag, and per-collection tags.

#1. Product tags

Pages that show one or more products should have one cache tag per product ID they show. That tag takes the form pid_<product-id>.

For the homepage, that’s a tag for each of the products in the recommendations widget. For a product page, it’s one cache tag for the product shown in detail, plus a tag for each product in the recommendations widget.

#2. The collections metadata tag

Pages that show links to all collections (the homepage and product pages) need to be invalidated if there was a change to the name of a collection, or a collection was added or deleted. We’ll use a single constant tag for all such cases: collections_metadata.

Fortunately, if the contents of a collection were modified - typically a much more frequent change - pages merely showing links to collections are not affected.

Making it simpler

Changes to the list of collections are probably infrequent enough, and may impact so many pages when they do occur, that you may decide to forgo this type of tag and require a new build instead.

#3. Per-collection tags

Lastly, let’s tackle the needs of collection pages.

These may need to be rebuilt if a product in that collection changes, or if a product is removed from or added to the collection.

Tracking how these changes might affect pagination and sorting for a collection can quickly get out of hand. Hence, for collection pages, we’ll add a simple cid_<collection_id> tag, regardless of pagination, sorting, etc.

When any product in a given collection changes, or a product is moved between collections, we need to purge the tags of the relevant collections.
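Putting the taxonomy together, the Cache-Tag header for each page type would look roughly like this (the product and collection IDs are made up for illustration):

```text
# A product page for product 42, recommending products 7 and 13:
Cache-Tag: pid_42,pid_7,pid_13,collections_metadata

# The homepage, recommending products 7 and 13:
Cache-Tag: pid_7,pid_13,collections_metadata

# A collection page for collection "summer":
Cache-Tag: cid_summer
```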

#Implementing tags

Now that we know what cache tags we need where, it’s time to add these into the code.

#The helper

The module src/lib/headers.ts (source) provides a helper function to assist in adding cache tags. Here is an abbreviated version of it:

src/lib/headers.ts
import { ONE_HOUR, ONE_DAY } from "../config.ts";

export type CacheTagOptions = {
  productIds?: string[];
  collectionIds?: string[];
  collectionsMetadataWasModified?: boolean;
};

export function applyCacheHeaders(headers: Headers, options?: { cacheTags?: CacheTagOptions }) {
  // cache-control is what the client browser cares about
  const cacheHeaders: Record<string, string> = {
    "cache-control": "public,max-age=0,must-revalidate",
  };

  // At the CDN level, configure a 1-day cache,
  // plus up to one extra hour where serving stale content is allowed
  // (this triggers a regeneration in the background).
  cacheHeaders["cdn-cache-control"] = `public,durable,s-maxage=${ONE_DAY},stale-while-revalidate=${ONE_HOUR}`;

  // Add cache tags for the CDN, if any
  // (CacheTags is part of this module; omitted here for brevity)
  if (options?.cacheTags) {
    const tagsHeaderValue = CacheTags.toHeaderValue(options.cacheTags);
    if (tagsHeaderValue) {
      cacheHeaders["cache-tag"] = tagsHeaderValue;
    }
  }

  for (const [key, value] of Object.entries(cacheHeaders)) {
    headers.append(key, value);
  }
}

Let’s break this down:

CacheTagOptions is a type used by specific pages to declare what content changes should make a page’s cached version invalid.

The helper function applyCacheHeaders receives a Headers object and an instance of CacheTagOptions for a page, and adds the following response headers:

  • "cache-control": "public,max-age=0,must-revalidate" instructs the browser to always validate content freshness with the CDN, since the browser doesn’t know when content is invalidated for any reason.
  • "cdn-cache-control": this header is used by the CDN only. The value dictates that content should be cached for up to a full day, ensuring that pages are “refreshed” daily, no matter what.
  • "cache-tag" is set to a comma-separated list of all relevant cache tags.

Note the use of the durable keyword to enable Durable Cache, and stale-while-revalidate. Both provide latency optimizations for users.
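One piece the abbreviated listing omits is CacheTags itself. Based on how it’s used (toHeaderValue above, and toValues later in the revalidation job), a minimal sketch could look like the following — the actual implementation in src/lib/headers.ts may well differ in its details:

```typescript
// A sketch only: the real CacheTags lives in src/lib/headers.ts.
type CacheTagOptions = {
  productIds?: string[];
  collectionIds?: string[];
  collectionsMetadataWasModified?: boolean;
};

const CacheTags = {
  // Expand the options into a flat list of tag strings,
  // following the taxonomy: pid_<id>, cid_<id>, collections_metadata.
  toValues(options: CacheTagOptions): string[] {
    const tags: string[] = [];
    for (const id of options.productIds ?? []) tags.push(`pid_${id}`);
    for (const id of options.collectionIds ?? []) tags.push(`cid_${id}`);
    if (options.collectionsMetadataWasModified) tags.push("collections_metadata");
    return tags;
  },

  // Join the tags into a single comma-separated Cache-Tag header value.
  toHeaderValue(options: CacheTagOptions): string {
    return CacheTags.toValues(options).join(",");
  },
};
```

Whatever the exact implementation, the key point is that a page declares what content it depends on via CacheTagOptions, and both the header helper and the revalidation job derive the same tag strings from it.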

#Page routes

With the helper in place, here is the code for a specific page type. Below is how cache tags are set for product pages (source):

src/pages/products/[product].astro
---
// ...imports
const id = Astro.params.product;

// ...further logic
const recommendedProducts = await getRecommendedProducts({
  collectionId: product.collectionIds?.[0],
  excludeIds: [id],
});

const cacheTags = {
  // The product shown in detail, plus each recommended product
  productIds: [id, ...recommendedProducts.map((p) => p.id)],
  // This page links to all collections, so purge it when their metadata changes
  collectionsMetadataWasModified: true,
};

applyCacheHeaders(Astro.response.headers, { cacheTags });
---
// ...component markup
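For a hypothetical product page (product ID 42, with products 7 and 13 recommended), and assuming the ONE_DAY and ONE_HOUR constants are defined in seconds (86400 and 3600), the response would leave the server with headers along these lines:

```text
cache-control: public,max-age=0,must-revalidate
cdn-cache-control: public,durable,s-maxage=86400,stale-while-revalidate=3600
cache-tag: pid_42,pid_7,pid_13,collections_metadata
```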

#Viewing cache tags for a page

You can view all these response headers via the Network tab in your browser’s developer console. But as an added service, you’ll find a handy little link in the page footer which shows a popup with cache tags on hover.

Popup with cache tag names in the page footer

Next to it, you’ll also find the exact time when the page you’re seeing was generated, making it easy to tell whether that page was freshly (re-)built or not.

#Implementing revalidation

Content change detection in the storefront may be triggered by various sources (after edits in the UI, or by an external service via API, etc.) - but regardless of the trigger, the logic always looks at the DB to find changes since the last time it was triggered.

Keeping track of state is a typical need for periodic background jobs, so the example codebase includes basic facilities for this.

Since we’re already using Turso, the needed code for this is pretty straightforward (see src/lib/jobs/index.ts) and based on a simple table schema:

db/config.ts
// ...
const JobsTable = defineTable({
  columns: {
    name: column.text({ primaryKey: true }),
    lastSuccess: column.json({ optional: true }),
    lastFailure: column.json({ optional: true }),
  },
});

#The revalidation job

With all the supporting pieces now in place, here is the revalidation logic (source):

src/lib/jobs/revalidate.ts
// Note: error handling code and such mostly removed for brevity
import { CollectionsTable, ProductsTable, db, gt, or } from "astro:db";
import { CacheTags } from "~/lib/headers.ts";
import { purgeCache } from "@netlify/functions";
import { REVALIDATE_JOB } from "~/config.ts";
import { getJobStatus, saveJobStatus } from "./index.ts";

async function getModifiedTags(sinceDate: Date) {
  const modifiedCollections = await db
    .select({ id: CollectionsTable.id })
    .from(CollectionsTable)
    .where(
      or(
        gt(CollectionsTable.createdAt, sinceDate),
        gt(CollectionsTable.updatedAt, sinceDate),
        gt(CollectionsTable.deletedAt, sinceDate)
      )
    );
  const collectionsMetadataWasModified = modifiedCollections.length > 0;

  const modifiedProducts = await db
    .select()
    .from(ProductsTable)
    .where(
      or(
        gt(ProductsTable.createdAt, sinceDate),
        gt(ProductsTable.updatedAt, sinceDate),
        gt(ProductsTable.deletedAt, sinceDate)
      )
    );

  // Collections having either their metadata changed or associated products
  // changed should be invalidated
  const affectedCollectionIds = new Set<string>(modifiedCollections.map((c) => c.id));
  modifiedProducts.forEach((p) => {
    (p.collectionIds as string[]).forEach((collectionId) => affectedCollectionIds.add(collectionId));
  });

  // Surprise! leaving this for the reader:
  if (modifiedProducts.length > 0) {
    /* TODO:
       Figure out which products have moved between collections (if any), to update
       not just their current collections but also their previous ones.
       To do this, calculate all collection->product IDs lists,
       and compare to prev. stored calculation (fyi: as optimization,
       it's enough to store the *hash* of all product IDs per collection)
    */
  }

  return CacheTags.toValues({
    productIds: modifiedProducts.map((p) => p.id),
    collectionIds: [...affectedCollectionIds.values()],
    collectionsMetadataWasModified,
  });
}

export const revalidateJob = async (options?: { trigger?: string }) => {
  const now = new Date();
  try {
    const lastJobStatus = await getJobStatus(REVALIDATE_JOB);
    const tags = await getModifiedTags(lastJobStatus.lastSuccess.date);
    if (tags.length > 0) {
      await purgeCache({ tags });
    }
    await saveJobStatus(REVALIDATE_JOB, { date: now, info: { tags, trigger: options?.trigger } });
  } catch (e) {
    const message = e instanceof Error ? e.message : "unknown error";
    await saveJobStatus(REVALIDATE_JOB, { error: true, date: now, info: { message } });
  }
};

(If you already see the elephant in the code, hold on a minute, we’ll get to it.)

revalidateJob is invoked in two places:

  • When a product name is updated (see src/lib/client.mock.ts:updateProductName())
  • Via the /api/revalidate endpoint implemented in src/pages/api/revalidate.ts. When the site is deployed to Netlify, the scheduled function is invoked periodically and calls this API endpoint (see netlify/functions/scheduled-revalidate-check.mts).

#The API

The API endpoint code is very straightforward. Here is the whole file:

src/pages/api/revalidate.ts
import type { APIRoute, APIContext } from "astro";
import { REVALIDATE_JOB } from "~/config.ts";
import { verifyAPICall } from "~/features/cart/auth.server.ts";
import { getJobStatus } from "~/lib/jobs/index.ts";
import { revalidateJob } from "~/lib/jobs/revalidate.ts";

export const GET: APIRoute = async (context: APIContext) => {
  if (!verifyAPICall(context)) return new Response("Not authorized", { status: 403 });
  const lastJobStatus = await getJobStatus(REVALIDATE_JOB);
  return Response.json(lastJobStatus);
};

export const POST: APIRoute = async (context: APIContext) => {
  if (!verifyAPICall(context)) return new Response("Not authorized", { status: 403 });
  // Run the job, then report its resulting status
  // (the job records success/failure itself rather than throwing)
  await revalidateJob({ trigger: "api" });
  return Response.json(await getJobStatus(REVALIDATE_JOB));
};

As you can see, the endpoint also has a GET form, which returns the last status of the change detection code. This allows an external monitoring service to alert a developer if necessary (or you could simply call it from the browser if you’re logged in!).

#Tackling complexity

The code above is mostly straightforward, but there’s a sneaky issue in the change detection.

If any product’s collectionIds field is modified, that’s easy to detect. But if the product was moved from collection A to B, there would be no record of collection A in the updated field value, so how would we know to invalidate the cache tag for collection A?

We’ve left it up to you to consider how best to handle this, but the TODO comment in the code above does include a tip on how we’d approach it.
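To make the tip a bit more concrete, here is one possible direction, sketched in plain TypeScript (the function names and shapes are made up for illustration, not part of the example’s codebase): build a fingerprint of each collection’s product-ID list, and diff it against the fingerprints from the previous run to catch collections that products moved out of.

```typescript
// Hypothetical helpers, not part of the example repo.
type ProductRow = { id: string; collectionIds: string[] };

// Build a fingerprint per collection from the full product list.
// A sorted, joined ID list is enough; in practice you'd store a hash of it.
function collectionFingerprints(products: ProductRow[]): Map<string, string> {
  const byCollection = new Map<string, string[]>();
  for (const p of products) {
    for (const cid of p.collectionIds) {
      let ids = byCollection.get(cid);
      if (!ids) byCollection.set(cid, (ids = []));
      ids.push(p.id);
    }
  }
  const fingerprints = new Map<string, string>();
  for (const [cid, ids] of byCollection) {
    fingerprints.set(cid, ids.sort().join(","));
  }
  return fingerprints;
}

// Any collection whose fingerprint changed (or appeared/disappeared)
// should have its cid_<collection_id> tag purged.
function changedCollections(prev: Map<string, string>, next: Map<string, string>): string[] {
  const allIds = new Set([...prev.keys(), ...next.keys()]);
  return [...allIds].filter((cid) => prev.get(cid) !== next.get(cid));
}
```

The previous run’s fingerprints could be stored alongside the job status, and the IDs returned by changedCollections merged into the set of collections to purge.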

If this starts getting too tricky to reason about for your taste, please do remember that on-demand revalidation doesn’t have to be all or nothing. The largest benefits come from handling the most common types of changes (product pricing, stock levels, etc.) in an efficient way.

If you can detect such changes in a robust way, great! If not, or for the less common cases, you may opt for a more brute-force approach, such as a fresh deploy. This boils down to a choice of where to focus your efforts.

#Run it yourself

Running the example is pretty simple, both locally and on Netlify.

#Run locally

  1. Ensure you have the prerequisites:
    • Node 20+
    • pnpm (preferred)
  2. Clone the repository.
  3. Copy .env.example to .env and set your own basic password and secret - see instructions in the file.
  4. Run: pnpm i
  5. Run: pnpm run dev and chill at http://localhost:4321/.

Locally, there is no CDN in the loop and hence no actual caching. However, as cache response headers are set by the website anyway, you can see these in the browser’s developer console.

To test the scheduled function:

  1. Ensure you have the Netlify CLI installed, and run netlify dev. The CLI starts a server on port 8888 which wraps around Astro’s dev server.
  2. Navigate to http://localhost:8888/.netlify/functions/scheduled-revalidate-check and check the site’s logs!

#Set up a remote database before deploying

Before deploying to Netlify, you should set up a hosted database:

  1. Create an account and a database in Turso (there’s a free tier).
    • For best performance, create the database on AWS in the North Virginia region (us-east-1), which is closest to the default Netlify functions region in Ohio (us-east-2). If you’re a Pro+ Netlify customer, you can later set your functions region to exactly match your database’s region.
  2. Create a token for your new database, and grab the token value and database URL (the one starting with libsql://...).
  3. Locally, set these values to ASTRO_DB_APP_TOKEN and ASTRO_DB_REMOTE_URL in your .env file.
  4. Locally, run pnpm astro db push --remote to create the database schema.
  5. Then, run pnpm astro db execute db/seed.ts --remote to fill the database with data. This takes a minute.
  6. To verify that the remote database is all set up before moving forward, you can have your local dev server connect to it by running pnpm run dev --remote.

#Deploying to Netlify

Deploy to Netlify

To create a new site, you can use the button above, which automatically creates a copy of the example repository in a GitHub account that you choose. Alternatively, you can fork the example yourself, and then create a new site for it in the Netlify UI.

You also have the option of manual deploys from the command-line, without needing your own repository at all.

Whichever method you choose, make sure to add the required environment variables (the same keys as in your .env file) with the appropriate values.

And that’s it!