
April 1, 2026

How to maintain product demos at scale

walkthroughs · demo-management · operations · pillar

The combinatorial problem nobody warns you about

A Series-B SaaS company hires its thirtieth account executive in Q3. The product team ships into eight verticals — fintech, healthcare, logistics, e-commerce, manufacturing, agencies, education, and the catch-all "platform." There are five flagship features, and each rep is expected to demo whichever combination matches the prospect on the call.

Thirty reps times eight verticals times five features is twelve hundred plausible demo permutations. Even after you collapse the matrix — most fintech reps never demo the manufacturing flow, most logistics demos skip the agency reporting view — you are still left with somewhere between one hundred and two hundred demos that need to exist, be discoverable, and be correct on any given Tuesday morning.

This is the scale problem. It is not a tooling problem. It is an operating problem that tooling either helps or hides.

The folk wisdom in most go-to-market orgs is that demos are evergreen content: record once, link from the wiki, move on. That belief survives until roughly the twentieth demo. Past twenty, the rot sets in faster than any single owner can patch it. A pricing page redesign breaks fourteen recordings overnight. An auth provider changes its consent screen and now every onboarding demo flashes a missing logo. A product manager renames "Workspaces" to "Projects" and three years of narration is suddenly wrong.

The teams that thrive at this scale do not have better recording tools. They have a demo program — a registry, an ownership model, a healing strategy, a governance review, and a small set of metrics they actually look at. The tool is the substrate. The program is what keeps the substrate from rotting.

This piece is for the operator who has just been handed the demo problem — the CS lead, the product marketer, the founding sales engineer, the founder herself — and has been told to "make it work." It is tool-agnostic. The principles apply whether you are running Arcade, Supademo, Storylane, Navattic, Tourial, raw Loom files in a Notion page, or something you built in-house. The point is the discipline, not the vendor.

Why demos rot

Before you can design a maintenance program, you have to be honest about what you are maintaining against. Demos rot for at least seven distinct reasons, and most teams only plan for two of them.

UI shipping velocity

The most obvious cause. Modern product teams ship continuously, sometimes dozens of merges per day to the front end. Any of those merges can move a button, rename a field, change a route, or introduce a new modal. The demo recorded last Wednesday referenced a "Save and continue" button that is now just "Continue" with a keyboard shortcut hint. Multiply by every demo in your library and you have a steady-state breakage rate that grows with engineering throughput.

A/B tests and feature flags

Less obvious, more pernicious. A demo recorded against the variant-B onboarding flow will look wrong to half your prospects, and completely wrong to anyone who hits the variant-C flow that shipped two weeks later. Demos recorded behind unreleased feature flags will appear to show capabilities the prospect cannot actually buy. Demos recorded for a flag that has since been rolled back will reference a feature that no longer exists.

Tenant-specific UIs

If your product has white-labeling, role-based UI variation, or workspace-level configuration, every demo is implicitly recorded against one tenant configuration. The demo recorded in the marketing team's sandbox will not match what an enterprise prospect sees in their custom-branded environment. This is the silent killer of "show, don't tell" sales motions in B2B.

Copy changes

Product marketing rewrites the empty-state text. Legal updates the pricing footer. The CEO decides "Customers" should be "Members" everywhere. None of these touch the underlying code paths a demo recorder cares about, but every one of them makes existing demo narration look like it was written for a different product.

Design refreshes

The big one. Once every twelve to eighteen months, most B2B SaaS products go through a design refresh: new color palette, new type scale, new component library. Every demo in your library breaks visually on the same day. There is no automated healing strategy that survives a design refresh. This is a planned re-record event, and budgeting for it is part of being honest about the maintenance load.

Dependency UIs

Your demo of the Stripe Checkout flow breaks the day Stripe ships its 2026 redesign. Your OAuth-with-Google demo breaks when Google changes its consent screen layout. Your Slack-integration demo breaks when Slack's app directory changes its install button copy. You do not control these surfaces and you rarely get advance notice. Your prospects still read the breakage as your product looking broken.

Data drift

The seed data in the demo account ages out. Dates that were "recent" when recorded are now eighteen months stale. The synthetic customer named "Acme Corp" is sitting next to a real customer named "Acme Inc." that signed up last quarter. Charts that showed beautiful upward trajectories now end in a flat line because no one has been generating activity in the demo tenant.

A serious demo program plans for all seven. A naive one budgets for the first and is surprised by the other six.

The demo library: structure and taxonomy

A demo library is a real artifact, not a folder. It has a schema, an owner, a review cadence, and a sunset policy. Here is how to structure one that survives growth.

Three axes of organization

Every demo lives at the intersection of three axes. Pick all three explicitly and store them as structured metadata, not as words in a title.

  • Audience. Who is on the receiving end? Prospect (top of funnel), prospect (technical evaluation), customer (onboarding), customer (expansion), internal (enablement), partner (co-sell). These have wildly different tolerances for length, polish, and depth.
  • Surface. Where does the demo live? Marketing site, sales email, in-app empty state, help center article, partner portal, conference booth kiosk. The surface determines the SLA — a demo on the homepage is a P1 if it breaks; a demo buried in an internal Notion is a P3.
  • Topic. What does the demo show? A specific feature, a vertical-specific workflow, a competitive-displacement narrative, a quickstart, a deep-dive integration setup.

A demo titled "Salesforce integration walkthrough" tells you nothing useful. A demo tagged `audience:prospect-technical surface:docs topic:integration/salesforce` is queryable, sortable, and ownable.
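The sketch below treats the tag string as structured data rather than free text. It is written in Python. The `key:value` syntax comes from the example above. The helper name `parse_axes` and its strict validation rules are assumptions, not part of any particular tool.

```python
# Hypothetical helper: parse a space-separated "key:value" tag string and
# reject anything that is malformed or missing one of the three axes.
REQUIRED_AXES = ("audience", "surface", "topic")


def parse_axes(tag_string: str) -> dict[str, str]:
    """Parse 'audience:... surface:... topic:...' into a dict."""
    axes: dict[str, str] = {}
    for token in tag_string.split():
        key, sep, value = token.partition(":")  # split on the first colon only
        if not sep or not value:
            raise ValueError(f"malformed tag: {token!r}")
        if key in axes:
            raise ValueError(f"duplicate axis: {key!r}")
        axes[key] = value
    missing = [axis for axis in REQUIRED_AXES if axis not in axes]
    if missing:
        raise ValueError(f"missing axes: {missing}")
    return axes
```

If you reject untagged demos at registration time, every later query ("all P1 surfaces", "everything on docs") can rely on the three fields being present.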

Naming conventions

Pick one and enforce it ruthlessly. A workable convention:

`{topic}--{audience}--{surface}--v{n}`

So: `salesforce-integration--prospect-technical--docs--v3`. Ugly, but unambiguous, and grep-able when something breaks. Human-readable titles live in a separate `display_title` field.
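The convention is strict enough to validate mechanically. In this sketch, slugs are assumed to be lowercase alphanumerics joined by single hyphens, which keeps the double hyphen unambiguous as the field separator. `parse_demo_name` is a hypothetical helper.

```python
import re

# One slug segment: lowercase alphanumerics joined by single hyphens, so "--"
# can only ever mean "next field".
SEGMENT = r"[a-z0-9]+(?:-[a-z0-9]+)*"
DEMO_NAME = re.compile(
    rf"(?P<topic>{SEGMENT})--(?P<audience>{SEGMENT})--(?P<surface>{SEGMENT})--v(?P<version>[1-9][0-9]*)"
)


def parse_demo_name(name: str) -> dict[str, object]:
    """Split a conforming name into its fields; raise if it does not conform."""
    match = DEMO_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    fields: dict[str, object] = dict(match.groupdict())
    fields["version"] = int(match["version"])
    return fields
```

Rejecting malformed names when a demo is registered is what keeps the library grep-able later.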

Versioning

Treat every recorded demo like a software artifact. Major version bumps for re-records against significant UI changes. Minor versions for narration edits, copy fixes, redactions. Keep at least the previous major version archived but not deleted — when a customer references a demo from six months ago in a support ticket, you want to be able to reproduce what they saw.
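The sketch below shows one way to model the major/minor split. `re_record` archives the current version before bumping the major, `edit` bumps only the minor, and nothing is ever deleted. The class and method names are illustrative. The naming convention above carries only the major number (`v{n}`), so this sketch assumes the minor number lives in registry metadata.

```python
from dataclasses import dataclass, field


@dataclass
class DemoVersion:
    major: int = 1
    minor: int = 0
    # Earlier versions, kept so a support ticket about an old demo can be reproduced.
    archived: list[tuple[int, int]] = field(default_factory=list)

    def re_record(self) -> None:
        """Significant UI change: archive the current version, then bump major."""
        self.archived.append((self.major, self.minor))
        self.major += 1
        self.minor = 0

    def edit(self) -> None:
        """Narration edit, copy fix, or redaction: bump minor only."""
        self.minor += 1

    def __str__(self) -> str:
        return f"v{self.major}.{self.minor}"
```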

The registry as a real artifact

The registry is the index of every demo your org has ever shipped, with metadata. At minimum, every entry has:

| Field | Purpose |
| --- | --- |
| `id` | Stable identifier independent of title |
| `title` | Human-readable name |
| `audience`, `surface`, `topic` | The three axes |
| `owner` | Single named human, never a team alias |
| `last_recorded_at` | When the underlying capture was last refreshed |
| `last_verified_at` | When a human last confirmed it still works |
| `health_status` | Green / yellow / red |
| `surfaces_embedded_on` | URLs where this demo is currently live |
| `sunset_date` | When this demo expires absent re-verification |

Build it in whatever tool fits your stack — a Notion database, an Airtable, a small internal app, a spreadsheet that nobody is allowed to edit without a PR. The substrate matters less than the discipline of keeping it complete.
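If the substrate is a small internal app, the schema above maps directly onto a typed record. The Python sketch below makes two assumptions. The 90-day re-verification window in `verify` is a placeholder. Checking only for a non-empty owner is a deliberately minimal stand-in for a real "named human, not an alias" rule.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Health(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


@dataclass
class RegistryEntry:
    id: str                          # stable, independent of the title
    title: str
    audience: str
    surface: str
    topic: str
    owner: str                       # one named human, never a team alias
    last_recorded_at: date
    last_verified_at: date
    health_status: Health
    surfaces_embedded_on: list[str]  # URLs where the demo is live
    sunset_date: date

    def __post_init__(self) -> None:
        if not self.owner.strip():
            raise ValueError(f"{self.id}: every entry needs a named owner")

    def verify(self, on: date, validity: timedelta = timedelta(days=90)) -> None:
        """Record a human verification and push the sunset date forward.
        The default 90-day window is a placeholder assumption."""
        self.last_verified_at = on
        self.sunset_date = on + validity

    def is_expired(self, today: date) -> bool:
        return today > self.sunset_date
```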

Ownership and accountability

The single most common failure mode in demo programs is collective ownership. "The CS team owns onboarding demos" means nobody owns onboarding demos. When the Stripe redesign breaks the billing demo at 9pm on a Sunday, "the CS team" does not get paged.

Every demo has exactly one owner

Not a team. Not a Slack channel. A named human, with a backup. The owner is responsible for: re-recording when the underlying flow changes, approving copy edits, signing off on the quarterly review, and being the escalation point for healing alerts.

The owner does not need to be the person who records the demo. They need to be the person whose name is in the registry and who answers when the demo breaks.

The CSM-versus-PMM split

In most B2B orgs, two functions have legitimate claims on the demo library, and they will fight unless you draw the line clearly.

  • Product marketing owns demos whose primary job is positioning: top-of-funnel marketing site demos, sales pitch flows, competitive-displacement narratives, launch-week feature highlights. The KPI is conversion influence.
  • Customer success owns demos whose primary job is enablement: onboarding flows, feature-adoption walkthroughs, in-app help, expansion plays. The KPI is time-to-value or feature activation.

Sales engineering and solutions consulting often own a third tier — bespoke, account-specific demos that get re-recorded for a single deal. These should not live in the same registry as the evergreen library; they have a different lifecycle and a different sunset policy (typically: deleted after the deal closes or dies).

The demo on-call rotation

Once you have automated healing (more on this in the next section), you have alerts. Alerts need a destination. The mature pattern is a weekly rotation, one person per week, paged by the healing system when a demo regresses past a confidence threshold. The on-call person is not expected to fix every break — they are expected to triage: route to the owner, schedule a re-record, or acknowledge that the break is acceptable until the next planned refresh.
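The rotation itself is simple arithmetic. This sketch assumes a hypothetical rotation list and a smoke run that reports a confidence score between 0 and 1 that the demo still matches the product. The 0.8 paging threshold is a placeholder, not a recommendation.

```python
from datetime import date

# Any Monday works as an anchor; 2024-01-01 was a Monday.
ROTATION_EPOCH = date(2024, 1, 1)


def on_call_for(day: date, rotation: list[str]) -> str:
    """One person per Monday-to-Sunday week, cycling through `rotation`.
    Counting weeks from a fixed Monday avoids the reset that ISO week
    numbers have at the turn of the year."""
    week_index = (day - ROTATION_EPOCH).days // 7
    return rotation[week_index % len(rotation)]


def should_page(match_confidence: float, threshold: float = 0.8) -> bool:
    """Page on-call when confidence that the demo still matches the
    product drops below the threshold."""
    return match_confidence < threshold
```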

Sunset criteria

Every demo should have a sunset trigger. Common ones: feature is deprecated; audience segment is no longer pursued; demo has had zero embed views in ninety days; the underlying surface no longer exists. Without a sunset policy, libraries grow monotonically and reviewers spend their time on demos no one watches.
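Once the registry stores the right fields, the triggers can be evaluated mechanically. The field names in this sketch are hypothetical. A non-empty result marks the demo as a sunset candidate for the quarterly review; nothing is deleted automatically.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class SunsetInputs:
    feature_deprecated: bool
    segment_pursued: bool
    surface_exists: bool
    last_embed_view: Optional[date]  # None if the demo was never viewed


def sunset_reasons(d: SunsetInputs, today: date,
                   idle_window: timedelta = timedelta(days=90)) -> list[str]:
    """Return every sunset trigger that fires; an empty list means keep it."""
    reasons = []
    if d.feature_deprecated:
        reasons.append("feature deprecated")
    if not d.segment_pursued:
        reasons.append("audience segment no longer pursued")
    if d.last_embed_view is None or today - d.last_embed_view > idle_window:
        reasons.append(f"no embed views in {idle_window.days} days")
    if not d.surface_exists:
        reasons.append("surface no longer exists")
    return reasons
```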

Treat the demo library like a product. It has stakeholders, a roadmap, and a backlog. Review it quarterly. Cut what is not earning its keep.

Healing strategy

This is where most programs either scale or collapse. A library of two hundred demos cannot be maintained by manual re-recording — the math does not work. You need a tiered strategy.

The healing spectrum

Think of healing as a spectrum from cheapest-and-most-manual to most-expensive-and-most-automated.

  1. On-break re-record. The demo breaks, someone notices (usually a prospect or a rep), and the owner re-records it. Cheapest in steady-state, catastrophic when the break is silent and the demo is on a high-traffic surface. Acceptable for low-stakes internal demos. Unacceptable for anything customer-facing.
  2. Scheduled human review. Every demo is reviewed by a human on a fixed cadence — say, every six weeks. The reviewer steps through it, confirms it works, re-records if needed. Predictable cost. Misses breaks that occur between reviews. Fine for medium-stakes demos when the underlying product changes slowly.
  3. Scheduled smoke runs. A scheduled job — typically a headless browser script — replays the demo against the live product and flags differences. Catches breaks within hours instead of weeks. Requires an investment in the smoke-run infrastructure. The right default for most demos in a serious program.
  4. Automated healing. When a smoke run detects a break, an agent attempts to patch the demo automatically — finding the renamed selector, the moved button, the new modal — and either applies the patch or stages it for human review. The expensive end of the spectrum, and the only one that actually scales past a few hundred demos.
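The sketch below makes tiers 3 and 4 concrete. It covers the comparison step of a smoke run and the first move of automated healing, and rests on two assumptions:

- Each recorded step is reduced to the label of the element it clicks.
- A headless-browser replay (not shown) returns the clickable labels present at each step.

Digit runs are masked before comparing, so timestamps do not trigger false positives. A missing label is matched to its closest live label with Python's `difflib`, and a match is only staged as a patch candidate. This is a stand-in for a real healing agent, not a description of how any particular product works.

```python
import difflib
import re
from dataclasses import dataclass
from typing import Optional


def normalize(text: str) -> str:
    """Lowercase, mask digit runs, and collapse whitespace so timestamps and
    counters that change on every replay are not reported as breaks."""
    masked = re.sub(r"\d+", "#", text.lower())
    return " ".join(masked.split())


@dataclass
class StepResult:
    step: int
    status: str  # "ok", "patch-candidate", or "broken"
    expected: str
    candidate: Optional[str] = None


def smoke_check(recorded: list[str], live: list[list[str]],
                cutoff: float = 0.6) -> list[StepResult]:
    """Compare the label each recorded step clicks with the labels the live
    product exposes at that step. `live` would come from a headless-browser
    replay; here it is plain data so the sketch runs on its own."""
    results = []
    for i, expected in enumerate(recorded):
        live_labels = live[i] if i < len(live) else []  # replay stopped early
        by_key = {normalize(label): label for label in live_labels}
        key = normalize(expected)
        if key in by_key:
            results.append(StepResult(i + 1, "ok", expected))
            continue
        # Closest live label by difflib's similarity ratio. A renamed button
        # becomes a patch candidate for review, never an automatic fix.
        close = difflib.get_close_matches(key, list(by_key), n=1, cutoff=cutoff)
        if close:
            results.append(StepResult(i + 1, "patch-candidate", expected, by_key[close[0]]))
        else:
            results.append(StepResult(i + 1, "broken", expected))
    return results
```

With recorded steps "Create project", "Save and continue", and "Publish", the results are as follows. The renamed "Save and continue" button surfaces as a patch candidate pointing at "Continue". "Publish" has no close match among the live labels and is reported as broken. Label similarity says nothing about whether the narration still holds, which is the semantic-drift failure mode of automated healing.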

Most programs need a mix. A reasonable default: on-break for internal demos, scheduled smoke runs for the broad library, automated healing for the top decile by traffic and revenue impact.
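That default mix can be written as a policy function. This sketch assumes each demo carries an `internal` flag and a single precomputed `impact` score. The score stands in for "traffic and revenue impact"; how to combine those two is left open here.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    id: str
    internal: bool
    impact: float  # hypothetical combined traffic/revenue score


def assign_healing_tiers(demos: list[Demo]) -> dict[str, str]:
    """Internal demos: on-break. Top decile of the rest by impact: automated
    healing. Everything else: scheduled smoke runs."""
    tiers = {d.id: "on-break re-record" for d in demos if d.internal}
    external = sorted((d for d in demos if not d.internal),
                      key=lambda d: d.impact, reverse=True)
    top_n = -(-len(external) // 10)  # ceiling of 10 percent, in integer math
    for rank, d in enumerate(external):
        tiers[d.id] = "automated healing" if rank < top_n else "scheduled smoke run"
    return tiers
```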

SLA targets

Tier your SLAs by surface:

  • P1: homepage, pricing page, top-of-funnel ads. Detection within one hour. Resolution within four hours. Auto-heal preferred over taking the demo down, but taking it down is preferable to showing a broken one.
  • P2: docs, in-app, sales-shared links. Detection within twenty-four hours. Resolution within three business days.
  • P3: internal enablement, archived materials. Detection within a week. Resolution as part of the next quarterly review.

Write the SLAs down. Without them, every break feels equally urgent, which means none of them are.
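Once written down, the SLA table is small enough to check in code. This sketch uses the targets above with two simplifications. Business days skip weekends only, with no holiday calendar. P3 resolution returns `None` because it is tied to the quarterly review rather than a clock.

```python
from datetime import datetime, timedelta
from typing import Optional

# Detection targets per priority tier, from the list above.
DETECTION = {
    "P1": timedelta(hours=1),
    "P2": timedelta(hours=24),
    "P3": timedelta(weeks=1),
}


def add_business_days(start: datetime, days: int) -> datetime:
    """Step forward one calendar day at a time, counting only Mon-Fri.
    Public holidays are ignored in this sketch."""
    current = start
    while days > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0 = Monday ... 4 = Friday
            days -= 1
    return current


def resolution_deadline(priority: str, detected_at: datetime) -> Optional[datetime]:
    if priority == "P1":
        return detected_at + timedelta(hours=4)
    if priority == "P2":
        return add_business_days(detected_at, 3)
    return None  # P3: fixed at the next quarterly review, not by a clock


def detection_breached(priority: str, broke_at: datetime, detected_at: datetime) -> bool:
    return detected_at - broke_at > DETECTION[priority]
```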

Failure modes of automated healing

Automated healing is not free, and the failure modes are subtle.

  • False positives. The healing agent decides a demo is broken when it actually is not — say, the timestamp on a chart looks different because the demo replayed on a Tuesday. Noisy alerts train people to ignore alerts.
  • Semantic drift. The agent successfully patches the click target but misses that the underlying meaning changed. A button was renamed from "Publish" to "Submit for review" — the click still works, the narration is now wrong, the prospect is confused. Pure DOM-level healing cannot catch this. You need a layer of semantic verification, ideally a model-driven diff against the demo's intended message.
  • Healing the wrong thing. The agent silently patches a demo to use a feature flag the prospect does not have access to. Now your demo shows a fictional product. The fix: any auto-patch that touches business logic, not just selector hygiene, requires human approval before going live.
  • Compounding patches. Patch on patch on patch eventually produces a demo that bears no resemblance to what was originally recorded. Set a re-record trigger when cumulative patches exceed a threshold — say, three patches or twenty percent of steps modified.
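The last two guardrails are easy to encode. This sketch assumes each patch records two things: which step indices it touched, and whether it changed business logic (a flag, a flow, gated content) rather than only selectors. The thresholds are the example values above. Treating "exceed" as "reaches" is a choice made here.

```python
from dataclasses import dataclass


@dataclass
class Patch:
    steps_touched: set[int]       # indices of demo steps this patch modified
    touches_business_logic: bool  # True if it changed more than selector hygiene


def needs_human_approval(patch: Patch) -> bool:
    """Selector-only patches may go live automatically; anything touching
    business logic waits for a human."""
    return patch.touches_business_logic


def should_re_record(patches: list[Patch], total_steps: int,
                     max_patches: int = 3, max_fraction: float = 0.2) -> bool:
    """Trigger a fresh recording once cumulative patches pile up. Steps
    touched by several patches are counted once."""
    modified = set().union(*(p.steps_touched for p in patches))
    return len(patches) >= max_patches or len(modified) >= max_fraction * total_steps
```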

We built our own healing agent — the Playwright-based engine that powers Heal Demo — because when we surveyed the market in 2024, none of the off-the-shelf walkthrough tools treated healing as a first-class concern. Whatever you use, evaluate the healing layer with the same rigor you would evaluate a CI system. It is the load-bearing piece.

Governance and quality

A demo library is a published artifact. It has the same governance needs as your marketing site or your help center.

Approval workflow

Every demo, before it goes live on a customer-facing surface, passes through three reviewers:

  • The owner — confirms accuracy.
  • A peer reviewer — confirms clarity. Catches the curse of knowledge: steps that are obvious to the recorder but confusing to anyone else.
  • A brand or content reviewer — confirms tone, length, visual consistency, and that no one accidentally narrated "click the stupid little gear icon."

This is not bureaucracy. It is the same review you apply to a blog post or a pricing page change. Demos are marketing collateral; they should not bypass the marketing review process.
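The sketch below encodes the three-reviewer gate, where `approvals` maps a role to the person who signed off. The role names are assumptions. So is the rule that the peer and brand reviewers must be someone other than the owner, which follows from the curse-of-knowledge argument.

```python
REQUIRED_ROLES = ("owner", "peer", "brand")


def ready_to_publish(approvals: dict[str, str], owner: str) -> tuple[bool, list[str]]:
    """`approvals` maps a reviewer role to the person who signed off."""
    problems = [f"missing {role} approval" for role in REQUIRED_ROLES if role not in approvals]
    if "owner" in approvals and approvals["owner"] != owner:
        problems.append("owner approval must come from the registered owner")
    for role in ("peer", "brand"):
        if approvals.get(role) == owner:
            problems.append(f"{role} reviewer must not be the owner")
    return (not problems, problems)
```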

The demo style guide

Write one. It is a living document, owned by product marketing, that codifies:

  • Narration tone. Active voice. Second person. Present tense. No filler. Length cap per step.
  • Visual standards. Cursor style, callout colors, hotspot vs. tooltip rules, transition speed.
  • Length caps. Demos longer than ninety seconds lose half their viewers; demos longer than three minutes are essentially internal documents masquerading as marketing. Pick a number.
  • Redaction rules. What gets blurred, what gets replaced with synthetic data, what gets cropped out.

A team without a style guide produces a library that looks like it was made by twenty different vendors, because it effectively was.
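Some style-guide rules are mechanical enough to lint before review. This sketch checks only the two numeric caps. The 25-words-per-step and 90-second limits are placeholder numbers to replace with whatever your guide picks. Tone rules such as active voice still need a human.

```python
from dataclasses import dataclass


@dataclass
class Step:
    narration: str
    seconds: float


def lint_demo(steps: list[Step], max_words_per_step: int = 25,
              max_total_seconds: float = 90.0) -> list[str]:
    """Flag steps whose narration exceeds the word cap, and a demo whose
    total runtime exceeds the length cap."""
    issues = []
    for i, step in enumerate(steps, start=1):
        words = len(step.narration.split())
        if words > max_words_per_step:
            issues.append(f"step {i}: {words} words (cap {max_words_per_step})")
    total = sum(s.seconds for s in steps)
    if total > max_total_seconds:
        issues.append(f"total length {total:.0f}s exceeds {max_total_seconds:.0f}s cap")
    return issues
```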

PII and customer data

The single most expensive demo failure is leaking real customer data. The demo account named "Acme Corp - Q3 pilot" goes live on the marketing site. A real Acme employee finds it via Google. You have a breach disclosure on your hands.

Defenses, in order of cost:

  1. Use a dedicated demo tenant. Never record from a tenant that has ever held real customer data. This is non-negotiable.
  2. Synthetic data only. Names, emails, account values, charts — all generated. Maintain a "demo data" catalog of approved synthetic entities the whole team uses.
  3. Automated redaction at capture. Recording tools that auto-blur email addresses, names matching customer-CRM patterns, and credit card formats. Belt and suspenders.
  4. Quarterly redaction audit. A human reviews a sample of live demos for leaked PII. Cheap insurance.
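The sketch below shows capture-time redaction applied to text, such as narration or captured page text. It assumes a list of real customer names exported from the CRM. Card-like digit runs are redacted only if they pass the Luhn checksum, which cuts false positives on IDs and order numbers. Regexes like these are the belt, not the suspenders. They will miss any name that is not on the list, which is why the dedicated tenant and synthetic data come first.

```python
import re

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# 13-19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact(text: str, customer_names: list[str]) -> str:
    text = EMAIL.sub("[email]", text)

    def card(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        return "[card]" if luhn_ok(digits) else match.group()

    text = CARD_CANDIDATE.sub(card, text)
    # Longest names first, so "Acme Corp" is replaced before "Acme".
    for name in sorted(customer_names, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(name)}\b", "[customer]", text, flags=re.IGNORECASE)
    return text
```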

If you cannot say with confidence that no real customer name appears in any of your live demos, you are one search query away from a bad week.

Metrics that matter

Resist the urge to instrument everything. A handful of metrics, looked at weekly, beat fifty looked at never.

Demo-level metrics

  • Completion rate. What percentage of viewers reach the final step. The single best signal for whether a demo is the right length and the right pace.
  • Drop-off step. Where viewers leave. If everyone bails at step four, step four is broken — confusing copy, slow load, irrelevant content. Fix it or cut it.
  • Embed views by surface. Which surfaces drive viewers. Often surprising — the demo you thought was for sales gets most of its traffic from the help center.
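All three demo-level metrics can be computed from per-viewer events. This sketch assumes the analytics export gives, for each view, the surface it happened on and the highest step the viewer reached (1-based). The `View` record is hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class View:
    surface: str
    last_step: int  # highest step the viewer reached, 1-based


def demo_metrics(views: list[View], total_steps: int) -> dict[str, object]:
    completed = sum(v.last_step >= total_steps for v in views)
    # Only viewers who left early count toward the drop-off step.
    drop_offs = Counter(v.last_step for v in views if v.last_step < total_steps)
    return {
        "completion_rate": completed / len(views) if views else 0.0,
        "drop_off_step": drop_offs.most_common(1)[0][0] if drop_offs else None,
        "views_by_surface": dict(Counter(v.surface for v in views)),
    }
```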

Library-level metrics

  • Percentage of healthy demos. Of all live demos in the registry, what fraction are passing their most recent smoke run. This is your "is the library on fire" gauge. Target ninety-five percent or higher.
  • Time-to-heal, p50 and p95. From break detection to resolution. The p95 is what tells you whether your on-call rotation is functioning.
  • Ownership coverage. Percentage of demos with a named, currently-employed owner. Should be one hundred percent. Every quarter, run the registry against your HRIS and find the orphaned demos.
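The sketch below computes the library-level numbers from three assumed inputs:

- a smoke-run pass/fail status per demo;
- a list of recent detection-to-resolution times in hours;
- a set of active employee names exported from the HRIS.

Percentiles use the nearest-rank method. Other percentile definitions give slightly different p95 values on small samples.

```python
def percentile_nearest_rank(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct percent
    of the samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceil(pct * n / 100) in integers
    return ordered[rank - 1]


def library_metrics(passing: dict[str, bool], heal_hours: list[float],
                    owners: dict[str, str], active_employees: set[str]) -> dict[str, object]:
    """`passing`: demo id -> did its latest smoke run pass.
    `heal_hours`: detection-to-resolution times for recent breaks.
    `owners`: demo id -> owner; `active_employees` comes from an HRIS export."""
    orphaned = sorted(d for d, owner in owners.items() if owner not in active_employees)
    return {
        "healthy_share": sum(passing.values()) / len(passing),
        "time_to_heal_p50": percentile_nearest_rank(heal_hours, 50),
        "time_to_heal_p95": percentile_nearest_rank(heal_hours, 95),
        "ownership_coverage": 1 - len(orphaned) / len(owners),
        "orphaned": orphaned,
    }
```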

That is six metrics, or seven numbers once time-to-heal is split into p50 and p95. Put them on a single dashboard. Review them weekly with the demo program lead and monthly with the broader GTM team. Resist anyone who asks you to add another before these have been stable for a quarter.

A 90-day rollout for teams just starting

If you are reading this with a sinking feeling because your library is currently a Notion page of Loom links, here is the order of operations.

Days 1-14: inventory

List every demo, everywhere. Marketing site, help center, sales emails (search the CRM for `arcade.software`, `supademo.com`, `loom.com`, your own embeds), in-app, partner portals, internal wikis. Do not judge. Do not delete. Just list. Expect to find at least one and a half times as many demos as anyone thought existed. The output is a flat spreadsheet.
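For the sales-email and wiki part of the inventory, a crude link scan goes a long way. This sketch assumes you can export message bodies or pages as plain text. The host list comes from the paragraph above, and the `source` label is free-form.

```python
import re
from urllib.parse import urlsplit

# Hosts named in the inventory step; append your own embed domains.
DEMO_HOSTS = ("arcade.software", "supademo.com", "loom.com")
URL = re.compile(r"https?://[^\s\"'<>()]+")


def find_demo_links(source: str, text: str,
                    hosts: tuple[str, ...] = DEMO_HOSTS) -> list[dict[str, str]]:
    """Return one inventory row per demo link found in `text`."""
    rows = []
    for raw in URL.findall(text):
        url = raw.rstrip(".,;:!?")  # drop sentence punctuation after a link
        host = urlsplit(url).hostname or ""
        if any(host == h or host.endswith("." + h) for h in hosts):
            rows.append({"source": source, "url": url, "host": host})
    return rows
```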

Days 15-30: ownership and triage

For each demo, assign a single owner. If no owner can be found, the demo is a candidate for sunset — flag it. Then triage: which demos are on customer-facing surfaces (P1/P2), which are internal (P3). Most programs find that twenty percent of demos drive eighty percent of the views; ruthlessly focus there first.
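Both halves of this triage fit in a few lines. The surface slugs are hypothetical labels for the tiers in the SLA section. `focus_set` returns the smallest set of demos, taken by descending views, that covers a given share of total views. The default is 80 percent, per the rule of thumb above.

```python
# Hypothetical surface slugs; anything unlisted (internal enablement,
# archives) defaults to P3.
P1_SURFACES = {"homepage", "pricing", "ads"}
P2_SURFACES = {"docs", "in-app", "sales-link"}


def priority(surface: str) -> str:
    if surface in P1_SURFACES:
        return "P1"
    if surface in P2_SURFACES:
        return "P2"
    return "P3"


def focus_set(views: dict[str, int], share_pct: int = 80) -> list[str]:
    """Smallest set of demos, by descending views, covering `share_pct`
    percent of all views. Integer math avoids float edge cases."""
    total = sum(views.values())
    chosen: list[str] = []
    covered = 0
    for demo_id, count in sorted(views.items(), key=lambda kv: kv[1], reverse=True):
        if covered * 100 >= share_pct * total:
            break
        chosen.append(demo_id)
        covered += count
    return chosen
```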

Days 31-50: registry

Stand up the structured registry — schema as outlined above. Migrate the spreadsheet into it. This is also when you write the style guide, even if it is just one page, because you are about to need it for the next phase.

Days 51-70: healing

Pick the right tier of healing for each demo. Set up scheduled smoke runs for everything P1/P2. Configure automated healing for the top decile. Stand up the on-call rotation, even if it is just two people trading weeks. Define the SLAs and write them in the registry.

Days 71-90: metrics and review cadence

Wire up the six metrics. Run the first quarterly review. Sunset everything that failed the ownership audit and has zero recent views. Communicate the new program to the GTM org with a single page describing how to request a new demo, who owns what, and what the SLAs are.

Ninety days in, you will not be done — you will never be done. But you will have moved from a folder of rotting Looms to an actual program with a pulse.

Closing: it is a discipline, not a tool

Return to the Series-B with thirty reps and twelve hundred theoretical demo permutations. The teams that handle this scale do not have a magic recorder. They have an operating discipline: a registry, a single named owner per demo, tiered healing, written SLAs, a style guide, and six metrics on a dashboard.

The honest truth, after a few years of building in this space, is that no tool solves this on day one. Vendors who claim otherwise are selling you the recorder and pretending the program runs itself. It does not. The recorder is the easy part — anyone can capture a flow. The hard part is the registry that does not rot, the ownership that does not drift, the healing that catches breaks before your prospects do, and the governance that keeps real customer names off your homepage.

If you are evaluating tools, evaluate the healing layer hardest. That is where the leverage is at scale, and that is the bet behind Heal Demo — that the program runs better when the substrate handles the boring repair work, so the humans can focus on narrative, positioning, and the demos worth recording fresh.

But pick whatever tool fits. The discipline travels. The library you build today, if you build it on the principles above, will survive a vendor migration, a design refresh, a series of A/B tests, and a thirtieth account executive who joined last week and is about to demo a vertical you have not yet named. That is the bar. That is what maintaining product demos at scale actually looks like.