
Designing a good test


A Canary test is a workflow: a sequence of nodes like login, navigate, action, assertion, wait, and setup that models a real user journey. The goal is to capture intent, not keystrokes, so the run is readable and failures are actionable.

Principles

  • Every test should prepare its own data.
  • Tests are a semantic layer: give context and intent, not UI trivia.
  • Actions should be meaningful user goals (e.g. “Create a ticket”), not individual clicks.
  • Don’t overstuff actions; big blobs make failures hard to pinpoint.
  • Choose assertion severity intentionally so failures reflect business impact.
  • Use warnings to surface useful signals without hiding whether the core journey succeeded.

1) Make each test self-sufficient

Tests should not depend on what another test created or on long-lived shared state. If a workflow needs data, create it inside the test:

  • Use a setup node to reference a published flow when several workflows need the same seeded data.
  • Otherwise, add setup actions at the start of the workflow itself.
  • Prefer unique identifiers (timestamps, short random suffixes) so reruns don’t collide.

This keeps runs deterministic and prevents one failure from cascading into the next.
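As a minimal sketch of the unique-identifier advice above, the snippet below combines a timestamp with a short random suffix. The helper name `unique_name` and the `ticket` prefix are illustrative assumptions, not part of Canary; the same idea applies to whatever mechanism you use to supply test data.

```python
import secrets
import time


def unique_name(prefix: str) -> str:
    """Build a name that is unlikely to collide across reruns.

    Hypothetical helper: combines the prefix, the current Unix time in
    seconds, and six random hex characters.
    """
    timestamp = int(time.time())
    suffix = secrets.token_hex(3)  # 3 random bytes -> 6 hex characters
    return f"{prefix}-{timestamp}-{suffix}"


# Each run gets its own ticket title, so a rerun never reuses old data.
ticket_title = unique_name("ticket")
print(ticket_title)
```

The timestamp keeps names sortable and easy to trace back to a run, while the random suffix separates runs that start within the same second.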

2) Write steps as human intent

A workflow should read like the steps you would explain to a teammate:

  • “Create a ticket assigned to Support.”
  • “Submit the ticket and confirm it appears in the list.”
  • “Change the ticket status to Closed.”

Use the workflow Description field to explain the overall goal of the test before you write individual steps. Describe the user journey, why it matters, and what success looks like. Because Canary now uses the workflow description to guide execution, write it as clear test intent, not as a label or internal note.

A strong workflow description answers three questions:

  • What user outcome are you testing?
  • What context or scenario matters for this run?
  • What result should Canary treat as success?

For example, prefer “Verify a Support admin can create and assign a ticket so urgent requests enter the Support queue with the correct owner” over “Ticket flow” or “Regression test.”

Use action descriptions and custom instructions to capture the “why” and the “what” for each step. Then add assertions that prove the user-visible result.

When you write assertions, decide whether each result should stop the run or add extra signal. Keep the main business outcome as a failure-level check, and move secondary observations into warning-level assertions. This makes each step read like intent while still showing which results are critical versus informational.

When the journey should prove responsive behavior, add an explicit viewport resize step at the point where the user experience changes. Use resize steps to model real checkpoints such as switching from desktop to tablet before opening navigation, or shrinking to mobile before confirming a drawer, menu, or stacked layout still works.

Do not add resize steps just to increase coverage. If the workflow outcome is the same across sizes and the layout change is not meaningful to the user task, keep the workflow focused on one viewport and test a different breakpoint in a separate workflow or scenario.

3) Choose the right granularity

Good actions are user-meaningful, but still tight enough to diagnose failures.

Too granular

  • “Click New”
  • “Type title”
  • “Type description”
  • “Click Save”

Too stuffed

  • “Create a ticket, assign it, add a comment, upload a file, and verify it appears in three views”

Good

  • “Create a ticket with required fields”
  • “Assign the ticket to Support”
  • “Upload the customer attachment”
  • “Verify the ticket appears in the queue”

Rule of thumb: if an action crosses multiple screens or changes multiple entities, split it. If it’s a single user goal on a single screen, keep it as one action.
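The rule of thumb can be approximated with a rough text heuristic when you review many action descriptions at once. The sketch below is an assumption-heavy illustration, not a Canary feature: the function name, the list of low-level verbs, and the two-clause threshold are all hypothetical choices. It only inspects wording, so it cannot tell whether an action really crosses screens or entities.

```python
import re

# Verbs that usually signal a per-click step (illustrative list).
LOW_LEVEL_VERBS = ("click", "type", "press")


def granularity_hint(action: str) -> str:
    """Return a rough hint: 'too granular', 'too stuffed', or 'ok'."""
    text = action.strip().lower()
    if text.startswith(LOW_LEVEL_VERBS):
        return "too granular"
    # Split on commas and on the word "and" to count separate goals.
    clauses = [c for c in re.split(r",\s*(?:and\s+)?|\s+and\s+", text) if c]
    if len(clauses) > 2:
        return "too stuffed"
    return "ok"


for action in (
    "Click New",
    "Create a ticket, assign it, add a comment, upload a file, "
    "and verify it appears in three views",
    "Create a ticket with required fields",
):
    print(f"{granularity_hint(action):13} {action}")
```

Treat the result as a prompt for review. A two-clause step such as “Submit the ticket and confirm it appears in the list” passes this check, and deciding whether to split it is still a judgment call.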

4) Keep actions readable and diagnosable

Overstuffed actions hide where failures happen. Aim for steps that can clearly answer:

  • What was the user trying to do?
  • Where in the UI was it done?
  • What should the user see afterward?

The Flow Designer nodes map directly to how your test reads. When in doubt, favor clarity over compactness.

Pair each important action with an assertion that is easy to scan in run details. Use Fail for checks that determine whether the workflow outcome is trustworthy, and use Warn for checks that add context without invalidating the run. Because warnings now appear separately from true failures in run and step details, you can safely use Warn for reviewable conditions without making it look like execution broke.

If a workflow has meaningful variations, use scenarios instead of forcing one path to cover every case. Define a small set of representative scenarios when different inputs, accounts, or credentials change the user outcome you need to validate. This gives you broader coverage while keeping each workflow readable.

Do not create a scenario for every tiny UI variation. Add scenarios when they represent distinct business conditions such as a different user role, account state, region, or data shape that can change how the workflow behaves.

Writing reliable steps

Recorded workflows are more useful when each step describes a replayable user outcome instead of a fragile UI sequence. Canary now does a better job generating steps for complex interactions, seeded flows, and recurring failures, but you still get the best results when you review those steps and keep the intent clear.

When you see generated steps for uploads, downloads, dismissing overlays, keyboard-driven actions, page verification, or viewport changes, keep them focused on what the user is accomplishing:

  • Write file interactions as outcomes, such as “Upload the invoice PDF” or “Download the exported report.”
  • Write overlay interactions with the goal first, such as “Dismiss the cookie banner” or “Close the onboarding modal.”
  • Write keyboard-driven actions as the user task, such as “Submit the form with the keyboard” or “Move focus to the search field.”
  • Write page checks as visible outcomes, such as “Verify the dashboard loads” or “Confirm the results page shows the filtered list.”
  • Write resize steps around the responsive checkpoint, such as “Resize to mobile width” before “Open the navigation menu.”

Avoid preserving incidental mechanics unless they are the behavior you actually need to test. For example, “Press Tab three times and hit Enter” is weaker than “Move focus to Search and submit the query with the keyboard” because the second version stays valid if the page layout changes.

For responsive coverage, treat each resize as a checkpoint with a follow-up assertion. After resizing, confirm the user can still complete the next meaningful task or see the expected responsive state. Prefer “Resize to tablet width and verify filters move into the drawer” over “Resize browser window,” because the first statement explains what changed and why it matters.
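The checkpoint rule above (every resize is followed by an assertion) is easy to check mechanically if you keep an outline of your workflow. The sketch below assumes a hypothetical `Step` record with a `kind` label; Canary's own export format is not described here, so the field names and labels are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Step:
    kind: str          # hypothetical labels: "action", "assertion", "resize"
    description: str


def resizes_without_assertion(steps: list[Step]) -> list[str]:
    """Return descriptions of resize steps not immediately followed by an assertion."""
    missing = []
    for index, step in enumerate(steps):
        if step.kind != "resize":
            continue
        next_step = steps[index + 1] if index + 1 < len(steps) else None
        if next_step is None or next_step.kind != "assertion":
            missing.append(step.description)
    return missing


outline = [
    Step("resize", "Resize to tablet width"),
    Step("assertion", "Filters move into the drawer"),
    Step("resize", "Resize browser window"),
    Step("action", "Open the navigation menu"),
]
print(resizes_without_assertion(outline))  # ['Resize browser window']
```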

Handling complex interactions

Some recorded actions need extra review because they involve browser prompts, transient UI, or interactions that are easy to describe poorly. Canary can now generate these steps more accurately, but you should still make sure each one is scoped to a single outcome.

Use the following guidance when you refine complex steps:

| Interaction | Write the step around | Avoid |
| --- | --- | --- |
| Uploads | The file or document the user adds | Raw click sequences on file controls |
| Downloads | The export or asset the user retrieves | Vague steps like “Click download” without the artifact |
| Overlays and popups | The reason for dismissing or closing them | Treating a transient banner as the main outcome |
| Keyboard interactions | The user task completed with the keyboard | Counting keystrokes when focus order may change |
| Page verification | What the user should be able to see or do once the page is ready | Generic checks like “Page loaded correctly” |
| Viewport resize | The responsive checkpoint you need to validate next | Resizing without checking what changed |

If an overlay blocks the flow, handle it as its own action before the blocked task. If a keyboard shortcut is the feature under test, call that out directly in the step and pair it with an assertion that proves the shortcut worked.

For uploads and downloads, include enough detail that someone reading the run can tell what file action mattered. Name the document type, export, or asset when possible instead of using generic wording like “Upload file” or “Download document.”

For page verification, assert on user-visible state after navigation or submission. Prefer “Project details page shows the new project name” over “Wait for page to load,” because the first statement confirms the destination and the outcome.

For resize steps, choose only the breakpoints that matter to the user journey. You do not need to test every screen size in one workflow. Pick the viewport where the layout or interaction model changes, then verify the task that depends on that change.

Reviewing generated workflows

After you import or record a workflow, review the generated steps before you rely on the test. Improved generation helps, but it does not replace test design.

Check each generated step for the following:

  1. The step describes a user goal, not just a browser event.
  2. Complex interactions are isolated instead of buried inside a large action.
  3. File actions, overlays, keyboard interactions, and resize steps use clear language about the expected result.
  4. Page verification steps confirm visible state, not only that navigation happened.
  5. Assertions appear immediately after the action they validate.
  6. Seeded setup data matches the scenario the workflow is supposed to exercise.

When a generated workflow combines too much into one action, split it. For example, separate “Upload the CSV” from “Verify the import summary appears,” even if the recorder originally captured them together.

When a generated workflow includes extra cleanup clicks, duplicate dismissals, generic page checks, or unnecessary resize events, remove or rewrite them. Keep only the steps that a user outcome depends on.

Writing assertions

Use assertions to confirm the outcome that matters to the user, not every intermediate UI detail. A good assertion tells you what should be true after an action, whether a miss should block the run, and what condition the run output should make obvious.

Use the severity picker on assertion nodes to choose how Canary treats a failed check.


| Severity | Use it when | Result |
| --- | --- | --- |
| Fail | The workflow cannot be trusted if this check fails. | The run fails. |
| Warn | The check adds useful signal, but the main user journey can still complete. | The run continues and the result is recorded as a warning. |

Choose Fail for checks tied to the primary outcome of the test. Examples include confirming an order is submitted, a record is created, a payment succeeds, or a status changes in a way the workflow depends on.

Choose Warn for secondary checks that help you spot regressions without blocking the run. Examples include optional badges, helpful copy, non-critical layout details, or additional confirmations that do not change whether the core flow succeeded.

Use Warn when a condition needs human review, not automatic test failure. If the workflow still completes and the result is still useful, record the condition as a warning so the run clearly shows “needs review” instead of “broken.”

If you are unsure which severity to use, ask one question: “If this condition is false, should someone trust this run?” If the answer is no, use Fail. If the answer is yes, but the result is still worth reviewing, use Warn.
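To make the effect of the two severities concrete, here is a small model of how a run could be summarized from its assertion results. It follows the rules stated above (a failed Fail check fails the run, a failed Warn check is recorded as a warning), but the type names and the status strings are assumptions for illustration and do not describe Canary's internal implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    FAIL = "fail"
    WARN = "warn"


@dataclass
class AssertionResult:
    condition: str
    severity: Severity
    passed: bool


def run_outcome(results: list[AssertionResult]) -> str:
    """Summarize a run from its assertion results (hypothetical status labels)."""
    missed = [r for r in results if not r.passed]
    if any(r.severity is Severity.FAIL for r in missed):
        return "failed"                # the run can no longer be trusted
    if missed:
        return "passed with warnings"  # needs review, but the journey completed
    return "passed"


results = [
    AssertionResult("Ticket status changes to Assigned", Severity.FAIL, True),
    AssertionResult("Helper confirmation copy appears", Severity.WARN, False),
]
print(run_outcome(results))  # passed with warnings
```

Moving a core business check to Warn in this model would turn a real failure into “passed with warnings”, which is why severity should follow business impact and not the wish for fewer red runs.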

Use audio assertions when sound is part of the user outcome. They work well for workflows that play prompts, voice responses, alerts, or other generated audio where the run should confirm what a user hears, not just what appears on screen.

Choose the simplest audio check that matches the risk:

  • Use measurable checks such as duration or silence levels when you need to confirm audio exists and plays as expected.
  • Use a natural-language prompt when you need to confirm the spoken content or overall meaning.
  • Keep the assertion focused on what the user hears, such as “Agent greeting is spoken” or “Playback includes the account balance message.”

Avoid relying on audio assertions for incidental sounds that do not matter to the workflow outcome. Use them for primary or high-value signals so they improve coverage without adding noise.
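For intuition about the measurable checks mentioned above, duration and silence level can both be derived from raw audio samples. The sketch below is a generic illustration, not Canary's implementation: it assumes mono samples normalized to the range -1.0 to 1.0, and the `0.01` silence threshold is an arbitrary example value.

```python
def audio_stats(
    samples: list[float],
    sample_rate: int,
    silence_threshold: float = 0.01,
) -> tuple[float, float]:
    """Return (duration in seconds, fraction of samples treated as silent).

    Assumes mono samples normalized to [-1.0, 1.0]; the threshold is illustrative.
    """
    if not samples:
        return 0.0, 1.0  # no audio at all counts as fully silent
    duration = len(samples) / sample_rate
    silent = sum(1 for s in samples if abs(s) < silence_threshold)
    return duration, silent / len(samples)


# Half a second of silence followed by half a second of signal at 16 kHz.
clip = [0.0] * 8000 + [0.5] * 8000
duration, silence_ratio = audio_stats(clip, sample_rate=16000)
print(duration, silence_ratio)  # 1.0 0.5
```

Checks like these confirm that audio exists and is not mostly silent. They say nothing about what was spoken, which is where a natural-language prompt fits better.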

Making failures actionable

Write assertion text so the result is immediately understandable in run details. Canary now shows clearer assertion failures with the actual condition text, so write conditions that read well on their own.

  • Prefer “Ticket appears in the Support queue” over “Queue assertion.”
  • Prefer “Warning banner is shown after save” over “Check banner.”
  • Prefer “Status changes to Assigned” over “Verify update.”
  • Split unrelated checks into separate assertions instead of combining them.

Keep each condition specific and user-visible. Avoid vague wording like “works,” “loads correctly,” or “looks good,” because those phrases do not help someone debug the failure.
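If you maintain many assertions, a simple wording check can catch the vague phrases listed above before they reach a run. This is a sketch under stated assumptions: the phrase list comes from this guide, while the function name and the idea of linting condition text outside Canary are hypothetical.

```python
import re

# Vague phrases called out in this guide; extend the tuple for your own team.
VAGUE_PHRASES = ("works", "loads correctly", "looks good")


def vague_phrases_in(condition: str) -> list[str]:
    """Return the vague phrases found in an assertion condition, matched as whole words."""
    lowered = condition.lower()
    return [
        phrase
        for phrase in VAGUE_PHRASES
        if re.search(rf"\b{re.escape(phrase)}\b", lowered)
    ]


print(vague_phrases_in("Dashboard loads correctly"))            # ['loads correctly']
print(vague_phrases_in("Ticket appears in the Support queue"))  # []
```

Whole-word matching keeps words such as “networks” from being flagged because they happen to contain “works”.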

Match the severity to the response you want from the team:

  • Use Fail when someone should stop and investigate before trusting the run.
  • Use Warn when someone should review the result, but the workflow still provides useful coverage.

This makes run details easier to scan because you can quickly see which checks passed, warned, or failed, and what condition each assertion was evaluating.

Best practices

  • Write the workflow Description as a short statement of purpose, scenario, and expected outcome.
  • Start the workflow description with the core user goal, then mention the key context that affects the run.
  • Keep the workflow description specific enough that someone can predict what the workflow should prove before reading the steps.
  • Avoid vague descriptions like “Smoke test,” “Checkout flow,” or “Admin regression.”
  • Update the workflow description when the purpose of the test changes, not just the steps.
  • Keep one assertion per expected outcome when possible.
  • Put blocking assertions immediately after the action they validate.
  • Use warning-level assertions for non-critical signals that still matter over time and for reviewable conditions, such as optional UI text or secondary confirmations, when the main path still succeeds.
  • Do not turn core business outcomes into warnings just to reduce red runs.
  • Write assertion conditions as complete, specific outcomes so the run output is clear without extra context.
  • Avoid combining multiple expectations into one condition.
  • Use scenarios for meaningful business variations, such as different roles, plans, or data states, instead of building one long workflow with branches for every case.
  • Keep scenario coverage representative. Add enough scenarios to cover distinct outcomes, but avoid duplicate variations that only make suites slower or harder to review.
  • Add viewport resize steps only when the user journey depends on a responsive layout change, navigation pattern, or breakpoint-specific behavior.
  • After each resize step, assert the responsive state that matters before continuing the flow.
  • Prefer one or two meaningful responsive checkpoints over resizing through every possible screen size.
  • Use audio assertions only when hearing the right output matters to the user journey.
  • Prefer simple audio checks first, then use prompt-based audio validation when you need to confirm what was said.
  • Separate workflow preconditions from user actions so the flow reads like setup first, then execution.
  • Use setup data that matches the scenario under test instead of relying on leftover records or broad shared state.
  • When you use a published setup flow, keep its output stable and intentional so generated workflows start from the right seeded context.
  • When you import a workflow, review the generated steps and workflow description so preconditions stay in setup, actionable steps stay in the main flow, and the overall goal is still clear.
  • Rewrite generated upload, download, overlay, keyboard, page verification, and resize steps when they describe mechanics instead of outcomes.
  • Split complex recorded actions so each step represents one user-visible result.
  • Remove duplicate dismissals, extra cleanup clicks, generic “page loaded” steps, and unneeded resize steps that do not improve coverage.
  • If a warning keeps firing and requires action every time, promote it to Fail or remove it.
  • If a failed assertion does not help someone decide what to do next, rewrite it.

Example flow outline

  1. Log in as Support admin
  2. Run setup flow for seeded ticket and account data
  3. Navigate to Tickets
  4. Create a ticket with required fields
  5. Assert ticket appears in the list (Fail)
  6. Resize to mobile width
  7. Assert the ticket list switches to the mobile layout (Warn)
  8. Open the navigation menu
  9. Dismiss the confirmation modal
  10. Assign the ticket to Support
  11. Assert ticket status changes to Assigned (Fail)
  12. Assert helper confirmation copy appears (Warn)
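The same outline can be written as structured data, which makes it easy to see at a glance which checks block the run and which only add signal. The `Node` type, the `kind` labels, and the lowercase severity strings below are assumptions made for this sketch and are not a Canary file format.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    kind: str                       # illustrative labels, not Canary identifiers
    description: str
    severity: Optional[str] = None  # "fail" or "warn", assertions only


FLOW = [
    Node("login", "Log in as Support admin"),
    Node("setup", "Run setup flow for seeded ticket and account data"),
    Node("navigate", "Navigate to Tickets"),
    Node("action", "Create a ticket with required fields"),
    Node("assertion", "Ticket appears in the list", "fail"),
    Node("resize", "Resize to mobile width"),
    Node("assertion", "Ticket list switches to the mobile layout", "warn"),
    Node("action", "Open the navigation menu"),
    Node("action", "Dismiss the confirmation modal"),
    Node("action", "Assign the ticket to Support"),
    Node("assertion", "Ticket status changes to Assigned", "fail"),
    Node("assertion", "Helper confirmation copy appears", "warn"),
]


def checks_by_severity(flow: list[Node], severity: str) -> list[str]:
    """List assertion descriptions with the given severity."""
    return [
        node.description
        for node in flow
        if node.kind == "assertion" and node.severity == severity
    ]


print("Blocking:", checks_by_severity(FLOW, "fail"))
print("Review:  ", checks_by_severity(FLOW, "warn"))
```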

Next steps