Designing a good test

Write workflows that read like a person wrote them, stay reliable, and fail with clarity.

A Cofactor test is a flow: an ordered set of nodes (login, navigate, action, assertion, seed, wait) that models a real user journey. The craft is to capture intent, not keystrokes, so the run reads like a story, replays reliably, and tells you something useful when it breaks.

A Cofactor flow in the Flow Designer: Seed Data, Acting As, Navigate, and Action nodes with inline assertions

A flow reads top to bottom: a seed provisions data, "Acting as" sets the user, then navigate and action nodes carry the journey, with assertions attached inline.

01 Start with one coherent journey

A flow should test one job to be done, with a handful of checks along the way. Coherent, not micro.

One user objective per flow. Aim for 3 to 8 nodes and stop when the goal is accomplished.
Many steps are fine if they belong to the same journey (create, edit, verify one entity). Unrelated jobs belong in separate flows.
The test: "Is this step naturally connected to the previous ones?" If not, you have started a new journey. Split it.

Good (coherent): sign in, create a project, rename it, confirm the new name, delete it.
Bad (three journeys): create a project, manage users, generate a report.

02 How a run actually works (playbooks and cache)

Understanding this is what makes every other rule in this guide click.

The first time a flow runs, the AI agent explores your app, performs each node, and writes a playbook: a human-readable list of the steps it took ("click New Ticket", "fill Title and Priority", "verify the success toast"). The raw browser actions are cached alongside it.

On every later run, Cofactor replays the cached actions directly, no agent needed, and only calls the agent to heal the specific steps that changed or broke. A clean flow replays in seconds.
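The replay-then-heal behavior can be pictured with a small sketch. This is an illustrative model only, not Cofactor's implementation: the Step class, run_flow, and the replay and heal callbacks are hypothetical names, and the action strings are made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    intent: str                   # human-readable playbook line
    cached_action: Optional[str]  # raw browser action from an earlier run, if any


def run_flow(
    steps: List[Step],
    replay: Callable[[str], bool],  # hypothetical: replays a cached action, True on success
    heal: Callable[[str], str],     # hypothetical: agent re-derives an action from the intent
) -> List[str]:
    """Replay cached actions and call the agent only for steps whose cache misses."""
    healed = []
    for step in steps:
        if step.cached_action is not None and replay(step.cached_action):
            continue  # cache hit: no agent involved
        # Cache miss or stale action: heal just this step and update the playbook.
        step.cached_action = heal(step.intent)
        healed.append(step.intent)
    return healed


playbook = [
    Step("click New Ticket", "click:#new-ticket"),
    Step("fill Title and Priority", "fill:#title-old"),  # selector no longer exists
    Step("verify the success toast", None),              # never cached
]
still_valid = {"click:#new-ticket"}
healed = run_flow(
    playbook,
    replay=lambda action: action in still_valid,
    heal=lambda intent: f"agent:{intent}",
)
print(healed)
```

In this model only the two steps without a working cached action reach the agent; the first step replays directly. On a first run every cached_action would be empty, so the agent would perform every step.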

The Playbook view: each node's human-readable steps, with a count of how many are cached

The playbook Cofactor generated for a flow: each action expands into readable steps (click, fill form, capture, verify), and the header shows how many are cached for instant replay.

The consequence drives this whole guide:

Clear, intent-focused steps produce a clean playbook, which replays reliably. Vague steps ("do the thing on the page") produce a guessy playbook whose cache breaks at the first UI change.

This is also why a constantly changing UI does not mean constant flakiness: when a step's cache misses, the agent heals just that step and updates the playbook, instead of failing the run.

03 Write steps as human intent

A flow should read like instructions you would give a teammate:

"Create a ticket assigned to Support."
"Submit the ticket and confirm it appears in the list."
"Change the ticket status to Closed."

Capture the why and the what, not the UI mechanics.

Use the flow Description as test intent. Cofactor uses the description to guide execution, so write it as a clear statement of purpose, not a label. A strong description answers three questions: what user outcome are you testing, what scenario matters for this run, and what result counts as success. Prefer "Verify a Support admin can create and assign a ticket so urgent requests enter the queue with the correct owner" over "Ticket flow".

Choose the right granularity. Actions should be meaningful to the user but tight enough to pinpoint a failure.

Too granular: "Click New", then "Type title", then "Click Save"
Too stuffed: "Create a ticket, assign it, comment, upload a file, and verify it in three views"
Good: "Create a ticket with required fields", "Assign the ticket to Support", "Verify it appears in the queue"

Rule of thumb: if an action crosses multiple screens or changes multiple entities, split it. If it is a single user goal on a single screen, keep it as one action. Node titles should be short and action-oriented ("Create a support ticket"), with the description carrying the detail the title cannot.

04 Provision your own data

A flow must not depend on data another flow created, or on long-lived shared state. If it needs data, it creates that data. Cofactor gives you two ways, in order of preference:

Seed flow (preferred). A seed provisions data and exports named values that your flow consumes as $var.{name}. Seeds are reusable, can run as a fast API sequence instead of a browser, and pair with a teardown that cleans up automatically.
Inline creation. Create the data in the first few steps of the flow itself. This is fine for data a single flow owns; reach for a seed once more than one flow needs the same data, or you want a faster, browser-free setup.

A seed provisions data and exports it as $var; the flow consumes that variable; the paired teardown uses the same variable to clean up, automatically.
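To make the export-and-consume handoff concrete, here is a minimal sketch of how $var tokens could be resolved against a seed's exports. It is an assumption-laden illustration, not Cofactor's resolver: resolve_vars, the token pattern, and the sample values are all hypothetical.

```python
import re
from typing import Dict

# Assumed token shape: "$var." followed by an identifier such as project_id.
VAR_TOKEN = re.compile(r"\$var\.([A-Za-z_][A-Za-z0-9_]*)")


def resolve_vars(text: str, exports: Dict[str, str]) -> str:
    """Replace each $var.name token with the value the seed exported under that name."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in exports:
            # Fail loudly rather than run a step against a half-resolved value.
            raise KeyError(f"seed did not export '{name}'")
        return exports[name]

    return VAR_TOKEN.sub(substitute, text)


# Hypothetical values a seed might export.
seed_exports = {"project_id": "p_123", "project_name": "QA Test x7k2pq"}

step = resolve_vars("Navigate to /projects/$var.project_id", seed_exports)
teardown = resolve_vars("Delete project $var.project_id", seed_exports)
print(step)
print(teardown)
```

The flow step and the teardown both read the same exported value, which is what lets cleanup target exactly the data the seed created.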

Use macros for every name, email, and id so parallel runs and reruns never collide. Macros generate fresh values each run:

$EMAIL(): unique test email
$UUID(): globally unique id
$RANDOM_STRING(6): short unique suffix
$DATE(7): a date offset (here, next week) so tests stay valid over time
$SEQUENCE("orders"): per-run auto-incrementing counter

Follow the naming convention "QA Test $RANDOM_STRING(6)". The "QA Test" prefix makes test data instantly recognizable; the macro suffix keeps it unique.
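If it helps to see why macro values avoid collisions, the sketch below emulates a few of them in plain Python. These are stand-ins written for illustration; the function names, output formats, and the example.test domain are assumptions, and Cofactor's real macros may produce different shapes.

```python
import itertools
import random
import string
import uuid
from collections import defaultdict
from datetime import date, timedelta

# One independent counter per sequence name (emulates $SEQUENCE("orders")).
_counters = defaultdict(lambda: itertools.count(1))


def random_string(length: int) -> str:
    """Emulates $RANDOM_STRING(n): a short random suffix."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(random.choices(alphabet, k=length))


def unique_email() -> str:
    """Emulates $EMAIL(): a fresh address on every call (format is assumed)."""
    return f"qa-{uuid.uuid4().hex[:12]}@example.test"


def date_offset(days: int) -> str:
    """Emulates $DATE(n): today plus n days, as an ISO date."""
    return (date.today() + timedelta(days=days)).isoformat()


def sequence(name: str) -> int:
    """Emulates $SEQUENCE(name): the next integer for that name."""
    return next(_counters[name])


# The naming convention: recognizable prefix, unique suffix.
project_name = f"QA Test {random_string(6)}"
print(project_name, unique_email(), date_offset(7))
```

Because every value is generated at run time, two parallel runs of the same flow get different names and emails, and a date-based step never drifts into the past.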

Macro autocomplete and $var tokens in a node's Description field

$var references render as tokens inline, and typing $ opens the macro catalog ($UUID, $NOW, $TODAY, and more) with a live sample of each value.

Own your data. Clean up what you create, revert settings you change, and never touch data that was already there. It is fine to depend on pre-existing reference data; just do not delete or modify what your flow does not own. Read-only flows (verify a dashboard loads, check a report) need no creation or cleanup at all.

Node ordering. The conventional start is Seed → Login → Navigate. Seeds run first, in their own context, and export the variables the rest of the flow uses; putting a login before the seed wastes a browser session. For multi-role journeys, add another login node (or a "Switch role" step) mid-journey: each login starts a fresh session as that user.

05 Verify with assertions, inline first

Assertions prove the outcome that matters to the user. Attach them to the action that produced the result.

Inline assertions are strongly preferred. They keep the test on track, produce clean failure reports tied to the exact step, and are cacheable for fast replay.
Standalone assertion nodes are the exception. Use one only when the check belongs to no action, for example confirming a page state after navigation, or that data is absent after a sequence. A typical flow has zero or one.

An action's inline Assertions panel with a Fail-severity check

Inline assertions live on the action node: conditions verified right after it completes, each with its own severity. The helper text notes they also guide the agent on the expected outcome.

Choose severity intentionally. The severity picker decides how Cofactor treats a failed check.

Severity	Use it when	Result
Fatal	Nothing after this is worth running if it fails (login failed, the page never loaded, a required record was not created).	The test stops immediately.
Fail	The check proves an outcome that matters, but later steps still give useful signal.	The test keeps running, accumulates the failure, and is marked failed at the end.
Warn	The check adds signal but the core journey still completed (a toast, optional copy, non-critical layout).	The test continues; the result is recorded as a warning, not a failure.

The deciding question is whether a miss makes the run untrustworthy. If it does, it is a failure: use Fatal when nothing after it is worth running, and Fail when later steps still give useful signal so the run should finish and report everything it can. If a miss is worth a look but the journey still succeeded, use Warn. Do not turn core outcomes into warnings just to keep runs green, and if a warning fires every run and always needs action, promote it. Every flow needs at least one failing-severity (Fatal or Fail) assertion.
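The three severities can be summarized as a small evaluation loop. This is a sketch of the behavior described above, not Cofactor's engine: the Severity, Check, and RunResult names are hypothetical, and reporting a warning-only run as "passed" with warnings listed separately is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Severity(Enum):
    FATAL = "fatal"
    FAIL = "fail"
    WARN = "warn"


@dataclass
class Check:
    condition: str
    severity: Severity
    passed: bool


@dataclass
class RunResult:
    status: str = "passed"
    failures: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    stopped_early: bool = False


def evaluate(checks: List[Check]) -> RunResult:
    """Apply the severity rules in order: Warn records, Fail accumulates, Fatal stops."""
    result = RunResult()
    for check in checks:
        if check.passed:
            continue
        if check.severity is Severity.WARN:
            result.warnings.append(check.condition)  # logged, run continues
            continue
        result.failures.append(check.condition)
        result.status = "failed"
        if check.severity is Severity.FATAL:
            result.stopped_early = True  # nothing after this is worth running
            break
    return result


def has_failing_severity(checks: List[Check]) -> bool:
    """Every flow needs at least one Fatal or Fail assertion."""
    return any(c.severity in (Severity.FATAL, Severity.FAIL) for c in checks)


checks = [
    Check("the ticket appears in the list", Severity.FATAL, passed=True),
    Check("status changes to Assigned", Severity.FAIL, passed=False),
    Check("helper confirmation copy appears", Severity.WARN, passed=False),
]
result = evaluate(checks)
print(result.status, result.failures, result.warnings)
```

Here the run finishes and is marked failed because of the Fail check, while the Warn miss is recorded without changing the outcome. Had the first check failed, the loop would have stopped before evaluating anything else.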

Write conditions that are specific and observable. Prefer "The status badge shows Active" or "A row matching $subject is visible" over "the page looks correct". Place assertions where claims are made: after creation (the record exists), after modification (the change persisted), after deletion (the item is gone, a toast is not enough), and at the goal. Do not assert everything; focus on the claims that matter.

Agentic and audio assertions cover what cannot be checked mechanically. Use an agentic (agent-evaluated) assertion for visual or layout correctness, charts rendering, or content that changes between runs; these need the agent at replay and are not cached, so reach for them only when the check is genuinely subjective or dynamic. Use audio assertions when sound is part of the outcome (a spoken greeting, an alert), picking the simplest check that matches the risk.

06 Make failures actionable

Write assertion text so the result is understandable on its own in run details, because Cofactor shows the actual condition text.

Prefer "Ticket appears in the Support queue" over "Queue assertion".
Prefer "Status changes to Assigned" over "Verify update".
Split unrelated checks into separate assertions instead of combining them.
Avoid vague wording like "works", "loads correctly", or "looks good".
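If you review many flows, a rough text check can catch vague wording before it reaches run details. The helper below is a hypothetical review aid, not a Cofactor feature; the phrase list comes from the examples in this guide and is an assumption about what counts as vague.

```python
import re
from typing import List

# Phrases this guide calls out as too vague to stand alone in a failure report.
VAGUE_PHRASES = ("works", "loads correctly", "looks good", "looks correct")


def vague_phrases_in(condition: str) -> List[str]:
    """Return the vague phrases found in an assertion's condition text."""
    lowered = condition.lower()
    return [
        phrase
        for phrase in VAGUE_PHRASES
        if re.search(rf"\b{re.escape(phrase)}\b", lowered)
    ]


for text in ("Queue works", "Ticket appears in the Support queue"):
    print(text, "->", vague_phrases_in(text))
```

A match is a prompt to rewrite the condition around something observable, not proof the assertion is wrong.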

In run details, warnings read as their own status, distinct from failures: the warned step is logged and the run keeps going, while the failed step is what marks the run failed.

07 Review generated and recorded steps

When you record or import a flow, Cofactor generates the steps. Review them before you rely on the test: keep each one focused on the user outcome, not the mechanics, especially for the interactions that are easy to describe poorly.

Interaction	Write the step around	Avoid
Uploads	The file the user adds ("Upload the invoice PDF")	Raw clicks on file controls
Downloads	The asset the user retrieves ("Download the exported report")	"Click download" with no artifact
Overlays and popups	The reason for closing it ("Dismiss the cookie banner")	Treating a transient banner as the outcome
Keyboard actions	The task done ("Submit the form with the keyboard")	Counting keystrokes when focus order may change
Page verification	What the user can see or do once ready ("Dashboard shows the filtered list")	"Page loaded correctly"
Viewport resize	The responsive checkpoint to validate next ("Resize to mobile, then open the nav")	Resizing without checking what changed

A quick review checklist for generated steps:

Each step describes a user goal, not just a browser event.
Complex interactions are isolated, not buried in one large action.
Page checks confirm visible state, not just that navigation happened.
Assertions sit immediately after the action they validate.
Seeded data matches the scenario the flow exercises.

Split actions that combine too much (separate "Upload the CSV" from "Verify the import summary appears"), and delete duplicate dismissals, generic "page loaded" checks, and resize steps that do not change the user outcome. Add a resize step only when the journey depends on a responsive change, and follow each with an assertion on the state that matters.

08 Use scenarios for meaningful variations

When one journey has distinct business conditions (a different role, plan, region, or data shape that changes the outcome), use scenarios instead of forcing one path to cover every case, or branching inside a single flow. Keep scenarios representative: enough to cover distinct outcomes, not a variant for every tiny UI difference.

09 Confirm reliability before you rely on it

One green run means the flow passed once, not that it is trustworthy. Before a flow earns a place in a suite, make sure it:

Passes consistently across several runs (its health reaches reliable).
Produces useful evidence when it fails (screenshots and traces that show what happened).
Fails for real regressions, not incidental UI churn.

When a flow flakes, fix the flow first. A suite that flakes erodes trust fast. When a run goes red, work through Triaging a failure to tell a real bug from a flake, agent confusion, or an environment problem.

10 Example flow outline

Seed: Create Test Project   (exports $var.project_id, $var.project_name)
Login as Support admin
Navigate to /projects/$var.project_id
Action — Create a ticket with required fields
  - assert (fatal): the ticket appears in the list   (no point assigning a ticket that was not created)
Action — Assign the ticket to Support
  - assert (fail): status changes to `Assigned`
  - assert (warn): helper confirmation copy appears
Assert (standalone): the project header still shows $var.project_name
Teardown: delete the project   (paired with the seed, runs automatically)

11 Next steps

Read Building your first suite to operationalize good tests.
Read Triaging a failure for what to do when one goes red.
See Flows, Runs, and Macros to go deeper.