Test Runs

Run promoted workflows as regression tests.

A test run executes many workflows in a single batch — all promoted workflows, a tagged subset, or a test suite — capturing pass/fail results for each. Test runs are your regression coverage: run them to confirm your application still works after a change.

A test run is built from workflow runs: drill into any row and you land on that workflow's full run detail. This page covers what's specific to running and reading a grouped run — eligibility, concurrency, suite coverage, and failure clustering. For per-run mechanics (statuses, assertions, artifacts, failure investigation), see Workflow runs.

01 Starting a Test Run

1. Go to Runs in the sidebar
2. Click New Run → Test Run
3. Select the environment where you want to run the suite
4. Review the promoted workflows that are eligible for that environment
5. Start the run

Note: The "Test Run" option is only available if you have at least one promoted workflow.

A workflow can run only when the selected environment matches that workflow's environment eligibility. If a workflow is limited to specific environment types, Canary shows that constraint before you start the run.

When you choose an environment, Canary compares that environment with each promoted workflow's constraints. Eligible workflows run normally. Ineligible workflows stay out of execution and appear as skipped in the run summary so you can confirm why they did not run.

Paused workflows are also excluded from test runs. They stay out of regression coverage for the suite and appear separately from runnable workflows so you can tell they were intentionally left out rather than skipped by environment selection.
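
The eligibility and pause rules above can be sketched as a small classifier. This is an illustration only, not Canary's implementation: the function name `classify_workflow`, its argument format, and the assumption that a workflow with no environment constraint is eligible everywhere are all hypothetical.

```shell
# Hypothetical helper: classify one promoted workflow for a test run.
#   $1 = selected environment type
#   $2 = comma-separated environment types the workflow allows
#        (empty is ASSUMED to mean "no constraint")
#   $3 = "paused" or "active"
classify_workflow() (
  selected=$1; allowed=$2; state=$3
  if [ "$state" = "paused" ]; then
    echo "paused"          # excluded from the run, listed separately
  elif [ -z "$allowed" ]; then
    echo "eligible"        # no constraint to compare against
  else
    case ",$allowed," in
      *",$selected,"*) echo "eligible" ;;
      *)               echo "skipped"  ;;  # environment does not match
    esac
  fi
)

classify_workflow staging "staging,production" active   # prints "eligible"
classify_workflow preview "staging,production" active   # prints "skipped"
```

The pause check comes first so that a paused workflow is never reported as skipped, which mirrors how the run summary keeps the two groups apart.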

[Image: Test run setup showing environment eligibility for workflows]

02 What Happens During a Test Run

When you start a test run:

1. Discovery - Canary finds all promoted workflows
2. Eligibility check - Canary compares the selected environment with each workflow's environment constraints and marks ineligible workflows to skip
3. Authentication warm-up - Canary warms up any saved credentials before workflows begin so authenticated sessions are ready at the start of execution
4. Execution - Each eligible workflow runs concurrently in its own browser session
5. Verification - Canary checks expected outcomes and records assertion results, including pass, warning, and failure states
6. Reporting - Canary records run status for each workflow, including skipped workflows, and saves step-by-step debugging details, API call context, and run media for review

If your promoted workflows rely on saved credentials, expect the test run to prepare those sessions before the first workflow step starts. This reduces login friction at the beginning of the suite and helps login-dependent workflows begin in an authenticated state.

Test runs use the same execution engine as individual workflow runs, including automatic retries for flaky failures. In the run details view, you can inspect assertion summaries, review per-step assertion outcomes, and drill into richer debugging context for each workflow.

03 Test Run Statuses

Status	Meaning
Queued	Test run is waiting to start
Running	Canary is warming up saved credentials or executing workflows
Completed	All workflows finished without any true failures
Completed (Errors)	One or more workflows ended with a true failure
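
As a rough illustration of how the two finished statuses relate to workflow outcomes, the sketch below reads one outcome per line and reports Completed (Errors) if any of them is Failed. The function name `run_status` and the line-per-outcome input format are assumptions made for this example.

```shell
# Hypothetical helper: derive the finished run status from workflow outcomes.
# Reads one outcome per line on stdin (for example "Success" or "Failed").
run_status() {
  # -x matches whole lines only, so "Flaky Success" or "Skipped" never count
  # as a true failure.
  if grep -qx "Failed"; then
    echo "Completed (Errors)"
  else
    echo "Completed"
  fi
}

printf 'Success\nFlaky Success\nSkipped\n' | run_status   # prints "Completed"
```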

04 Workflow Outcomes

Degraded workflows are intentionally excluded from execution until you re-enable them. In test run results, Canary shows them separately as Degraded or Not run instead of counting them as passed, failed, flaky, or skipped.

Each workflow in a test run has its own outcome:

Outcome	Meaning
Success	Workflow completed without any failing steps or failing assertions
Failed	Workflow ended with a true failure after all retries or hit a failing assertion
Flaky Success	Workflow passed after one or more retries
Waiting	Workflow is waiting at a Wait node (distinct from a paused workflow, which is excluded from the run)
Skipped	Workflow did not run because the selected environment was not eligible for that workflow
Degraded	Workflow is marked unreliable and stays out of suite execution until you re-enable it
Not run	Workflow did not execute because it is currently degraded

Warning-level assertions do not change a workflow to Failed on their own. Treat them as results that need review, not as proof that execution broke.
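
To make the distinction concrete, the following sketch derives an outcome from the final attempt's result and the number of retries used. The inputs and the function name `workflow_outcome` are hypothetical. The warning count is accepted only to show that it never changes the outcome.

```shell
# Hypothetical helper: map an executed workflow to its outcome.
#   $1 = result of the final attempt: "pass" or "fail"
#   $2 = number of retries that were needed
#   $3 = number of warning-level assertions (deliberately ignored)
workflow_outcome() (
  final=$1; retries=$2
  if [ "$final" != "pass" ]; then
    echo "Failed"          # still failing after all retries
  elif [ "$retries" -gt 0 ]; then
    echo "Flaky Success"   # passed, but only after retrying
  else
    echo "Success"
  fi
)

workflow_outcome pass 0 3   # prints "Success" even with three warnings
```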

When you open a workflow from a test run, the detail view shows per-step assertion results so you can quickly see what passed, warned, or failed. Because test runs use the same result viewer as workflow runs, you can interpret these outcomes the same way you do in workflow run details.

See Workflow Run Details to understand how pass, warning, and failure results appear in the viewer and how to inspect the related step context.

While the test run is still Running, assertion outcomes stream into the workflow detail view as checks complete. Step logs mark each assertion state, so you can spot passing, warning, and failing checks without leaving the active run.

After the run finishes, the same workflow detail view preserves those assertion states in both the step log and replay view. This helps you confirm whether a workflow failed because of a blocking assertion or completed with warning-level results that still need review.

Paused workflows do not receive a workflow outcome in the run because they are not executed as part of the suite. Treat them as intentionally excluded from regression coverage until you promote them again.

Degraded workflows also stay out of suite execution, but they remain visible in the run summary so you can see that Canary excluded them because of reliability. Treat a degraded result as a coverage signal, not as a passing or failing outcome.

The workflow list in test run details also highlights failure patterns across runs. Use these indicators to decide whether you are looking at a new breakage or an ongoing regression:

Indicator	What it tells you	How to use it
Newly failing	The workflow failed in this test run after passing in the previous run	Prioritize this first when you want to find fresh regressions introduced by recent changes
Failure streak	The workflow has failed in consecutive test runs	Use the streak count to spot recurring problems that need deeper investigation or ownership

Use Newly failing to separate fresh breakages from known unstable areas. Use the failure streak to judge whether a workflow has been failing repeatedly and may need a broader fix, test cleanup, or triage follow-up.
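
The two indicators can be reasoned about with a short sketch over a workflow's recent history. The helpers `failure_streak` and `is_newly_failing` are hypothetical, and the example assumes each past test run is reduced to "pass" or "fail". How Canary treats skipped or flaky results in a streak is not covered here.

```shell
# Hypothetical helpers. Arguments are one workflow's results across test
# runs, oldest first, each either "pass" or "fail".

# Count consecutive failures ending at the most recent run.
failure_streak() (
  streak=0
  for outcome in "$@"; do
    if [ "$outcome" = "fail" ]; then
      streak=$((streak + 1))
    else
      streak=0
    fi
  done
  echo "$streak"
)

# Succeed (exit 0) only when the latest run failed and the one before passed.
is_newly_failing() (
  prev=""; last=""
  for outcome in "$@"; do
    prev=$last; last=$outcome
  done
  [ "$last" = "fail" ] && [ "$prev" = "pass" ]
)

failure_streak pass fail fail fail                        # prints 3
is_newly_failing pass pass fail && echo "Newly failing"   # prints "Newly failing"
```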

05 Auto-Verification and Issue Filing

When a workflow fails during a test run:

1. Canary analyzes the failure to determine if it is a real bug or test instability
2. If it is a real bug, Canary automatically creates an issue with:
   - Screenshot at the moment of failure
   - Steps that led to the failure
   - Error message and debugging context from the run details
   - Assertion results that explain what failed or warned
3. Issues are deduplicated, so the same failure will not create multiple issues
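
The filing flow above can be pictured as a filter followed by deduplication. The sketch below is an assumption-heavy illustration: the pipe-separated record format, the `bug`/`instability` verdict labels, and the idea of keying duplicates on workflow, step, and error message are all hypothetical, since this page does not describe how Canary identifies a repeated failure.

```shell
# Hypothetical helper: decide which failures would produce a new issue.
# Reads records on stdin in the form: workflow|failing step|error|verdict
issues_to_file() {
  # Keep only real bugs, and print each workflow/step/error key once.
  awk -F'|' '$4 == "bug" && !seen[$1 FS $2 FS $3]++ { print $1 FS $2 FS $3 }'
}

printf '%s\n' \
  'Checkout|Pay|Timeout|bug' \
  'Checkout|Pay|Timeout|bug' \
  'Login|Submit|Selector missing|instability' | issues_to_file
# prints "Checkout|Pay|Timeout" once
```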

Use the workflow detail view in the test run to confirm whether the failure came from a step action, an assertion, or an API call before you review the filed issue.

06 Viewing Test Run Results

From the Runs page:

1. Click the Test Runs tab
2. Click any test run row to see details
3. Review per-workflow outcomes, assertion summaries, skipped workflows, paused workflows, and status indicators
4. Click a workflow to inspect step details, assertion results, API calls, screenshots, and video

Use the workflow list first to separate runs that need attention from runs that actually failed. A workflow can show warning-level assertions and still finish with a successful outcome.

[Image: Test run results showing warning-level assertions separated from true failures]

When the selected environment does not match a workflow's constraints, the run summary shows that workflow as Skipped instead of failed. Review skipped rows to confirm which workflows were excluded because the selected environment was not eligible.

When a workflow includes scenarios, the test run expands that workflow across every included scenario instead of running only a single variation. Review the workflow row and detail view together so you can see which scenario-specific executions were generated for the suite.

Suites only fan out across scenarios that are marked to run in suites. If a workflow has a default scenario plus additional included scenarios, Canary creates a separate execution for each included scenario and shows the scenario name with the result.
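
Scenario fan-out can be sketched as a filter over a workflow's scenarios. The input format (one `workflow|scenario|include-in-suites` record per line) and the function name `expand_scenarios` are assumptions for illustration only.

```shell
# Hypothetical helper: list the executions a suite would create.
# Reads records on stdin in the form: workflow|scenario|yes-or-no
# where the third field says whether the scenario is marked to run in suites.
expand_scenarios() {
  awk -F'|' '$3 == "yes" { print $1 " [" $2 "]" }'
}

printf '%s\n' \
  'Checkout|Default|yes' \
  'Checkout|Guest user|yes' \
  'Checkout|Legacy cart|no' | expand_scenarios
# prints:
#   Checkout [Default]
#   Checkout [Guest user]
```

One workflow produces two result rows here, which is why the scenario label matters when you read the results.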

[Image: Test run results showing scenario-specific executions in a suite]

When a workflow is paused, the run summary surfaces it separately from executed and skipped workflows. Review this section to understand which workflows are currently out of regression coverage and why the run may include fewer executed workflows than your total promoted workflow count.

While a test run is active, open any workflow row to monitor live assertion outcomes as they stream into the run details view. Use the assertion summary to see whether checks are passing, warning, or failing, then open the relevant step to inspect the assertion result in context.

After the test run completes, start at the run-level summary before you drill into an individual workflow. Read the counts for passed, failed, flaky, skipped, degraded, and paused workflows separately so you understand both execution quality and coverage before you triage failures.

The suite summary gives you a regression overview. Read the progress state, pass rate, and workflow counts together before you investigate a single row. This helps you tell whether the suite is still in progress, whether coverage changed because of skipped, degraded, or paused workflows, and how much of the executed suite passed.

If the same workflow appears more than once in the summary, compare the workflow name, scenario label, and run status together before you decide what failed. Repeated entries can represent separate included scenarios or multiple executions that belong to the same workflow, so always open the specific row you want to inspect.

Use the pass rate as a quick health signal, not as your only triage input. Pair it with the failed count and the Newly failing markers so you can tell whether a lower pass rate comes from a fresh regression, repeated failures, or broader suite coverage changes.

Degraded workflows do not affect the suite pass rate. Canary excludes them from the pass-rate calculation and shows the degraded total explicitly in the summary so you can separate execution health from temporarily removed coverage.

[Image: Test run suite summary showing degraded counts alongside pass-rate reporting]

Suite result counts reflect the actual executions included in the run summary. Use the workflow list and scenario labels together when you verify totals, especially if a workflow fans out across multiple included scenarios.

When scenario fan-out is enabled for a workflow, read the scenario label before you interpret the result. A failure tied to one scenario does not mean every scenario for that workflow failed.

Paused workflows are excluded from pass-rate calculations. Use the paused count as a coverage signal, not as a passing or failing result.
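
A worked calculation can help when a pass rate looks surprising. The sketch below is one plausible reading, not Canary's documented formula: it assumes flaky successes count as passes and that skipped workflows are also left out of the denominator. Only the exclusion of degraded and paused workflows is stated on this page. The function name `pass_rate` is hypothetical.

```shell
# Hypothetical helper: compute a pass rate from summary counts.
# Args: passed flaky failed skipped degraded paused
# ASSUMPTIONS: flaky counts as passed; skipped is excluded like
# degraded and paused.
pass_rate() (
  passed=$1; flaky=$2; failed=$3
  executed=$((passed + flaky + failed))
  if [ "$executed" -eq 0 ]; then
    echo "n/a"             # nothing executed, so no rate to report
  else
    awk -v ok="$((passed + flaky))" -v total="$executed" \
      'BEGIN { printf "%.1f%%\n", ok * 100 / total }'
  fi
)

pass_rate 17 1 2 3 4 5   # prints "90.0%" (18 of 20 executed workflows)
```

Under these assumptions, changing the skipped, degraded, or paused counts leaves the rate untouched, which is why those counts are worth reading separately as coverage signals.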

Completed test runs can also include a Triage tab, where Cofactor groups related failures into clusters and highlights the most pressing one first — so when several workflows fail, you can tell whether they share a root cause before investigating each. Start with the highlighted cluster summary, then open the failed workflows grouped under it. For how clustering works, the triage states, and per-failure diagnostics, see Triage & diagnostics.

[Image: Test run triage tab showing clustered failures and summary guidance]

After you review the Triage tab, use the full-screen debug view to investigate the suite before you drill into an individual workflow. The same consolidated debugging experience used for workflow runs is available for promoted workflow test runs, so you can review the run in one place.

Use the full-screen debug view to answer these questions quickly:

If you need to know...	Check this in the debug view
Which part of the suite slowed down or stalled	Overview and Steps
Where the browser moved during execution	Navigation history
Whether cached data affected the run	Cache behavior
What Canary thinks likely caused the failure	Overview
Where to continue a deeper technical investigation	Playwright traces and Agent thoughts

Open Navigation history when you need to trace redirects, page loads, or in-app route changes across the run. This timeline makes it easier to understand whether a failure happened after the browser landed on the wrong page, bounced through a redirect chain, or never reached the expected destination.

[Image: Test run debug view showing navigation timeline details]

After you identify the likely problem area, open the failed workflow to inspect the related step details, assertions, screenshots, and video. Use the full-screen debug view first for triage, then use the workflow detail view to confirm the exact failure.

After the test run completes, scan the workflow list for the new failure indicators before you drill into an individual workflow. Start with workflows marked Newly failing to catch fresh regressions quickly, then review any failure streak counts to identify workflows that have been failing across multiple runs.

Use the failure streak as a triage shortcut. A long streak usually means the issue is recurring, while a newly failing workflow often points to a recent product or test change.

After the test run completes, use the result chips and row actions to open the exact execution you want to inspect. Click the workflow row to open the suite result, then use the status cell to jump directly into that workflow run's detailed view.

After you open a run detail view, use the summary header, step log, replay, screenshots, and video together to confirm what happened. The status cell links let you move from a suite-level summary into the full execution context without losing your place.

07 Review results

Use the run detail page to move from suite-level outcomes into the exact execution you want to inspect.

1. Open a completed test run
2. Review the workflow rows in the results table
3. Click a workflow row to open the suite result for that workflow
4. Click the status cell to open the related workflow run details

Status cells on completed run details link directly to the underlying workflow run. Use that link when you want the full run viewer, deeper step inspection, or a shareable execution URL.

[Image: Test run results showing status links to workflow run details]

After you open the workflow run details, continue your investigation in Workflow Run Details.

08 Rerun a suite

When a test run is complete, you can run the same suite again from its detail page.

1. Open a completed test run
2. Click Rerun suite
3. Confirm the environment and start the run
4. Wait for Canary to open the new test run automatically

Use this action when you want to verify a fix, confirm whether a failure is reproducible, or rerun the suite without starting from New Run.

[Image: Test run detail page with the Rerun suite action]

09 Navigate to workflow run details

From a completed test run, open workflow-level details directly from the results table.

1. Click a workflow row to review the suite result in context
2. Click the status cell to jump to the individual workflow run details
3. Use the workflow run page when you need the full execution viewer, step-by-step debugging, or a direct link to share

This shortcut helps you move from regression results to workflow-level debugging without searching for the same execution elsewhere.

After the test run completes, use the failure inspection view to review final assertion results alongside step details. Its wide layout, expandable sections, and expected-versus-actual context help you move from the workflow outcome to the exact step that needs review.

Use these cues when you review a workflow result:

Assertion state	What it tells you
Pass	The check succeeded at that step
Warning	The check found something to review, but the workflow still completed
Failure	The check failed and contributed to the workflow failing

Test run details use the same viewer as workflow run details, so you can use the same interpretation for pass, warning, and failure outcomes when reviewing a test run. For a complete walkthrough of the viewer, see Workflow Run Details.

When a recording is unavailable, the workflow detail view explains why instead of leaving the replay area empty. Use this message to tell whether the run had no browser playback to show or whether Canary could not finish saving the recording.

Review these missing-recording states when you inspect a workflow result:

Missing recording state	What it means	What to do
API-only run	The workflow completed without browser playback because it only used API steps	Review API call details, step logs, and assertions instead of video
No browser interaction	The workflow opened a browser-capable run, but no browser action happened that produced a recording	Check the executed steps to confirm whether the run stayed in non-browser logic
Failed before recording started	The workflow stopped so early that Canary did not begin capturing playback	Open the first failed step, screenshot, and logs to find the blocking error
Recording upload failed	The run executed, but Canary could not finish saving the recording	Use screenshots and step details for troubleshooting, then rerun if you need playback

[Image: Test run details showing why a recording is unavailable]

Treat the missing-recording message as part of the run evidence. It helps you decide whether the absence of playback is expected for that workflow or whether you should rerun the test to capture media.
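
The decision described above can be summarized as a lookup from the missing-recording state to a next move. The helper `recording_next_move` and its three return labels are hypothetical names chosen for this sketch. The state strings are the ones listed in the table.

```shell
# Hypothetical helper: decide how to react to a missing recording.
# $1 = missing-recording state as shown in the workflow detail view
recording_next_move() {
  case "$1" in
    "API-only run" | "No browser interaction")
      echo "expected" ;;          # no playback existed to record
    "Failed before recording started")
      echo "investigate" ;;       # start from the first failed step and logs
    "Recording upload failed")
      echo "rerun-if-needed" ;;   # rerun only if you need playback
    *)
      echo "unknown state: $1" >&2
      return 1 ;;
  esac
}

recording_next_move "API-only run"   # prints "expected"
```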

When multiple workflows fail in the same run, review them in this order:

1. Workflows marked Newly failing
2. The highlighted cluster in the Triage tab, if available
3. Workflows with the longest failure streaks
4. Remaining failed workflows without either indicator

This order helps you catch recent regressions without losing sight of long-running failures that still need attention.
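
The review order above can be expressed as a sort over the failed workflows. This sketch assumes a hypothetical record format (`name|newly failing|in highlighted cluster|streak length`) and assumes a streak indicator only appears once a workflow has failed in at least two consecutive runs. Neither detail comes from Canary itself.

```shell
# Hypothetical helper: order failed workflows for review.
# Reads records on stdin in the form:
#   name|newly-failing (yes/no)|in-highlighted-cluster (yes/no)|streak length
triage_order() {
  awk -F'|' '{
    if ($2 == "yes") rank = 1        # fresh regressions first
    else if ($3 == "yes") rank = 2   # then the highlighted cluster
    else if ($4 > 1) rank = 3        # then ongoing streaks
    else rank = 4                    # then everything else
    print rank "|" $4 "|" $1
  }' | sort -t'|' -k1,1n -k2,2nr | cut -d'|' -f3
}

printf '%s\n' \
  'Search|no|no|1' \
  'Checkout|no|no|5' \
  'Login|yes|no|1' \
  'Profile|no|yes|2' \
  'Billing|no|no|3' | triage_order
# prints Login, Profile, Checkout, Billing, Search (one per line)
```

Within the streak group, the second sort key puts the longest streak first, so Checkout (5 runs) is listed before Billing (3 runs).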

10 Failure investigation

Open the failed workflow from the test run after you review the full-screen debug view. Start with the consolidated debugging tools to narrow down where the suite failed, then use the workflow detail view to confirm the exact step and evidence.

If the run includes a completed Triage tab, review it first. Start with the highlighted cluster, read the summary guidance, and open the workflows grouped in that cluster before you inspect unrelated failures.

Use the full-screen debug view first when you investigate suite failures:

1. Review Overview to confirm the overall failure signal and likely cause
2. Check Steps to find slow, blocked, or failed parts of the run
3. Open Navigation history to trace redirects, page transitions, and in-app route changes leading up to the failure
4. Check Cache behavior if the result looks different from earlier runs
5. Open Playwright traces or Agent thoughts when you need deeper investigation beyond the run viewer

Then open the failed workflow and work through the issue in this order:

1. Read the summary to confirm whether the workflow truly failed or only contains warning-level assertions
2. Compare what happened with what the step expected
3. Expand the relevant sections to inspect step data and related logs
4. Review the navigation timeline details if the workflow appears to fail after a redirect, route change, or unexpected landing page
5. Open the screenshot and zoom in if you need to confirm visual state at the moment of failure
6. Check the recording status message if playback is unavailable so you know whether the run was API-only, had no browser interaction, failed before recording started, or could not upload the recording
7. Use the surrounding step context to decide whether the issue is a product regression, test data problem, or assertion that needs adjustment

When several workflows fail together, use the Triage tab to determine whether they belong to the same cluster before you investigate them one by one. This helps you avoid repeating the same investigation for duplicate symptoms.

If the Triage tab shows Waiting on diagnostics or Running, wait for analysis to finish if you want grouped failure guidance before you continue. If the tab shows Skipped or Failed, continue with the workflow list and debug view instead.
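
The guidance above amounts to a small decision on the Triage tab state. In this sketch the function name `triage_next_step` is hypothetical, and the catch-all branch assumes any other state means analysis finished and clusters are available. The name of that finished state is not given on this page.

```shell
# Hypothetical helper: pick the next step from the Triage tab state.
triage_next_step() {
  case "$1" in
    "Waiting on diagnostics" | "Running")
      echo "wait for analysis to finish" ;;
    "Skipped" | "Failed")
      echo "continue with the workflow list and debug view" ;;
    *)
      # ASSUMPTION: anything else means clusters are ready to review.
      echo "start with the highlighted cluster" ;;
  esac
}

triage_next_step "Running"   # prints "wait for analysis to finish"
```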

Prioritize workflows marked Newly failing before you investigate workflows that are already on a failure streak. This focuses your first pass on changes that likely introduced a new breakage.

When the same workflow appears more than once in a run, make sure you open the row with the matching scenario label or execution context before you compare failures. Separate repeated executions before you decide whether you are seeing one regression repeated across runs or different failures on different workflow entries.

When a workflow includes warnings, keep the workflow outcome and the assertion severity separate. A warning tells you something deserves review, while a failure means the run could not satisfy a required check.

If the workflow failed before recording started or the recording upload failed, continue the investigation with screenshots, step logs, API call details, and assertion results. If the workflow was API-only or had no browser interaction, expect the missing-recording state and focus on the non-video evidence in the run details.

The test run detail view supports deep links to run media, so you can share or reopen a specific part of a run and return to the same debugging context.

11 Opening test run details

Open a test run or test execution link in the organization that owns it. If you open a shared link while you are in a different organization, Canary prompts you to switch to the correct organization instead of leaving you on an inaccessible page.

If you have access to the target organization, confirm the switch to continue directly to the requested test run or execution details. If you do not have access, ask a workspace admin to share the run from an organization you can access or invite you to the correct organization.

12 Access and organization context

Test run detail pages always open in the organization that created the run. This matters if you belong to multiple organizations or if a teammate sends you a direct link from another workspace.

Use these expectations when you work across organizations:

Open shared test run links as usual. If the link belongs to another organization you can access, switch when prompted.
Expect the same behavior for test execution detail pages opened from a test run or from a shared direct link.
If you decline the switch, stay in your current organization and reopen the link later when you are ready to change context.
If Canary does not offer a switch prompt, verify that you are signed in with the account that has access to the target organization.

13 CI/CD Integration

Test runs can be triggered from your CI/CD pipeline:

```bash
# Start a test run via API
curl -X POST https://api.trycanary.ai/workflows/test-runs \
  -H "Authorization: Bearer $CANARY_API_KEY"
```

See Running your tests for detailed setup instructions.
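
If you use a test run as a deploy gate, the pipeline step needs to turn the finished run's status into an exit code. The sketch below shows only that mapping. The function name `gate_on_status` and the `RUN_STATUS` variable are hypothetical, and how your pipeline obtains the final status is not covered on this page (see Running your tests). The status strings are the ones listed under Test Run Statuses.

```shell
# Hypothetical CI gate: succeed only when the test run finished cleanly.
# $1 = final test run status
gate_on_status() {
  case "$1" in
    "Completed")
      return 0 ;;
    "Completed (Errors)")
      echo "Test run finished with failures" >&2
      return 1 ;;
    *)
      # Queued, Running, or anything unexpected: do not treat as a pass.
      echo "Test run has not finished: $1" >&2
      return 2 ;;
  esac
}

# Usage in a pipeline step, where RUN_STATUS is set by your own tooling:
#   gate_on_status "$RUN_STATUS" || exit 1
```

Returning a distinct code for an unfinished run keeps a timeout or polling problem from being mistaken for a regression.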

14 Best Practices

Promote workflows deliberately - Only promoted workflows are included in test runs; approve first to record human sign-off
Use pause intentionally - Pause unstable workflows when you need to remove them from regular regression coverage without deleting them
Use descriptive names - Descriptive workflow names make it easier to identify which test failed
Add Wait nodes carefully - Workflows with Wait nodes take longer to complete
Monitor flaky tests - Workflows that frequently show "Flaky Success" may need adjustment
Run before deploys - Use test runs as a quality gate before production deployments
Watch assertion results during active runs - Open workflow details while the run is still Running to catch failing or warning checks early
Use assertion results to troubleshoot faster - Check the assertion summary, step log, and replay view before digging into screenshots, video, or raw logs
Prioritize newly failing workflows first - Use the Newly failing indicator to focus on fresh regressions introduced since the last run
Track recurring failures with streaks - Use failure streak counts to identify workflows that need deeper investigation, cleanup, or ownership
Review paused coverage regularly - Check the paused count in run summaries so important workflows do not stay out of regression coverage longer than intended
Review degraded coverage separately - Check the degraded count in run summaries so you know which workflows were not run and did not affect the pass rate
Use the Triage tab for grouped failures - Start with the highlighted cluster and summary guidance when multiple workflows fail in the same suite
Include the right scenarios in suites - Mark only the scenarios you want to run as part of suite coverage so test runs expand across the variations that matter
Read scenario labels before triaging - Confirm which scenario failed before you treat a result as a workflow-wide regression
Rerun completed suites from the detail page - Use Rerun suite when you want to verify a fix or confirm whether a failure is reproducible without rebuilding the run
Use status cells to jump into workflow runs - Open workflow-level details directly from the suite results table when you need deeper debugging or a shareable run link

15 Troubleshooting

Problem	What to do
You opened a test run or test execution link and cannot view the page	Look for the organization switch prompt and confirm it to open the run in the correct organization
You opened a shared link but no switch prompt appears	Verify that you are signed in with an account that has access to the target organization
You do not have access after switching organizations	Ask an admin to invite you to the organization that owns the run or share the results another way
A teammate shared a direct link to a specific execution	Open the link directly. If it belongs to another organization you can access, switch organizations when prompted
Login-dependent workflows seem to pause before steps begin	Expect Canary to warm up saved credentials before workflow execution starts. Wait for the run to continue, then open a workflow to confirm it begins in an authenticated state
A workflow that requires sign-in starts unauthenticated	Confirm that the workflow uses saved credentials available to the workspace, then start the test run again
Authentication issues only appear at the beginning of a suite run	Review the first workflow steps to confirm whether the session was ready when execution began. If the workflow still lands on a sign-in page, re-save the credentials used by that workflow and rerun the suite
Expected workflows did not run in the selected environment	Review the run summary for Skipped workflows, then confirm whether the selected environment matches each workflow's environment eligibility
A workflow shows as Skipped in the results	Open the run summary and check whether the selected environment was eligible for that workflow. Start a new run in a matching environment if you need that workflow to execute
You selected an environment but fewer workflows ran than expected	Check whether some workflows are limited to specific environment types, paused, or marked as degraded. Choose an eligible environment, review the excluded sections in the summary, then rerun the test suite if needed
A workflow does not appear in regression coverage	Check whether it is paused or degraded. Promote the paused workflow or re-enable the degraded workflow if you want it included in future test runs
The pass rate looks higher or lower than expected	Confirm how many workflows were skipped, paused, or degraded. Paused and degraded workflows are excluded from pass-rate calculations and shown separately in the run summary
A workflow appears multiple times in the same test run	Check the scenario label for each execution. Canary runs a separate execution for every included scenario in the suite
A failure only appears for one variation of a workflow	Open the scenario-specific execution and confirm which scenario name is attached to the failure before you triage it
An expected scenario did not run in the suite	Confirm that the scenario is marked to run in suites. Only included scenarios fan out into separate executions
The Triage tab shows Waiting on diagnostics or Running	Wait for Canary to finish collecting diagnostics and analyzing failures, then refresh or reopen the run to review grouped clusters
The Triage tab shows Skipped or Failed	Continue with the workflow list, full-screen debug view, and individual workflow details to investigate failures manually
A workflow has no recording in the detail view	Read the missing-recording message to see whether the run was API-only, had no browser interaction, failed before recording started, or could not upload the recording
A workflow failed and no playback is available	Open the first failed step, screenshots, logs, API call details, and assertion results. If the message shows an upload problem, rerun the workflow if you need a recording
You want to rerun the same completed suite	Open the completed test run and click Rerun suite to start the suite again from the existing result
You need the underlying workflow execution from a suite result	Click the status cell in the completed run details table to open the related workflow run details

Runs & results
Workflow runs — per-run mechanics and failure investigation
Test suites — define a reusable, layered collection to run
Triage & diagnostics
Issues
Running your tests