Test Runs
export const meta = { title: 'Test Runs', description: 'Run all your published workflows together and review pass/fail results.', tags: ['reference', 'runs', 'testing'], };
Test runs execute all your published workflows in a single batch, capturing pass/fail results for each workflow. They serve as regression tests to ensure your application continues working as expected after changes.
01 Starting a Test Run
- Go to Runs in the sidebar
- Click New Run → Test Run
- Select the environment where you want to run the suite
- Review the published workflows that are eligible for that environment
- Start the run
Note: The "Test Run" option is only available if you have at least one published workflow.
A workflow can run only when the selected environment matches that workflow's environment eligibility. If a workflow is limited to specific environment types, Canary shows that constraint before you start the run.
When you choose an environment, Canary compares that environment with each published workflow's constraints. Eligible workflows run normally. Ineligible workflows stay out of execution and appear as skipped in the run summary so you can confirm why they did not run.
Quarantined workflows are also excluded from test runs. They stay out of regression coverage for the suite and appear separately from runnable workflows so you can tell they were intentionally left out rather than skipped by environment selection.
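The selection rules above can be sketched as a small partition function. This is only an illustration of the described behavior, not Canary's implementation: the `Workflow` class, its `allowed_env_types` and `quarantined` fields, and the order in which the checks run are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    status: str = "published"
    # Hypothetical field: an empty set means "no environment constraint".
    allowed_env_types: frozenset = frozenset()
    # Hypothetical flag for the quarantine state described above.
    quarantined: bool = False


def plan_test_run(workflows, env_type):
    """Split published workflows into runnable, skipped, and quarantined."""
    plan = {"run": [], "skipped": [], "quarantined": []}
    for wf in workflows:
        if wf.status != "published":
            continue  # only published workflows are considered
        if wf.quarantined:
            plan["quarantined"].append(wf.name)
        elif wf.allowed_env_types and env_type not in wf.allowed_env_types:
            plan["skipped"].append(wf.name)  # environment not eligible
        else:
            plan["run"].append(wf.name)
    return plan


workflows = [
    Workflow("checkout"),
    Workflow("admin-login", allowed_env_types=frozenset({"staging"})),
    Workflow("legacy-report", quarantined=True),
    Workflow("draft-flow", status="draft"),
]
plan = plan_test_run(workflows, env_type="production")
print(plan)
```

In this sketch, `admin-login` lands in the skipped group because it is limited to a different environment type, while `legacy-report` is reported separately as quarantined.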

02 What Happens During a Test Run
When you start a test run:
- Discovery - Canary finds all workflows with status "published"
- Eligibility check - Canary compares the selected environment with each workflow's environment constraints and marks ineligible workflows to skip
- Authentication warm-up - Canary warms up any saved credentials before workflows begin so authenticated sessions are ready at the start of execution
- Execution - Each eligible workflow runs concurrently in its own browser session
- Verification - Canary checks expected outcomes and records assertion results, including pass, warning, and failure states
- Reporting - Canary records run status for each workflow, including skipped workflows, and saves step-by-step debugging details, API call context, and run media for review
If your published workflows rely on saved credentials, expect the test run to prepare those sessions before the first workflow step starts. This reduces login friction at the beginning of the suite and helps login-dependent workflows begin in an authenticated state.
Test runs use the same execution engine as individual workflow runs, including automatic retries for flaky failures. In the run details view, you can inspect assertion summaries, review per-step assertion outcomes, and drill into richer debugging context for each workflow.
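To make the retry behavior concrete, here is a minimal sketch of how a retry loop can lead to different workflow outcomes. The function names and the `max_retries` setting are hypothetical; the actual retry count and policy used by Canary are not specified on this page.

```python
def run_with_retries(attempt_fn, max_retries=2):
    """Call attempt_fn until it passes or the retries are exhausted.

    Returns (passed, attempts_used). max_retries is a hypothetical setting.
    """
    attempts = 0
    for _ in range(max_retries + 1):
        attempts += 1
        if attempt_fn():
            return True, attempts
    return False, attempts


def outcome_from_attempts(passed, attempts):
    """Map the retry result to an outcome label."""
    if not passed:
        return "Failed"
    return "Success" if attempts == 1 else "Flaky Success"


# Simulated workflow that fails once, then passes on the retry.
results = iter([False, True])
passed, attempts = run_with_retries(lambda: next(results))
print(outcome_from_attempts(passed, attempts))  # Flaky Success
```

A workflow that passes only after a retry is reported differently from one that passes on the first attempt, which is why retried passes deserve a second look.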
03 Test Run Statuses
| Status | Meaning |
|---|---|
| Queued | Test run is waiting to start |
| Running | Test run is warming up saved credentials or workflows are executing |
| Completed | All workflows finished without any true failures |
| Completed (Errors) | One or more workflows ended with a true failure |
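The status table can be read as a simple decision rule. The sketch below is an illustration only: `derive_run_status` and its parameters are hypothetical names, and it assumes that a "true failure" corresponds to a workflow outcome of Failed.

```python
def derive_run_status(started, finished, outcomes):
    """Map run progress and workflow outcomes to a test run status label."""
    if not started:
        return "Queued"
    if not finished:
        # Covers both credential warm-up and workflow execution.
        return "Running"
    if any(outcome == "Failed" for outcome in outcomes):
        return "Completed (Errors)"
    return "Completed"


print(derive_run_status(True, True, ["Success", "Flaky Success", "Skipped"]))
print(derive_run_status(True, True, ["Success", "Failed"]))
```

Under this reading, skipped workflows and retried passes do not move a run into Completed (Errors); only a true failure does.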
04 Workflow Outcomes
Degraded workflows are intentionally excluded from execution until you re-enable them. In test run results, Canary shows them separately as Degraded or Not run instead of counting them as passed, failed, flaky, or skipped.
Each workflow in a test run has its own outcome:
| Outcome | Meaning |
|---|---|
| Success | Workflow completed without any failing steps or failing assertions |
| Failed | Workflow ended with a true failure after all retries or hit a failing assertion |
| Flaky Success | Workflow passed after one or more retries |
| Waiting | Workflow is paused at a Wait node |
| Skipped | Workflow did not run because the selected environment was not eligible for that workflow |
| Degraded | Workflow is marked unreliable and stays out of suite execution until you re-enable it |
| Not run | Workflow did not execute because it is currently degraded |
Warning-level assertions do not change a workflow to Failed on their own. Treat them as results that need review, not as proof that execution broke.
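The difference between warning-level and failing assertions can be sketched as follows. The state strings `"pass"`, `"warning"`, and `"failure"` and the function name are assumptions for illustration; they mirror the states described on this page rather than a documented data format.

```python
from collections import Counter


def summarize_assertions(states):
    """Count assertion states and flag what they mean for the workflow."""
    counts = Counter(states)
    return {
        "pass": counts["pass"],
        "warning": counts["warning"],
        "failure": counts["failure"],
        # Only a failing assertion can push the workflow to Failed.
        "has_blocking_failure": counts["failure"] > 0,
        # Warnings ask for review but leave the outcome unchanged.
        "needs_review": counts["warning"] > 0,
    }


summary = summarize_assertions(["pass", "warning", "pass"])
print(summary["has_blocking_failure"], summary["needs_review"])  # False True
```

Keeping the two flags separate reflects the guidance above: a workflow can need review without having failed.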
When you open a workflow from a test run, the detail view shows per-step assertion results so you can quickly see what passed, warned, or failed. Because test runs use the same result viewer as workflow runs, you can interpret these outcomes the same way you do in workflow run details.
See Workflow Run Details to understand how pass, warning, and failure results appear in the viewer and how to inspect the related step context.
While the test run is still Running, assertion outcomes stream into the workflow detail view as checks complete. Step logs show the state of each assertion, so you can spot passing, warning, and failing checks without leaving the active run.
After the run finishes, the same workflow detail view preserves those assertion states in both the step log and replay view. This helps you confirm whether a workflow failed because of a blocking assertion or completed with warning-level results that still need review.
Quarantined workflows do not receive a workflow outcome in the run because they are not executed as part of the suite. Treat them as intentionally excluded from regression coverage until you remove the quarantine.
Degraded workflows also stay out of suite execution, but they remain visible in the run summary so you can see that Canary excluded them because of reliability. Treat a degraded result as a coverage signal, not as a passing or failing outcome.
The workflow list in test run details also highlights failure patterns across runs. Use these indicators to decide whether you are looking at a new breakage or an ongoing regression:
| Indicator | What it tells you | How to use it |
|---|---|---|
| Newly failing | The workflow failed in this test run after passing in the previous run | Prioritize this first when you want to find fresh regressions introduced by recent changes |
| Failure streak | The workflow has failed in consecutive test runs | Use the streak count to spot recurring problems that need deeper investigation or ownership |
Use Newly failing to separate fresh breakages from known unstable areas. Use the failure streak to judge whether a workflow has been failing repeatedly and may need a broader fix, test cleanup, or triage follow-up.
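As an illustration of how the two indicators relate, the sketch below derives both from one workflow's result history. It assumes the history is a list of booleans (oldest first, current run last, `True` meaning the run failed) and that a workflow with no earlier run is not marked as newly failing; how Canary treats skipped or missing runs is not covered here.

```python
def failure_indicators(history):
    """Compute 'Newly failing' and the failure streak for one workflow.

    history: list of booleans, oldest first, current run last.
    True means the workflow failed in that test run.
    """
    if not history or not history[-1]:
        return {"newly_failing": False, "failure_streak": 0}
    streak = 0
    for failed in reversed(history):
        if not failed:
            break
        streak += 1
    # Newly failing: failed now, passed in the immediately preceding run.
    newly_failing = len(history) > 1 and not history[-2]
    return {"newly_failing": newly_failing, "failure_streak": streak}


print(failure_indicators([False, False, True]))  # fresh regression
print(failure_indicators([False, True, True, True]))  # ongoing regression
```

The first call reports a newly failing workflow with a streak of 1, and the second reports a streak of 3 without the newly failing marker.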
05 Auto-Verification and Issue Filing
When a workflow fails during a test run:
- Canary analyzes the failure to determine if it is a real bug or test instability
- If it is a real bug, Canary automatically creates an issue with:
  - Screenshot at the moment of failure
  - Steps that led to the failure
  - Error message and debugging context from the run details
  - Assertion results that explain what failed or warned
- Issues are deduplicated, so the same failure will not create multiple issues
Use the workflow detail view in the test run to confirm whether the failure came from a step action, an assertion, or an API call before you review the filed issue.
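This page does not describe how Canary decides that two failures are the same. One common way to deduplicate issues is to key them by a fingerprint of the failure, and the sketch below shows that general technique only. The `failure_fingerprint` function, the fields it hashes, and the `IssueTracker` class are all hypothetical.

```python
import hashlib


def failure_fingerprint(workflow_id, step_name, error_message):
    """Build a stable key from the parts of a failure that identify it."""
    normalized = error_message.strip().lower()
    raw = "|".join([workflow_id, step_name, normalized])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class IssueTracker:
    """In-memory stand-in for an issue store (illustration only)."""

    def __init__(self):
        self.issues = {}

    def file_if_new(self, workflow_id, step_name, error_message):
        """Create an issue unless the same fingerprint already exists."""
        key = failure_fingerprint(workflow_id, step_name, error_message)
        if key in self.issues:
            return False  # duplicate failure, no new issue
        self.issues[key] = {
            "workflow": workflow_id,
            "step": step_name,
            "error": error_message,
        }
        return True


tracker = IssueTracker()
print(tracker.file_if_new("checkout", "Click Pay", "Timeout waiting for selector"))
print(tracker.file_if_new("checkout", "Click Pay", "timeout waiting for selector "))
```

The second call returns `False` because the normalized failure matches the first one, so only one issue exists for that failure.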
06 Viewing Test Run Results
From the Runs page:
- Click the Test Runs tab
- Click any test run row to see details
- Review per-workflow outcomes, assertion summaries, skipped workflows, quarantined workflows, and status indicators
- Click a workflow to inspect step details, assertion results, API calls, screenshots, and video
Use the workflow list first to separate workflows that only need review from workflows that actually failed. A workflow can show warning-level assertions and still finish with a successful outcome.

When the selected environment does not match a workflow's constraints, the run summary shows that workflow as Skipped instead of failed. Review skipped rows to confirm which workflows were excluded because the selected environment was not eligible.
When a workflow includes scenarios, the test run expands that workflow across every included scenario instead of running only a single variation. Review the workflow row and detail view together so you can see which scenario-specific executions were generated for the suite.
Suites only fan out across scenarios that are marked to run in suites. If a workflow has a default scenario plus additional included scenarios, Canary creates a separate execution for each included scenario and shows the scenario name with the result.
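The fan-out described above can be pictured as expanding each workflow into one execution per included scenario. In this sketch the dictionary shape and the `run_in_suites` key are assumptions, as is the choice to produce no executions for a workflow whose scenarios are all excluded.

```python
def expand_executions(workflows):
    """Expand workflows into (workflow name, scenario name) executions.

    workflows: list of dicts with 'name' and an optional 'scenarios' list.
    Each scenario is a dict with 'name' and a hypothetical 'run_in_suites' flag.
    """
    executions = []
    for wf in workflows:
        scenarios = wf.get("scenarios", [])
        if not scenarios:
            # No scenarios defined: a single execution without a label.
            executions.append((wf["name"], None))
            continue
        for scenario in scenarios:
            if scenario["run_in_suites"]:
                executions.append((wf["name"], scenario["name"]))
    return executions


suite = [
    {"name": "signup"},
    {
        "name": "checkout",
        "scenarios": [
            {"name": "default", "run_in_suites": True},
            {"name": "guest user", "run_in_suites": True},
            {"name": "expired card", "run_in_suites": False},
        ],
    },
]
for workflow_name, scenario_name in expand_executions(suite):
    print(workflow_name, scenario_name)
```

Here two workflows produce three executions, which is why the number of rows in a run summary can differ from the number of published workflows.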

When a workflow is quarantined, the run summary surfaces it separately from executed and skipped workflows. Review this section to understand which workflows are currently out of regression coverage and why the run may include fewer executed workflows than your total published workflow count.
While a test run is active, open any workflow row to monitor live assertion outcomes as they stream into the run details view. Use the assertion summary to see whether checks are passing, warning, or failing, then open the relevant step to inspect the assertion result in context.
After the test run completes, start at the run-level summary before you drill into an individual workflow. Read the counts for passed, failed, flaky, skipped, degraded, and quarantined workflows separately so you understand both execution quality and coverage before you triage failures.
The suite summary gives you a regression overview of the whole run. Read the progress state, pass rate, and workflow counts together before you investigate a single row. This helps you tell whether the suite is still in progress, whether coverage changed because of skipped, degraded, or quarantined workflows, and how much of the executed suite passed.
If the same workflow appears more than once in the summary, compare the workflow name, scenario label, and run status together before you decide what failed. Repeated entries can represent separate included scenarios or multiple executions that belong to the same workflow, so always open the specific row you want to inspect.
Use the pass rate as a quick health signal, not as your only triage input. Pair it with the failed count and the Newly failing markers so you can tell whether a lower pass rate comes from a fresh regression, repeated failures, or broader suite coverage changes.
Degraded workflows do not affect the suite pass rate. Canary excludes them from the pass-rate calculation and shows the degraded total explicitly in the summary so you can separate execution health from temporarily removed coverage.

Suite result counts reflect the actual executions included in the run summary. Use the workflow list and scenario labels together when you verify totals, especially if a workflow fans out across multiple included scenarios.
When scenario fan-out is enabled for a workflow, read the scenario label before you interpret the result. A failure tied to one scenario does not mean every scenario for that workflow failed.
Quarantined workflows are excluded from pass-rate calculations. Use the quarantined count as a coverage signal, not as a passing or failing result.
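A rough sketch of the pass-rate arithmetic may help when verifying totals. It follows the rule that degraded and quarantined workflows are excluded, and it makes two further assumptions that this page does not confirm: skipped workflows are also left out of the denominator, and Flaky Success counts as a pass. The label strings and function name are illustrative.

```python
from collections import Counter

# Outcomes that represent an actual execution (assumption for this sketch).
EXECUTED = ("Success", "Flaky Success", "Failed")


def suite_summary(labels):
    """Summarize per-execution labels into counts and a pass rate."""
    counts = Counter(labels)
    executed = sum(counts[label] for label in EXECUTED)
    passed = counts["Success"] + counts["Flaky Success"]
    return {
        "executed": executed,
        "passed": passed,
        "failed": counts["Failed"],
        "flaky": counts["Flaky Success"],
        "skipped": counts["Skipped"],
        "degraded": counts["Degraded"],
        "quarantined": counts["Quarantined"],
        # None when nothing executed, to avoid dividing by zero.
        "pass_rate": passed / executed if executed else None,
    }


labels = [
    "Success", "Success", "Flaky Success", "Failed",
    "Skipped", "Degraded", "Quarantined",
]
summary = suite_summary(labels)
print(summary["pass_rate"])  # 0.75
```

Three of the four executed workflows passed, so the rate is 0.75 even though seven workflows appear in the summary; the other three show up only as coverage counts.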
Completed suite runs can also include a Triage tab. Use this tab to review AI-powered failure grouping before you inspect individual workflows.
In the Triage tab, Canary groups related failures into clusters, highlights the most pressing cluster first, and shows a summary of what likely needs attention. Use these clusters to understand whether several failed workflows point to the same underlying issue.
While triage is being prepared, the tab can show intermediate states so you know what is happening:
| Triage state | What it means |
|---|---|
| Waiting on diagnostics | Canary is collecting the information needed to start triage |
| Running | Canary is actively analyzing failures and building clusters |
| Skipped | Canary did not generate triage for this completed test run |
| Failed | Canary could not complete triage for this completed test run |
If triage completes successfully, start with the highlighted cluster summary, then open the related failed workflows to confirm the root cause and evidence.

After you review the Triage tab, use the full-screen debug view to investigate the suite before you drill into an individual workflow. The same consolidated debugging experience used for workflow runs is available for published workflow test runs, so you can review the run in one place.
Use the full-screen debug view to answer these questions quickly:
| If you need to know... | Check this in the debug view |
|---|---|
| Which part of the suite slowed down or stalled | Overview and Steps |
| Where the browser moved during execution | Navigation history |
| Whether cached data affected the run | Cache behavior |
| What Canary thinks likely caused the failure | Overview |
| Where to continue a deeper technical investigation | Playwright traces and Agent thoughts |
Open Navigation history when you need to trace redirects, page loads, or in-app route changes across the run. This timeline makes it easier to understand whether a failure happened after the browser landed on the wrong page, bounced through a redirect chain, or never reached the expected destination.

After you identify the likely problem area, open the failed workflow to inspect the related step details, assertions, screenshots, and video. Use the full-screen debug view first for triage, then use the workflow detail view to confirm the exact failure.
After the test run completes, scan the workflow list for the new failure indicators before you drill into an individual workflow. Start with workflows marked Newly failing to catch fresh regressions quickly, then review any failure streak counts to identify workflows that have been failing across multiple runs.
Use the failure streak as a triage shortcut. A long streak usually means the issue is recurring, while a newly failing workflow often points to a recent product or test change.
After the test run completes, use the result chips and row actions to open the exact execution you want to inspect. Click the workflow row to open the suite result, then use the status cell to jump directly into that workflow run's detailed view.
After you open a run detail view, use the summary header, step log, replay, screenshots, and video together to confirm what happened. The status cell links make it easier to move from a suite-level summary into the full execution context without losing your place.
07 Review results
Use the run detail page to move from suite-level outcomes into the exact execution you want to inspect.
- Open a completed test run
- Review the workflow rows in the results table
- Click a workflow row to open the suite result for that workflow
- Click the status cell to open the related workflow run details
Status cells on completed run details link directly to the underlying workflow run. Use that link when you want the full run viewer, deeper step inspection, or a shareable execution URL.

After you open the workflow run details, continue your investigation in Workflow Run Details.
08 Rerun a suite
When a test run is complete, you can run the same suite again from its detail page.
- Open a completed test run
- Click Rerun suite
- Confirm the environment and start the run
- Wait for Canary to open the new test run automatically
Use this action when you want to verify a fix, confirm whether a failure is reproducible, or rerun the suite without starting from New Run.

09 Navigate to workflow run details
From a completed test run, open workflow-level details directly from the results table.
- Click a workflow row to review the suite result in context
- Click the status cell to jump to the individual workflow run details
- Use the workflow run page when you need the full execution viewer, step-by-step debugging, or a direct link to share
This shortcut helps you move from regression results to workflow-level debugging without searching for the same execution elsewhere.
After the test run completes, use the failure inspection view to review final assertion results alongside step details. The wide layout, expandable sections, and expected-versus-actual context help you move from the workflow outcome to the exact step that needs review.
Use these cues when you review a workflow result:
| Assertion state | What it tells you |
|---|---|
| Pass | The check succeeded at that step |
| Warning | The check found something to review, but the workflow still completed |
| Failure | The check failed and contributed to the workflow failing |
Test run details use the same viewer as workflow run details, so you can use the same interpretation for pass, warning, and failure outcomes when reviewing a test run. For a complete walkthrough of the viewer, see Workflow Run Details.
When a recording is unavailable, the workflow detail view explains why instead of leaving the replay area empty. Use this message to tell whether the run had no browser playback to show or whether Canary could not finish saving the recording.
Review these missing-recording states when you inspect a workflow result:
| Missing recording state | What it means | What to do |
|---|---|---|
| API-only run | The workflow completed without browser playback because it only used API steps | Review API call details, step logs, and assertions instead of video |
| No browser interaction | The workflow opened a browser-capable run, but no browser action happened that produced a recording | Check the executed steps to confirm whether the run stayed in non-browser logic |
| Failed before recording started | The workflow stopped so early that Canary did not begin capturing playback | Open the first failed step, screenshot, and logs to find the blocking error |
| Recording upload failed | The run executed, but Canary could not finish saving the recording | Use screenshots and step details for troubleshooting, then rerun if you need playback |

Treat the missing-recording message as part of the run evidence. It helps you decide whether the absence of playback is expected for that workflow or whether you should rerun the test to capture media.
When multiple workflows fail in the same run, review them in this order:
- Workflows marked Newly failing
- The highlighted cluster in the Triage tab, if available
- Workflows with the longest failure streaks
- Remaining failed workflows without either indicator
This order helps you catch recent regressions without losing sight of long-running failures that still need attention.
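The review order above amounts to a sort with four priority tiers. The sketch below shows one way to express it; the dictionary keys are hypothetical, and treating a streak greater than 1 as "has a failure streak" is an assumption.

```python
def triage_order(failed_workflows):
    """Return workflow names in the suggested review order.

    Each item is a dict with hypothetical keys: 'name', 'newly_failing',
    'in_highlighted_cluster', and 'failure_streak'.
    """

    def rank(wf):
        if wf["newly_failing"]:
            return (0, 0)
        if wf["in_highlighted_cluster"]:
            return (1, 0)
        if wf["failure_streak"] > 1:
            # Longer streaks sort first within this tier.
            return (2, -wf["failure_streak"])
        return (3, 0)

    return [wf["name"] for wf in sorted(failed_workflows, key=rank)]


failed = [
    {"name": "a", "newly_failing": False, "in_highlighted_cluster": False, "failure_streak": 1},
    {"name": "b", "newly_failing": False, "in_highlighted_cluster": False, "failure_streak": 5},
    {"name": "c", "newly_failing": False, "in_highlighted_cluster": True, "failure_streak": 2},
    {"name": "d", "newly_failing": True, "in_highlighted_cluster": False, "failure_streak": 1},
    {"name": "e", "newly_failing": False, "in_highlighted_cluster": False, "failure_streak": 3},
]
print(triage_order(failed))  # ['d', 'c', 'b', 'e', 'a']
```

Because Python's sort is stable, workflows that tie within a tier keep the order they had in the results table.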
10 Failure investigation
Open the failed workflow from the test run after you review the full-screen debug view. Start with the consolidated debugging tools to narrow down where the suite failed, then use the workflow detail view to confirm the exact step and evidence.
If the run includes a completed Triage tab, review it first. Start with the highlighted cluster, read the summary guidance, and open the workflows grouped in that cluster before you inspect unrelated failures.
Use the full-screen debug view first when you investigate suite failures:
- Review Overview to confirm the overall failure signal and likely cause
- Check Steps to find slow, blocked, or failed parts of the run
- Open Navigation history to trace redirects, page transitions, and in-app route changes leading up to the failure
- Check Cache behavior if the result looks different from earlier runs
- Open Playwright traces or Agent thoughts when you need deeper investigation beyond the run viewer
Then open the failed workflow and work through the issue in this order:
- Read the summary to confirm whether the workflow truly failed or only contains warning-level assertions
- Compare what happened with what the step expected
- Expand the relevant sections to inspect step data and related logs
- Review the navigation timeline details if the workflow appears to fail after a redirect, route change, or unexpected landing page
- Open the screenshot and zoom in if you need to confirm visual state at the moment of failure
- Check the recording status message if playback is unavailable so you know whether the run was API-only, had no browser interaction, failed before recording started, or could not upload the recording
- Use the surrounding step context to decide whether the issue is a product regression, test data problem, or assertion that needs adjustment
When several workflows fail together, use the Triage tab to determine whether they belong to the same cluster before you investigate them one by one. This helps you avoid repeating the same investigation for duplicate symptoms.
If the Triage tab shows Waiting on diagnostics or Running, wait for analysis to finish if you want grouped failure guidance before you continue. If the tab shows Skipped or Failed, continue with the workflow list and debug view instead.
The Newly failing marker makes fresh regressions easier to spot. Prioritize workflows marked Newly failing before you investigate workflows that are already on a failure streak. This helps you focus first on changes that likely introduced a new breakage.
When the same workflow appears more than once in a run, make sure you open the row with the matching scenario label or execution context before you compare failures. Separate repeated executions before you decide whether you are seeing one regression repeated across runs or different failures on different workflow entries.
When a workflow includes warnings, keep the workflow outcome and the assertion severity separate. A warning tells you something deserves review, while a failure means the run could not satisfy a required check.
If the workflow failed before recording started or the recording upload failed, continue the investigation with screenshots, step logs, API call details, and assertion results. If the workflow was API-only or had no browser interaction, expect the missing-recording state and focus on the non-video evidence in the run details.
The test run detail view supports deep links to run media, so you can share or reopen a specific part of a run and return to the same debugging context.
11 Opening test run details
Open a test run or test execution link in the organization that owns it. If you open a shared link while you are in a different organization, Canary prompts you to switch to the correct organization instead of leaving you on an inaccessible page.
If you have access to the target organization, confirm the switch to continue directly to the requested test run or execution details. If you do not have access, ask a workspace admin to share the run from an organization you can access or invite you to the correct organization.
12 Access and organization context
Test run detail pages always open in the organization that created the run. This matters if you belong to multiple organizations or if a teammate sends you a direct link from another workspace.
Use these expectations when you work across organizations:
- Open shared test run links as usual. If the link belongs to another organization you can access, switch when prompted.
- Expect the same behavior for test execution detail pages opened from a test run or from a shared direct link.
- If you decline the switch, stay in your current organization and reopen the link later when you are ready to change context.
- If Canary does not offer a switch prompt, verify that you are signed in with the account that has access to the target organization.
13 CI/CD Integration
Test runs can be triggered from your CI/CD pipeline:

    # Start a test run via API
    curl -X POST https://api.trycanary.ai/workflows/test-runs \
      -H "Authorization: Bearer $CANARY_API_KEY"

See the CI/CD Integration guide for detailed setup instructions.
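For pipelines that prefer a script over curl, the same request can be sketched in Python with only the standard library. The endpoint and the `CANARY_API_KEY` variable come from the curl example; the function names are hypothetical, and the response body is left unparsed because its format is not described on this page.

```python
import os
import urllib.request

# Endpoint taken from the curl example on this page.
API_URL = "https://api.trycanary.ai/workflows/test-runs"


def build_test_run_request(api_key):
    """Build (but do not send) the POST request that starts a test run."""
    return urllib.request.Request(
        API_URL,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def start_test_run(api_key, timeout=30):
    """Send the request and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses,
    which a CI step can treat as a failed trigger.
    """
    request = build_test_run_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Inspect the request without sending it; "example-key" is a placeholder.
request = build_test_run_request(os.environ.get("CANARY_API_KEY", "example-key"))
print(request.get_method(), request.full_url)
```

In a real pipeline step you would call `start_test_run(os.environ["CANARY_API_KEY"])` and let an exception fail the job. Polling for the final run status is not shown because the relevant endpoints are not covered here.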
14 Best Practices
- Keep workflows published - Only published workflows are included in test runs
- Use quarantine intentionally - Mark unstable workflows as quarantined when you need to remove them from regular regression coverage without deleting them
- Use descriptive names - Makes it easier to identify which test failed
- Add Wait nodes carefully - Workflows with Wait nodes take longer to complete
- Monitor flaky tests - Workflows that frequently show "Flaky Success" may need adjustment
- Run before deploys - Use test runs as a quality gate before production deployments
- Watch assertion results during active runs - Open workflow details while the run is still Running to catch failing or warning checks early
- Use assertion results to troubleshoot faster - Check the assertion summary, step log, and replay view before digging into screenshots, video, or raw logs
- Prioritize newly failing workflows first - Use the Newly failing indicator to focus on fresh regressions introduced since the last run
- Track recurring failures with streaks - Use failure streak counts to identify workflows that need deeper investigation, cleanup, or ownership
- Review quarantined coverage regularly - Check the quarantined count in run summaries so important workflows do not stay out of regression coverage longer than intended
- Review degraded coverage separately - Check the degraded count in run summaries so you know which workflows were not run and did not affect the pass rate
- Use the Triage tab for grouped failures - Start with the highlighted cluster and summary guidance when multiple workflows fail in the same suite
- Include the right scenarios in suites - Mark only the scenarios you want to run as part of suite coverage so test runs expand across the variations that matter
- Read scenario labels before triaging - Confirm which scenario failed before you treat a result as a workflow-wide regression
- Rerun completed suites from the detail page - Use Rerun suite when you want to verify a fix or confirm whether a failure is reproducible without rebuilding the run
- Use status cells to jump into workflow runs - Open workflow-level details directly from the suite results table when you need deeper debugging or a shareable run link
15 Troubleshooting
| Problem | What to do |
|---|---|
| You opened a test run or test execution link and cannot view the page | Look for the organization switch prompt and confirm it to open the run in the correct organization |
| You opened a shared link but no switch prompt appears | Verify that you are signed in with an account that has access to the target organization |
| You do not have access after switching organizations | Ask an admin to invite you to the organization that owns the run or share the results another way |
| A teammate shared a direct link to a specific execution | Open the link directly. If it belongs to another organization you can access, switch organizations when prompted |
| Login-dependent workflows seem to pause before steps begin | Expect Canary to warm up saved credentials before workflow execution starts. Wait for the run to continue, then open a workflow to confirm it begins in an authenticated state |
| A workflow that requires sign-in starts unauthenticated | Confirm that the workflow uses saved credentials available to the workspace, then start the test run again |
| Authentication issues only appear at the beginning of a suite run | Review the first workflow steps to confirm whether the session was ready when execution began. If the workflow still lands on a sign-in page, re-save the credentials used by that workflow and rerun the suite |
| Expected workflows did not run in the selected environment | Review the run summary for Skipped workflows, then confirm whether the selected environment matches each workflow's environment eligibility |
| A workflow shows as Skipped in the results | Open the run summary and check whether the selected environment was eligible for that workflow. Start a new run in a matching environment if you need that workflow to execute |
| You selected an environment but fewer workflows ran than expected | Check whether some workflows are limited to specific environment types, marked as quarantined, or marked as degraded. Choose an eligible environment, review the excluded sections in the summary, then rerun the test suite if needed |
| A workflow does not appear in regression coverage | Check whether it is marked as quarantined or degraded. Remove the quarantine or re-enable the degraded workflow if you want it included in future test runs |
| The pass rate looks higher or lower than expected | Confirm how many workflows were skipped, quarantined, or degraded. Quarantined and degraded workflows are excluded from pass-rate calculations and shown separately in the run summary |
| A workflow appears multiple times in the same test run | Check the scenario label for each execution. Canary runs a separate execution for every included scenario in the suite |
| A failure only appears for one variation of a workflow | Open the scenario-specific execution and confirm which scenario name is attached to the failure before you triage it |
| An expected scenario did not run in the suite | Confirm that the scenario is marked to run in suites. Only included scenarios fan out into separate executions |
| The Triage tab shows Waiting on diagnostics or Running | Wait for Canary to finish collecting diagnostics and analyzing failures, then refresh or reopen the run to review grouped clusters |
| The Triage tab shows Skipped or Failed | Continue with the workflow list, full-screen debug view, and individual workflow details to investigate failures manually |
| A workflow has no recording in the detail view | Read the missing-recording message to see whether the run was API-only, had no browser interaction, failed before recording started, or could not upload the recording |
| A workflow failed and no playback is available | Open the first failed step, screenshots, logs, API call details, and assertion results. If the message shows an upload problem, rerun the workflow if you need a recording |
| You want to rerun the same completed suite | Open the completed test run and click Rerun suite to start the suite again from the existing result |
| You need the underlying workflow execution from a suite result | Click the status cell in the completed run details table to open the related workflow run details |