
Ad-hoc Tests


Ad-hoc Tests let you start one-off AI-driven test runs against a property without creating a reusable flow first. Use them when you want to quickly explore a scenario, validate a change, or ask the agent to investigate a specific area of your site or app.

01 When to use ad-hoc tests

Use ad-hoc tests when you need fast coverage for a focused question or workflow. They work well for exploratory checks, quick regression validation, and targeted investigations that do not need a permanent flow.

Common use cases include:

  • Verifying a recent UI or content change on a single page or path
  • Asking the agent to exercise a specific checkout, sign-in, or form submission scenario
  • Exploring a new feature before you decide whether to model it as a reusable flow
  • Reproducing a suspected issue with custom instructions and a known starting URL
  • Running one-off tests that you want to keep separate from your other run types

If you need repeatable, versioned automation, use a flow instead. For broader release validation, use Release QA plans and runs where available.

02 How ad-hoc tests work

An ad-hoc test starts from a property and a starting URL that you choose. You give the AI agent instructions for what to test, then Canary creates a dedicated run in the Ad-hoc Tests area under Runs.

During the run, you can monitor live status and progress from the detail page. After the run finishes, Canary shows a written summary, issue counts, and a video recording so you can quickly review what happened.

Properties and starting URLs

Start an ad-hoc test from the property you want to test. The property determines the environment, site, or app context that the agent runs against.

Use the starting URL to guide the agent to the right entry point for the scenario. Choose a URL that matches the workflow you want to validate, such as a landing page, product page, sign-in page, or a deep link into a specific feature.

A clear starting URL helps the agent begin in the right place and reduces unnecessary navigation. If your scenario depends on a specific page state, start as close to that state as possible.

Optional credentials

Add credentials when the agent needs to access authenticated areas or complete protected workflows. Use this for scenarios such as sign-in, account settings, internal tools, or gated test environments.

Provide only the credentials required for the test. Keep access scoped to the lowest level needed for the scenario so the agent can complete the run without introducing unnecessary risk.

If a workflow can be tested without signing in, omit credentials and start from a public page instead. This keeps the run simpler and easier to review.

Custom instructions for the agent

Write instructions that tell the agent what to test and what success looks like. Keep them specific, task-oriented, and tied to observable outcomes.

Good instructions usually include:

  • The workflow or area to test
  • Any important constraints or priorities
  • Expected outcomes the agent should verify
  • Known risk areas or edge cases worth checking

For example, ask the agent to complete a sign-in flow, add an item to the cart, and confirm the cart updates correctly. Avoid vague prompts that do not define the goal or expected result.
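The checklist above can be sketched as a small drafting helper. This is only an illustration for composing instruction text before you paste it into the instruction field: the `InstructionBrief` class, its field names, and the output layout are assumptions, not part of Canary.

```python
from dataclasses import dataclass, field


@dataclass
class InstructionBrief:
    """Hypothetical helper for drafting agent instructions; not a Canary API."""

    workflow: str
    expected_outcomes: list[str]
    constraints: list[str] = field(default_factory=list)
    risk_areas: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return plain-text instructions, rejecting drafts with no goal or outcome."""
        if not self.workflow.strip():
            raise ValueError("Describe the workflow or area to test.")
        if not self.expected_outcomes:
            raise ValueError("List at least one expected outcome to verify.")
        lines = [f"Test: {self.workflow.strip()}"]
        for label, items in (
            ("Constraints", self.constraints),
            ("Verify", self.expected_outcomes),
            ("Also check", self.risk_areas),
        ):
            if items:  # skip empty sections so the text stays short
                lines.append(f"{label}:")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


brief = InstructionBrief(
    workflow="Sign in, add one item to the cart, and open the cart",
    expected_outcomes=["The cart shows the added item", "The cart total updates"],
    risk_areas=["Adding the same item twice"],
)
print(brief.render())
```

Requiring at least one expected outcome mirrors the guidance to avoid prompts that do not define a goal or result.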

03 Running an ad-hoc test

Ad-hoc tests have their own area in Runs, which keeps one-off AI-driven testing separate from other run types. From there, you can start a new test, monitor it while it runs, and open completed runs to review results.

Starting a new ad-hoc test

Note: Before you start the run, the setup dialog shows which environment Canary will use. If your property has a default sandbox, Canary preselects it and labels the setup so you know whether the test will launch in Live or Sandbox.

  1. Open your property in Canary.
  2. Go to Runs.
  3. Open Ad-hoc Tests.
  4. Click Start ad-hoc test.
  5. Select the property.
  6. Review the environment shown in the setup dialog:
    • Live starts from the property's live URL.
    • Sandbox starts from a supported sandbox environment.
  7. If your property has a default sandbox, confirm whether you want to keep that sandbox or switch to a different environment for this run.
  8. Enter the starting URL.
  9. Add credentials if the workflow requires them.
  10. Add upload fixtures if the workflow needs files:
    • Upload a new file directly in the start dialog, or
    • Open the Attachments library picker to search for and stage an existing organization file
  11. Enter clear instructions for the agent.
  12. If the instruction field shows a validation message, shorten the request before you continue.
  13. If you start the run from Slack, include the Jira ticket you want Canary to update after the run finishes.
  14. Click Start test.

After you start the run, Canary creates a new entry in the Ad-hoc Tests list.

[Figure: Ad-hoc Tests runs list]

The start dialog shows which environment Canary will use for the run and makes default sandbox selection easier to confirm before launch.

[Figure: Default sandbox guidance in the ad-hoc test start dialog]

The start dialog lets you choose whether to target the live property URL or a sandbox, select or override the sandbox for this run, enter a starting URL, add optional credentials, attach upload fixtures, and describe what the agent should test.

[Figure: Ad-hoc Tests environment selection showing Live and Sandbox options]
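As a reading aid, the start dialog inputs described above can be modeled as a pre-launch checklist. The sketch below is an assumption-heavy illustration: `AdHocTestDraft`, its fields, and the messages are hypothetical and do not reflect Canary's actual validation or API.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class AdHocTestDraft:
    """Hypothetical record of the start dialog inputs; field names are illustrative."""

    target: str                      # "live" or "sandbox"
    starting_url: str
    instructions: str
    sandbox_name: Optional[str] = None
    needs_sign_in: bool = False
    has_credentials: bool = False


def preflight_problems(draft: AdHocTestDraft) -> list[str]:
    """Return human-readable problems to fix before launching; empty means ready."""
    problems = []
    if draft.target not in ("live", "sandbox"):
        problems.append("Choose Live or Sandbox as the target.")
    if draft.target == "sandbox" and not draft.sandbox_name:
        problems.append("Select a sandbox for this run.")
    parsed = urlparse(draft.starting_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("Enter an absolute http(s) starting URL.")
    if not draft.instructions.strip():
        problems.append("Describe what the agent should test.")
    if draft.needs_sign_in and not draft.has_credentials:
        problems.append("Add credentials for the authenticated workflow.")
    return problems


draft = AdHocTestDraft(
    target="sandbox",
    starting_url="https://example.com/account/settings",  # placeholder URL
    instructions="Change the display name and confirm it persists after reload.",
    needs_sign_in=True,
)
for problem in preflight_problems(draft):
    print(problem)
```

Running a check like this before launch catches the most common setup mistakes (wrong target, missing sandbox, missing credentials) while they are still cheap to fix.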

Use the Attachments library when you want to reuse a file that already exists in your organization's shared library. Search for the file, stage it in the dialog, then launch the run so the agent can use it as an upload fixture during the test.

[Figure: Attachments library picker for ad-hoc upload fixtures]

Use Live when you want to validate the production experience. Use Sandbox when you want to test a non-production environment directly without changing the property URL.

When you switch between Live and Sandbox, Canary updates the available setup fields to match the environment you selected. Use this to confirm that the run is using the right target before you enter credentials or start the test.

Use the following guidance when you choose a target environment:

| Target | What happens |
| --- | --- |
| Live | Canary starts the run directly against the property's live URL. Canary shows only setup options that apply to the live environment. |
| Sandbox | Canary starts the run against the selected sandbox. If your property has a default sandbox, Canary preselects it so you can launch faster or switch to another sandbox before starting the run. |

If your property has a default sandbox, Canary preselects it for one-click launches. You can still override that sandbox in the start dialog before you begin the run.
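The precedence described here (an explicit override beats the preselected default sandbox, and Live ignores both) can be summarized in a few lines. This is a minimal sketch of the selection rule only; `resolve_target`, its parameters, and the returned labels are hypothetical names, not Canary's implementation.

```python
from typing import Optional


def resolve_target(
    target: str,
    default_sandbox: Optional[str] = None,
    override_sandbox: Optional[str] = None,
) -> str:
    """Return a label for the environment a run would use.

    Hypothetical helper: the names and return format are illustrative only.
    """
    if target == "live":
        return "live"  # Live runs start from the property's live URL
    if target != "sandbox":
        raise ValueError("target must be 'live' or 'sandbox'")
    # An override chosen in the start dialog wins over the preselected default.
    chosen = override_sandbox or default_sandbox
    if chosen is None:
        raise ValueError("No sandbox selected and the property has no default sandbox")
    return f"sandbox:{chosen}"


print(resolve_target("sandbox", default_sandbox="staging"))
print(resolve_target("sandbox", default_sandbox="staging", override_sandbox="qa-2"))
print(resolve_target("live", default_sandbox="staging"))
```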

If you start an ad-hoc test from Slack, Canary uses the same launch behavior. When the property has a default sandbox, Slack-triggered ad-hoc tests can use that sandbox automatically. Open the run in Canary to monitor progress, review evidence, and confirm whether Canary writes verified results back to the linked Jira ticket after completion.

You can also trigger ad-hoc PR testing from a GitHub pull request comment by mentioning Canary on the PR. Use this when you want to launch a quick test directly from the code review conversation instead of opening Canary first.

When you start a PR-triggered run from GitHub, Canary posts progress updates back into the same pull request conversation, then follows up with the final result and a direct report link. Use the PR thread to track the run while reviewers stay in context, then open Canary for the full report, recordings, and artifacts.

If you need setup steps, comment formats, and a full walkthrough for this workflow, use the Trigger ad-hoc PR tests from GitHub comments guide.

[Figure: GitHub pull request comment trigger for ad-hoc testing]

Monitoring live status and progress

Note: The planning state is currently behind the adhoc-test.planning-phase feature flag and may not appear in every workspace yet.

Open a run from the Ad-hoc Tests list to view its detail page. The refreshed layout keeps the main run content in the center and uses separate sidebar tabs so you can switch between the test plan, report details, and issues without leaving the run.

While the test is running, use the live browser area to watch the current session, confirm the agent is on the right page, and decide whether to let the run continue. Check the status and progress information on the detail page as the agent moves through the scenario.

If you started the run from a GitHub pull request comment, also watch the pull request conversation for progress updates. Canary writes run status back to the PR so reviewers can follow execution without opening Canary immediately.

When the planning state is available for your run, Canary shows it before live screenshots begin. Use this step to confirm that setup is in progress and review sandbox context such as the target URL and any seeded login that Canary will use during execution.

[Figure: Ad-hoc test planning state showing elapsed time and sandbox context]

When you launch against a supported sandbox, expect the run to begin with a planning phase before browser actions start. Review the status on the detail page to see when Canary is planning and when it moves into active execution.

If planning does not succeed cleanly on the first try, keep the run open and watch for updated planning status instead of assuming the run is stuck. Canary now tries to recover from incomplete or unusable planning output and surfaces that recovery in the run details.

Use the detail view to monitor the run in real time:

  • Watch the live browser session as the agent moves through the workflow
  • Follow status updates to understand whether the run is planning, executing, or finishing
  • During planning, check the elapsed planning time and sandbox context before the browser stream starts
  • If planning needs recovery, look for planning-related status updates before execution begins or before the run finishes with a surfaced outcome
  • Use the Plan, Report, and Issues tabs in the sidebar to move between setup context, results, and findings as the run progresses
  • Confirm that the selected Live or Sandbox target matches your intent for this run
  • If the run started from GitHub, use the PR thread for quick progress checks and open Canary when you need the live browser view or full detail page

When you review an in-progress run, confirm that the target environment matches your intent. This is especially helpful when you launch against a sandbox or override the property's default sandbox for a single test.

[Figure: Ad-hoc test detail page showing Plan, Report, and Issues sidebar tabs]

Canceling a running test

Cancel a run if you notice that the agent is testing the wrong area or using the wrong setup, or if you no longer need the run to continue. This is useful when you want to stop an exploratory run early and start over with better instructions or a different starting URL.

Open the running test, then use the available cancel action on the detail page. After you cancel the run, Canary stops further progress and preserves the run record so you can still review what happened before cancellation.

04 Reviewing results

When an ad-hoc test finishes, open the run detail page to review the outcome. Start in the Plan tab to confirm what the agent intended to cover, switch to the Report tab to review grouped results and the written summary, then open Issues to inspect verified findings and follow-up actions.

The redesigned sidebar makes post-run review easier because planning context, report content, and issues no longer share one combined panel. Move between tabs as you review so you can compare intended coverage, final results, and issue evidence without losing your place in the recording.

[Figure: Structured ad-hoc test report with grouped results]

If you started the run from a GitHub pull request comment, use the pull request conversation as a quick status and handoff view. Canary writes progress updates during execution and posts the final result back to the PR with a direct link to the Canary report so reviewers can jump from the thread into the full run details.

GitHub writebacks can also include richer artifacts such as screenshots and recordings when they are available. Use the PR comment thread for a fast review summary, then open the run in Canary when you need the structured report, issue details, and full artifact set.

For a step-by-step walkthrough of GitHub comment triggers, setup requirements, and example usage, see Trigger ad-hoc PR tests from GitHub comments.

Summary and issue counts

Use the written summary for a quick explanation of the run outcome and the most important findings. Review it after you scan the structured test plan so you can connect the high-level explanation to the specific passed, failed, blocked, and skipped checks.

Open the Report tab to review summary details and issue counts, then switch to Issues when you need to inspect individual findings or take follow-up action. This keeps each review step focused while preserving access to the recording and run timeline.

If your workspace has the planning state enabled, review the run timeline from the beginning so you can distinguish setup time from active browser execution. This helps you understand whether Canary spent time preparing sandbox context before screenshots and result activity began.

Outcome summaries show how results are distributed across the run. Use them to separate critical failures in high-priority coverage from lower-impact issues in exploratory or edge-case coverage.

If planning only partially succeeds, Canary surfaces that outcome in the run summary instead of ending without explanation. Use the summary and issue counts together to see whether Canary continued with partial coverage, marked checks as blocked, or stopped before execution could begin.

Review coverage by priority to understand where the run spent time and which findings matter most:

| Coverage level | How to use it |
| --- | --- |
| High priority | Focus here first for core user journeys and release-critical paths. |
| Lower priority | Use this to review exploratory coverage, secondary flows, and edge cases after you confirm core results. |
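If you track results outside Canary, for example in notes you keep while reading a report, the priority-first review order can be sketched as a simple sort. The records below are hypothetical: the field names and sample checks are assumptions for illustration and do not represent a Canary export format.

```python
from collections import Counter

# Hypothetical check records; field names are illustrative, not a Canary export format.
checks = [
    {"name": "Footer links resolve", "priority": "lower", "status": "passed"},
    {"name": "Guest checkout completes", "priority": "high", "status": "failed"},
    {"name": "Promo code edge case", "priority": "lower", "status": "blocked"},
    {"name": "Sign-in succeeds", "priority": "high", "status": "passed"},
]

PRIORITY_RANK = {"high": 0, "lower": 1}
STATUS_RANK = {"failed": 0, "blocked": 1, "skipped": 2, "passed": 3}


def counts_by_priority(items):
    """Count check statuses separately for each coverage level."""
    counts = {level: Counter() for level in PRIORITY_RANK}
    for item in items:
        counts[item["priority"]][item["status"]] += 1
    return counts


def review_order(items):
    """Sort so high-priority failures come first and lower-priority passes last."""
    return sorted(
        items,
        key=lambda item: (PRIORITY_RANK[item["priority"]], STATUS_RANK[item["status"]]),
    )


for item in review_order(checks):
    print(f'{item["priority"]:>5}  {item["status"]:<8} {item["name"]}')
```

Ranking failed ahead of blocked ahead of skipped is a triage choice made for this sketch, not an ordering that Canary defines.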

When you compare results across runs, note whether the test targeted Live or a Sandbox. This helps you interpret findings correctly, especially when a sandbox contains unreleased changes or test-only data.

If setup could not use the credential template you expected for the selected environment, review the run-time message in the summary or status details. Canary reports when a credential template is unavailable for the current Live or Sandbox target so you can switch environments, choose a different setup option, or update the template before rerunning.

If the planner cannot recover enough to produce usable coverage, Canary surfaces that outcome on the run detail page instead of silently failing. Review the final status, summary, and any blocked coverage areas to decide whether to rerun with narrower instructions, a different starting URL, or updated sandbox setup.

If you started the ad-hoc test from Slack with a linked Jira ticket, Canary can write a richer result summary back to that ticket when the run finishes. The Jira update can include the final verdict, verified issues, and direct links back to the Canary run and issue replays so teammates can review the outcome without leaving the ticket.

Use the Jira writeback as a review shortcut, not a replacement for the full run detail page. Open the run in Canary when you need the complete structured report, grouped checks, and full recording context.

Video recordings

Each run includes session evidence directly on the detail page. After the run finishes, use the recording player to replay the session, scrub through the timeline, and inspect the exact point where a failed, blocked, or skipped item occurred.

The detail page now keeps playback and the run timeline synchronized. Scrub the video or select a timeline moment to jump to the matching point in the session, then switch between the Plan, Report, and Issues tabs as you compare what happened on screen with the run context and findings.

If the run included a planning state, the timeline starts before the first live screenshot. Use that early portion of the timeline to see when planning finished and when browser playback began.

Use the recording view to review results efficiently:

  • Jump to specific moments in the run from the playback timeline
  • Compare what happened on screen with the grouped report sections and summary
  • Switch between the Plan, Report, and Issues tabs to keep the right context visible during playback
  • Use the synchronized timeline to line up recording playback with result events and issue activity
  • Check where planning ended and live browser execution began when that state is available
  • Open related issues from the Issues tab when you need to investigate a finding
  • Validate the exact sequence that led to a failure, blocked state, skip, or unexpected behavior

If Canary shares results back to Jira, reviewers can open the run and per-issue replay links directly from the ticket. Use those links to move from the Jira summary into the exact Canary evidence for each verified issue.

Review artifacts are built to survive browser context changes and user switches. If a run spans multiple app states, the recording and timeline should stay intact so you can follow the full sequence without missing evidence.

[Figure: Ad-hoc test detail page with synchronized video timeline and issues side panel]

Failure states

An ad-hoc test can end without completing successfully for several reasons, including application issues, blocked access, invalid setup, planning that could not recover to a usable test plan, or a canceled run. Review the final status, summary, and recording to determine where the run stopped and what to adjust next.

The final verdict on the detail page distinguishes pass, fail, and blocked outcomes. Check that verdict first, then review the affected coverage area, summary, and evidence to decide what to do next.

Use the following outcome guide when you review completed runs:

| Outcome | What it means |
| --- | --- |
| Passed | Canary completed the covered checks without finding a blocking problem in that area. |
| Failed | Canary found a problem that caused one or more checks to fail. Review the related issue and replay evidence. |
| Blocked | Canary could not complete part of the intended coverage because access, setup, environment state, planning recovery limits, or another dependency prevented progress. |
| Canceled | The run stopped because you or another user canceled it before completion. |

When a run fails:

  • Confirm that the starting URL points to the correct page
  • Verify that any credentials are valid and have the right access
  • Rewrite instructions to be more specific and goal-oriented
  • Rerun the test with a narrower scope if the original request was too broad
  • Use the recording and issue side panel to identify the exact step where progress broke down
  • If the run stopped during planning, review the surfaced planning outcome and adjust the sandbox setup or instructions before rerunning

[Figure: Ad-hoc test detail page with blocked status and priority-based outcome summary]

05 Best practices

  • Start from the closest possible URL to the workflow you want to test.
  • Keep instructions short, specific, and focused on observable outcomes.
  • Provide credentials only when the scenario truly requires authenticated access.
  • Use ad-hoc tests for one-off exploration, then convert stable, repeated checks into flows.
  • Review the summary first, then validate findings with issue counts and the recording.
  • Cancel early if the agent starts in the wrong place or follows the wrong intent.

06 Sharing and follow-up

If you start an ad-hoc test from Slack and link it to a Jira ticket, Canary can post richer verified results back to Jira after the run completes. This helps you keep the ticket updated with the verdict, confirmed issues, and evidence links without copying results manually.

Canary can share the following back to Jira for Slack-triggered runs:

| Item | What to expect |
| --- | --- |
| Verdict summary | Jira receives a concise pass, fail, or mixed-outcome summary for the completed run. |
| Verified issues | Jira includes the confirmed issues that Canary verified during the run. |
| Run link | Jira links back to the full ad-hoc test run in Canary for deeper review. |
| Per-issue replays | Jira can include direct replay links for individual verified issues so reviewers can jump to the right evidence faster. |

If you trigger an ad-hoc test from a GitHub pull request comment, use the PR writeback to share results with reviewers. Canary posts durable result links back to the pull request, including permanent recording links that do not expire.

Use the PR comment for quick sharing when you want reviewers to open the final report or replay the recording directly from the code review conversation. Open Canary when you need the full run detail page, sidebar tabs, and all available evidence in one place.

Use the run detail page in Canary as the source of truth for the full report, grouped checks, and recording. Use the Jira writeback to keep the related ticket updated for teammates who follow work from Jira or Slack.

This workflow is especially useful when you coordinate investigations in Slack but track final decisions in Jira. Launch the test from Slack, link the Jira ticket, then review the verdict and verified issues from the ticket before opening Canary for deeper analysis.

If you do not start the run from Slack with a linked Jira ticket, Canary still stores the full result in Ad-hoc Tests, but it does not write the run outcome back to Jira automatically.
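The writeback rule above, together with the items Canary can share to Jira, can be pictured with a small sketch. Everything here is hypothetical: `RunResult`, `jira_comment`, the ticket key, and the example.com URLs are placeholders for illustration, and the text layout is not the format Canary actually posts.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunResult:
    """Hypothetical shape of a finished run; not Canary's data model."""

    source: str                       # "slack", "github", or "canary"
    verdict: str                      # for example "pass", "fail", or "mixed"
    run_url: str
    jira_ticket: Optional[str] = None
    # Each verified issue is a (title, replay_url) pair.
    verified_issues: list[tuple[str, str]] = field(default_factory=list)


def jira_comment(run: RunResult) -> Optional[str]:
    """Build a ticket summary, or return None when no writeback applies."""
    # Writeback applies only to Slack-started runs with a linked Jira ticket.
    if run.source != "slack" or not run.jira_ticket:
        return None
    lines = [f"Verdict: {run.verdict}", f"Run: {run.run_url}"]
    if run.verified_issues:
        lines.append("Verified issues:")
        lines.extend(f"- {title} (replay: {url})" for title, url in run.verified_issues)
    return "\n".join(lines)


run = RunResult(
    source="slack",
    verdict="fail",
    run_url="https://example.com/runs/1",          # placeholder link
    jira_ticket="QA-123",                          # placeholder ticket key
    verified_issues=[("Cart total does not update", "https://example.com/replays/1")],
)
print(jira_comment(run))
```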

[Figure: Jira writeback for ad-hoc test results]