Running tests with the CLI
The Canary CLI lets you trigger test runs from your terminal and stream results in real time. Use it locally during development or wire it into CI/CD to gate deployments.
You can also use the CLI for sandbox authoring and management workflows, including building templates, uploading artifacts, and managing template actions. See the Sandbox reference for sandbox-specific commands and options.
01 Prerequisites
- At least one published flow in your organization.
- An API key or an interactive login session.
- If you start from a sandbox, access to that sandbox so you can open the Connect locally entry point and launch the CLI from the sandbox UI.
If you haven't built flows yet, start with Building your first smoke suite.
02 Install the CLI
Install the Canary CLI globally with your preferred package manager:
# npm
npm install -g @canaryai/cli
# or with bun
bun add -g @canaryai/cli
Use the same CLI for both test execution and sandbox workflows from your terminal, including service-level sandbox debugging commands.
Verify the install:
canary version
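In scripts, it can help to confirm that the CLI is on your PATH before calling it. The sketch below is one possible way to do that; `require_cli` is a hypothetical helper name used for illustration, not something the Canary CLI provides.

```shell
# require_cli is a hypothetical helper, not part of the Canary CLI.
# Usage: require_cli [command-name]   (defaults to "canary")
# It succeeds when the command is on PATH and prints an error otherwise.
require_cli() {
  cli_name="${1:-canary}"
  if command -v "$cli_name" >/dev/null 2>&1; then
    return 0
  fi
  echo "error: '$cli_name' not found on PATH" >&2
  return 1
}

# Example: stop a script early when the CLI is missing.
# require_cli && canary version
```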
03 Authenticate
You have two options: interactive login (for local development) or an API key (for CI/CD and automation).
The same authentication methods work for test commands and sandbox management commands.
If you are working in a sandbox, open the sandbox and use Connect locally to open the connect drawer. The drawer points you to the local CLI flow so you can authenticate, connect your local machine, and continue with richer agent workflows from your terminal.
The CLI also includes built-in guidance for Claude and Cowork MCP usage. After you install and authenticate, you may see CLI guidance that points you to AI-assisted workflows for exploring your app, building workflows, running tests, and investigating failures.
Option A: Interactive login
canary login
This opens your browser for a one-time device code flow. Once approved, the CLI stores a long-lived token at ~/.config/canary-cli/auth.json.
If your account belongs to multiple organizations, you'll be prompted to pick one. You can also pass --org <name> to skip the prompt.
Organization selection during sign-in also works correctly when your account has memberships tied to deleted organizations.
Option B: API key
Create an API key in Settings > API Keys (requires admin access), then pass it to the CLI:
canary test --remote --token cnry_your_api_key
Or set it as an environment variable so you don't have to pass it every time:
export CANARY_API_TOKEN=cnry_your_api_key
See API Keys for details on creating and managing keys.
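In CI there is no interactive login session, so a missing `CANARY_API_TOKEN` is a common cause of failed runs. The sketch below checks for the variable before starting a run; `require_token` is a hypothetical helper name, and the check only covers the environment-variable option described above.

```shell
# require_token is a hypothetical helper, not a Canary CLI command.
# It fails with a hint when CANARY_API_TOKEN is unset or empty.
require_token() {
  if [ -z "${CANARY_API_TOKEN:-}" ]; then
    echo "CANARY_API_TOKEN is not set; export it or run: canary login" >&2
    return 1
  fi
  return 0
}

# Example: only start a run when a token is available.
# require_token && canary test --remote
```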
04 Run your tests
Trigger a test run across all published workflows:
canary test --remote
The CLI will:
- Start a remote test run
- Stream real-time results to your terminal
- Print pass/fail status for each workflow as it finishes
- Exit with code 0 if all workflows passed, or 1 if any failed
Example output:
Starting remote workflow tests...
✓ Sign in flow
✓ Create project
✗ Checkout flow
Error: Expected "Order confirmed" but page showed "500 Internal Server"
✓ Search and filter
──────────────────────────────────────────────────
FAILED: 1 of 4 workflows failed (75% pass rate)
During test suite runs, saved credentials now warm up before workflows begin. Authenticated sessions start in a ready-to-use state, which reduces login friction at the start of automated runs.
AI-assisted guidance in the CLI
The CLI now surfaces built-in guidance for Claude and Cowork MCP usage while you work. Use that guidance when you want help from an AI agent without leaving your terminal workflow.
You may encounter this guidance when you:
- Explore an app before you build flows
- Build new workflows from a sandbox or test environment
- Run tests and decide what to execute next
- Investigate failed runs and gather the right context for follow-up work
Use the guidance to speed up common tasks:
| Task | How the guidance helps |
|---|---|
| App exploration | Points you toward MCP-assisted ways to inspect the app and understand key paths before recording or refining workflows |
| Workflow building | Suggests how to hand off workflow creation or iteration to Claude or Cowork MCP workflows |
| Running tests | Helps you move from a manual CLI run to an AI-assisted loop for selecting, rerunning, or expanding coverage |
| Failure investigation | Helps you gather context from failed runs and continue troubleshooting with an AI agent |
If you also use the CLI for sandboxes, you can run template and environment workflows from the same terminal session. For example, the CLI now supports sandbox action list, get, create, and delete commands for template authoring. See the Sandbox reference for the full command set.
You can also run a command inside a specific sandbox service instead of only the host VM. Use this when you need to inspect a container, run service-specific checks, or debug how your app behaves inside the sandbox.
canary sandbox run-command <instance-id> <command...> --service <name>
For example, run tests inside a web service container:
canary sandbox run-command sbx_123 npm test --service web
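If you need the same check in several services, a small loop can wrap the documented `run-command` form. The sketch below is illustrative only: `run_in_services` is a hypothetical helper, the service names in the example are placeholders, and it assumes `run-command` exits non-zero when the command inside the service fails, which this guide does not confirm.

```shell
# run_in_services is a hypothetical wrapper around the documented
# `canary sandbox run-command <instance-id> <command...> --service <name>` form.
# Usage: run_in_services <instance-id> <service>...
# It runs `npm test` in each service and returns non-zero if any run failed.
# Assumption: run-command exits non-zero when the command in the service fails.
run_in_services() {
  instance_id="$1"
  shift
  failed=0
  for service in "$@"; do
    if canary sandbox run-command "$instance_id" npm test --service "$service"; then
      echo "ok: $service"
    else
      echo "failed: $service" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example (service names are placeholders):
# run_in_services sbx_123 web api
```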
Long-running sandbox builds are also more resilient. If a build takes longer than the gateway timeout, the CLI continues by polling diagnostics and reporting progress instead of failing immediately.
When uploading sandbox artifacts, use template-level artifacts for files shared across a template, and use custom file artifacts when you need an explicit destination path inside the sandbox. Refer to the Sandbox reference for current artifact upload options and examples.
05 Filter which tests run
You don't always want to run everything. Use tags and name patterns to run a subset.
By tag
Tags are assigned to flows in the UI. Filter by tag to run a specific category:
canary test --remote --tag smoke
By name pattern
Match workflows by name (case-insensitive):
canary test --remote --name-pattern "checkout"
Combine filters
When both are provided, workflows must match both criteria:
canary test --remote --tag smoke --name-pattern "auth"
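When a script needs to support either filter, both, or neither, one option is to build the argument list conditionally. The sketch below uses only the flags shown above; `run_filtered` is a hypothetical wrapper name, not a CLI feature.

```shell
# run_filtered is a hypothetical wrapper, not part of the Canary CLI.
# Usage: run_filtered [tag] [name-pattern]
# Empty arguments are skipped, so one function covers no filter,
# tag only, name pattern only, and both combined.
run_filtered() {
  tag="${1:-}"
  pattern="${2:-}"
  set -- test --remote
  if [ -n "$tag" ]; then
    set -- "$@" --tag "$tag"
  fi
  if [ -n "$pattern" ]; then
    set -- "$@" --name-pattern "$pattern"
  fi
  canary "$@"
}

# Examples:
# run_filtered smoke        runs: canary test --remote --tag smoke
# run_filtered "" checkout  runs: canary test --remote --name-pattern checkout
# run_filtered smoke auth   runs: canary test --remote --tag smoke --name-pattern auth
```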
06 Verbose mode
Add --verbose (or -v) to see every SSE event as it streams, including suite metadata and timing:
canary test --remote --tag smoke --verbose
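Verbose output is often worth keeping as a build artifact. The sketch below saves a command's output to a file while preserving its exit code; `run_logged` is a hypothetical helper name, and the log path in the example is a placeholder.

```shell
# run_logged is a hypothetical helper, not part of the Canary CLI.
# Usage: run_logged <log-file> <command> [args...]
# It writes the command's combined stdout and stderr to the log file,
# prints the log once the command finishes, and returns the command's
# own exit code.
run_logged() {
  log_file="$1"
  shift
  status=0
  "$@" >"$log_file" 2>&1 || status=$?
  cat "$log_file"
  return "$status"
}

# Example:
# run_logged canary-run.log canary test --remote --tag smoke --verbose
```

The trade-off is that output appears after the run finishes rather than live. Piping through `tee` would stream it, but in plain POSIX sh the pipeline then reports the exit code of `tee` instead of the test run.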
07 Exit codes
| Code | Meaning |
|---|---|
| 0 | All workflows passed |
| 1 | One or more workflows failed |
This makes the CLI a natural fit for CI/CD pipeline gates: if a workflow fails, the pipeline step fails.
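Outside a CI system that checks exit codes for you, a script can branch on the code directly. The sketch below shows one way to do that; `gate` is a hypothetical helper name, and `./promote.sh` in the example is a placeholder for whatever step should run only after a passing suite.

```shell
# gate is a hypothetical helper, not part of the Canary CLI.
# Usage: gate <command> [args...]
# Exit code 0 lets the caller continue; any other code is passed through.
gate() {
  status=0
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "gate: all workflows passed, continuing"
  else
    echo "gate: run failed with exit code $status, stopping" >&2
  fi
  return "$status"
}

# Example (promote.sh is a placeholder):
# gate canary test --remote --tag smoke && ./promote.sh
```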
08 Run in GitHub Actions
Add this workflow to .github/workflows/canary-tests.yml:
name: Canary Tests
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  canary:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install Canary CLI
        run: npm install -g @canaryai/cli
      - name: Run smoke tests
        env:
          CANARY_API_TOKEN: ${{ secrets.CANARY_API_TOKEN }}
        run: canary test --remote --tag smoke
Setup steps
- Create an API key in your Canary organization.
- In your GitHub repo, go to Settings > Secrets and variables > Actions.
- Add a secret named CANARY_API_TOKEN with the key value.
- Commit the workflow file above.
Tips
- Use the --tag flag to run only smoke tests in CI rather than the full suite. Save the full suite for nightly or staging deploys.
- Pin to a specific CLI version if you want reproducible builds: npm install -g @canaryai/cli@0.1.7
- Add the step after your deploy so you're testing the freshly deployed version.
- Use --verbose in CI for better debugging when a failure occurs.
For more CI platforms (GitLab CI, CircleCI, Jenkins), see the CI/CD Integration guide.
09 Environment variables
| Variable | Description | Default |
|---|---|---|
| CANARY_API_TOKEN | API key or login token | (none) |
| CANARY_API_URL | API endpoint | https://api.trycanary.ai |
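When debugging a CI job, it can be useful to print which settings are in effect without leaking the token. The sketch below resolves the endpoint using the default from the table above; `show_canary_env` is a hypothetical helper name, and it reads only these two variables, not any stored login session.

```shell
# show_canary_env is a hypothetical helper, not part of the Canary CLI.
# It prints the API endpoint in effect (falling back to the documented
# default) and whether a token is set, without printing the token itself.
show_canary_env() {
  echo "api url: ${CANARY_API_URL:-https://api.trycanary.ai}"
  if [ -n "${CANARY_API_TOKEN:-}" ]; then
    echo "token: set"
  else
    echo "token: not set"
  fi
}
```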
10 Troubleshooting
"No API token found"
Either set CANARY_API_TOKEN or run canary login first.
"Failed to start tests: 401"
Your token is invalid, expired, or revoked. Create a new API key or re-run canary login.
"No workflows found matching the filter criteria"
Check that your filters match at least one published workflow. Tags are exact matches; name patterns are case-insensitive substrings.
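The two matching rules behave differently, which is easy to miss. The sketch below reproduces them locally for illustration; it is not the CLI's implementation, the function names are hypothetical, and it assumes that "exact" tag matching is also case-sensitive.

```shell
# Illustration of the matching rules described above. These helpers are
# hypothetical and are not how the Canary CLI implements filtering.

# to_lower TEXT: print TEXT in lowercase.
to_lower() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# name_matches NAME PATTERN: case-insensitive substring match.
name_matches() {
  name=$(to_lower "$1")
  pattern=$(to_lower "$2")
  case "$name" in
    *"$pattern"*) return 0 ;;
    *) return 1 ;;
  esac
}

# tag_matches "TAG1 TAG2 ..." TAG: exact match against a space-separated
# tag list. Assumption: exact also means case-sensitive.
tag_matches() {
  for candidate in $1; do
    if [ "$candidate" = "$2" ]; then
      return 0
    fi
  done
  return 1
}

# Examples:
# name_matches "Checkout flow" "checkout"   succeeds (substring, any case)
# tag_matches "smoke-tests auth" "smoke"    fails (not an exact tag)
```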
Tests pass in the UI but fail in CI
- Confirm your flows target the correct environment (staging vs production).
- Check that credentials are configured for the environment being tested.
- Saved credentials now warm up before workflows begin. If the failing run started before this improvement, re-run the suite.
- Use --verbose to see detailed event data.
Sandbox command runs in the wrong environment
If you expected to run inside a specific container or service, add --service <name> to canary sandbox run-command. This targets that sandbox service directly instead of the host VM. See the Sandbox reference for service names and related sandbox commands.
Sandbox command returns a clearer CLI error
Sandbox actions and sandbox CLI operations now return clearer, more consistent errors for invalid input, disabled actions, missing instances, and timeouts.
If a sandbox command fails, read the CLI message first, then verify:
- The instance ID is correct
- The action is enabled for that template or environment
- The service name passed to --service exists
- The sandbox is still running and reachable
11 Next steps
- Open a sandbox and use Connect locally when you want to move from the sandbox UI into a richer local CLI workflow.
- Building your first smoke suite -- pick the right flows to test
- API Keys -- manage keys for your team
- CI/CD Integration -- examples for GitLab, CircleCI, and Jenkins
- Sandbox reference -- manage sandbox templates, artifacts, builds, actions, service-scoped CLI commands, and agent-friendly sandbox workflows
- Test Runs -- understand test run lifecycle and auto-verification