Run evaluations from the CLI
Create, list, and inspect evaluations from your terminal.
List and inspect
Browse evaluations on your active team. Add --json on any subcommand for machine-readable output.
autousers eval list
autousers eval list --json

Human-readable table or JSON for scripting

autousers eval get <id>

Fetch one evaluation including config, comparisons, and shares
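Because --json works on any subcommand, the list output is easy to script over. A minimal sketch that fetches results for every evaluation on the team, assuming the list response is a JSON array whose objects carry an id field (the exact schema may differ):

# Fetch results for every evaluation on the active team.
# Assumes the --json list output is an array with an .id field per entry.
for id in $(autousers eval list --json | jq -r '.[].id'); do
  autousers eval results "$id" --json > "results-$id.json"
done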
Create an evaluation
There are two creation paths: a flag-driven one-liner for scripts, and an interactive wizard for first-time setup.
Flag-driven (scriptable)
autousers eval create \
  --type SSE \
  --url https://example.com \
  --title "Homepage SSE — May 1"

Single-sided evaluation
autousers eval create \
  --type SxS \
  --url-a https://v1.example.com \
  --url-b https://v2.example.com \
  --title "Homepage redesign A/B"

Side-by-side comparison
Interactive wizard
Run autousers eval create with no flags to launch a step-by-step wizard. It walks you through type (SSE / SxS), URLs, autouser selection, and dimensions, then confirms before persisting.
autousers eval create

Wizard mode — best on first run
Run an evaluation
autousers eval run dispatches autouser runs against an existing evaluation. Dry-run is the default — the first invocation always returns a cost estimate and never spends Gemini tokens.
autousers eval run <id>

Default = dry-run preview. Shows token + free-pool cost

autousers eval run <id> --commit

Confirm the preview and queue real runs
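The dry-run default makes a preview-then-commit flow natural in scripts. A sketch reusing the EVAL_ID variable from the creation example above; the confirmation prompt is plain shell, not an autousers feature:

# Preview the cost, then ask before queueing real runs.
autousers eval run "$EVAL_ID"                 # dry-run: prints the cost estimate
read -r -p "Queue real runs? [y/N] " answer   # plain shell prompt
if [ "$answer" = "y" ]; then
  autousers eval run "$EVAL_ID" --commit
fi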
Get results
autousers eval results <id>

Aggregate stats, per-rater summaries, and inter-rater agreement

autousers eval results <id> --json | jq '.summary'

Pipe JSON output through jq for ad-hoc analysis
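The JSON output also lends itself to CI-style gates. A sketch that fails a pipeline on low inter-rater agreement; the .summary.agreement path is an assumption about the results schema, so adjust it to the real field names:

# Exit nonzero when inter-rater agreement drops below 0.6.
# .summary.agreement is an assumed field name, not documented above.
autousers eval results "$EVAL_ID" --json \
  | jq -e '.summary.agreement >= 0.6' > /dev/null \
  || { echo "Inter-rater agreement below 0.6" >&2; exit 1; }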
Export
Pull ratings out as JSON or CSV for downstream BI tools or spreadsheets.
autousers eval export <id> --format csv > results.csv
autousers eval export <id> --format json > results.json

CSV for Sheets/Excel, JSON for everything else
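Before loading the file into a BI tool, a quick row count is a cheap sanity check. A minimal sketch using only the export command documented above:

# Count exported ratings (excluding the CSV header line).
autousers eval export "$EVAL_ID" --format csv | tail -n +2 | wc -l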