Run evaluations from the CLI
Create, list, and inspect evaluations from your terminal.
List and inspect
Browse evaluations on your active team. Add --json on any subcommand for machine-readable output.
autousers eval list
autousers eval list --json

Human-readable table or JSON for scripting

autousers eval get <id>

Fetch one evaluation including config, comparisons, and shares
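Because --json works on any subcommand, the list output is easy to script over. A minimal sketch that fetches results for every evaluation on the team, assuming the list response is a JSON array whose objects carry an id field (the exact schema may differ):

# Fetch results for every evaluation on the active team.
# Assumes the --json list output is an array with an .id field per entry.
for id in $(autousers eval list --json | jq -r '.[].id'); do
  autousers eval results "$id" --json > "results-$id.json"
done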
Create an evaluation
There are two creation paths: a flag-driven one-liner for scripts, and an interactive wizard for first-time setup.
Flag-driven (scriptable)
autousers eval create \
  --type SSE \
  --url https://example.com \
  --title "Homepage SSE — May 1"

Single-sided evaluation
autousers eval create \
  --type SxS \
  --url-a https://v1.example.com \
  --url-b https://v2.example.com \
  --title "Homepage redesign A/B"

Side-by-side comparison
Interactive wizard
Run autousers eval create with no flags to launch a step-by-step wizard. It walks you through type (SSE / SxS), URLs, autouser selection, and dimensions, then confirms before persisting.
autousers eval create

Wizard mode — best on first run
Run an evaluation
autousers eval run dispatches autouser runs against an existing evaluation. Dry-run is the default — the first invocation always returns a cost estimate and never spends Gemini tokens.
autousers eval run <id>

Default = dry-run preview. Shows token + free-pool cost

autousers eval run <id> --commit

Confirm the preview and queue real runs
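The dry-run default makes a preview-then-commit flow natural in scripts. A sketch reusing the EVAL_ID variable from the creation example above; the confirmation prompt is plain shell, not an autousers feature:

# Preview the cost, then ask before queueing real runs.
autousers eval run "$EVAL_ID"                 # dry-run: prints the cost estimate
read -r -p "Queue real runs? [y/N] " answer   # plain shell prompt
if [ "$answer" = "y" ]; then
  autousers eval run "$EVAL_ID" --commit
fi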
Get results
autousers eval results <id>

Aggregate stats, per-rater summaries, and inter-rater agreement

autousers eval results <id> --json | jq '.summary'

Pipe JSON output through jq for ad-hoc analysis
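The JSON output also lends itself to CI-style gates. A sketch that fails a pipeline on low inter-rater agreement; the .summary.agreement path is an assumption about the results schema, so adjust it to the real field names:

# Exit nonzero when inter-rater agreement drops below 0.6.
# .summary.agreement is an assumed field name, not documented above.
autousers eval results "$EVAL_ID" --json \
  | jq -e '.summary.agreement >= 0.6' > /dev/null \
  || { echo "Inter-rater agreement below 0.6" >&2; exit 1; }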
Export
Pull ratings out as JSON or CSV for downstream BI tools or spreadsheets.
autousers eval export <id> --format csv > results.csv
autousers eval export <id> --format json > results.json

CSV for Sheets/Excel, JSON for everything else
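Before loading the file into a BI tool, a quick row count is a cheap sanity check. A minimal sketch using only the export command documented above:

# Count exported ratings (excluding the CSV header line).
autousers eval export "$EVAL_ID" --format csv | tail -n +2 | wc -l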