Choosing a Model
Testronaut supports multiple OpenAI models for agentic (tool-using) workflows.
You can pick a default during `testronaut --init`, store it in `testronaut-config.json`, and override it per run via the CLI.
Heads-up: some newer models (e.g., the GPT-5 family) may require specific account access or region availability. If you choose one your account can't use, Testronaut will warn you, and you may see 429/limit errors from the API.
Supported model families
All of the following are compatible with Testronaut’s tool calling & DOM agent:
- GPT-4o — Multimodal, strong tool use
- GPT-4o mini — Faster/cheaper 4o variant
- GPT-4.1 — General-purpose, long context
- GPT-4.1 mini — Speed/cost optimized 4.1
- o3 — Reasoning-oriented with native tool use
- o4-mini — Cost-effective reasoning with tool use
- GPT-5 family (e.g., `gpt-5`, `gpt-5-mini`, `gpt-5-nano`) — if available on your account. Testronaut supports selecting these; availability and limits may vary.
You can always add more choices as OpenAI releases them—Testronaut reads the model id you select and forwards it directly to the API.
How Testronaut decides which model to use

1. CLI override (highest precedence): if you pass `--model <id>` on the command line, that model is used for the entire run.
2. Environment override: if `TESTRONAUT_MODEL` is set, it overrides the config file.
3. Project config: `testronaut-config.json` stores `{ "provider": "openai", "model": "<id>" }` after init.
4. Fallback: if none of the above are present, Testronaut defaults to `gpt-4o`.
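The precedence order above can be sketched as a small resolver. This is a hypothetical helper for illustration, not Testronaut's actual source:

```javascript
// Sketch of the model-resolution order: CLI flag > env var > config > fallback.
function resolveModel(cliModel, env, config) {
  if (cliModel) return cliModel;                          // 1. --model <id>
  if (env.TESTRONAUT_MODEL) return env.TESTRONAUT_MODEL;  // 2. environment override
  if (config && config.model) return config.model;        // 3. testronaut-config.json
  return "gpt-4o";                                        // 4. default
}
```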
The chosen model is surfaced in:

- The CLI JSON and HTML reports (`results.llm.provider` & `results.llm.model`)
- The companion app (header and table pills)
Set the default model (config)

After running `npx testronaut --init`, you'll have `testronaut-config.json`. Edit it to set the provider/model:
```json
{
  "initialized": true,
  "provider": "openai",
  "model": "gpt-4o",
  "outputDir": "missions/mission_reports",
  "projectName": "your-project",
  "maxTurns": 20
}
```
Example model ids:

- `gpt-4o`
- `gpt-4o-mini`
- `gpt-4.1`
- `gpt-4.1-mini`
- `o3`
- `o4-mini`
- `gpt-5` / `gpt-5-mini` / `gpt-5-nano` (if available)
Override per run (CLI)

Use `--model` to override for a single run:

```bash
# Run all missions with GPT-4.1 mini
npx testronaut --model gpt-4.1-mini

# Run a specific mission with o3
npx testronaut downloadDocument.mission.js --model o3
```
This sets the model for the whole execution and shows up in the generated reports.
Override via environment

macOS/Linux:

```bash
export TESTRONAUT_MODEL=gpt-4o-mini
npx testronaut
```

Windows (PowerShell):

```powershell
$env:TESTRONAUT_MODEL="gpt-4o-mini"
npx testronaut
```
Cost / benefit guide (quick picks)
Exact pricing and limits change; treat this as a practical guide. Choose based on the mission’s needs (speed vs. depth vs. cost).
| Family/Model | Strengths | Typical Use | Trade-offs |
|---|---|---|---|
| GPT-4o | Strong tool use, robust reasoning, multimodal | Most agentic missions; reliable DOM/tool calling | Mid cost/latency |
| GPT-4o mini | Fast + inexpensive | Iteration, smoke tests, high-volume runs | Lower peak reasoning vs 4o |
| GPT-4.1 | Long context + high accuracy | Complex, stateful missions; larger DOM snapshots | Higher cost |
| GPT-4.1 mini | Good balance of speed/quality | General testing with moderate complexity | Slightly less capable than 4.1 |
| o3 | Advanced reasoning, strong tool decisions | Tricky flows requiring planning/multi-step tool orchestration | May be slower / more costly |
| o4-mini | Cost-effective reasoning | Budget-aware runs that still need planning | Less depth than o3 |
| GPT-5 family | Newest reasoning + tool use (where available) | Cutting-edge agentic flows; "thinking" modes if enabled | Availability/rate limits vary |
Recommendations

- Fast & cheap: `gpt-4o-mini`
- Best general agent: `gpt-4o`
- Big context / high accuracy: `gpt-4.1`
- Reasoning-heavy flows: `o3` (or `o4-mini` as a cheaper option)
- If you have access: try the `gpt-5` family; keep a warning in docs/UI.
Rate limits & dynamic backoff (FYI)

Testronaut ships with token-aware throttling:

- It estimates tokens per turn and backs off when nearing per-minute token limits.
- If a 429 occurs, it can learn updated limits from response headers (when provided).
- New/unsupported tokenizers are handled with sensible fallbacks, so you can safely pick future model ids.

If you select a model that your account can't use, expect 429s or capability errors; switch to `gpt-4o` or `gpt-4o-mini`.
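The retry side of that behavior looks roughly like the sketch below. This is an illustrative exponential-backoff loop, not Testronaut's actual implementation (which is also token-aware); the `err.status` shape is an assumption:

```javascript
// Illustrative retry-with-backoff for 429 responses: double the wait on each
// failed attempt, cap it at 30s, and rethrow anything that isn't a rate limit.
async function withBackoff(fn, maxRetries = 3, baseDelayMs = 1000) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (err.status !== 429 || attempt >= maxRetries) throw err;
      const waitMs = Math.min(baseDelayMs * 2 ** attempt, 30000);
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```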
Troubleshooting

"Invalid model: xyz"

- Ensure the model id is spelled exactly as the API expects.
- If it's a new family (e.g., GPT-5) and your tokenizer library doesn't recognize it, Testronaut still estimates tokens via fallbacks, and runs will continue.

"429: rate limit exceeded"

- Pick a smaller/faster model for bulk runs (e.g., `gpt-4o-mini`), or reduce concurrency.
- Try again later or contact OpenAI about quota.
- Consider setting `TESTRONAUT_MODEL` to a lighter model for large suites.
Examples

Set the default once, override ad-hoc:

```bash
# Configure the default model in testronaut-config.json
npx testronaut --init

# Temporary override to o3 for a tricky mission
npx testronaut checkoutFlow.mission.js --model o3
```

Project-wide switch via ENV (CI friendly):

```bash
TESTRONAUT_MODEL=gpt-4.1-mini npx testronaut
```
Reports

- The JSON & HTML reports show `llm.provider` and `llm.model`.
- The companion app displays these as header text and colorful pills in the table.