Choosing a Model
Testronaut supports multiple OpenAI models for agentic (tool-using) workflows.
You can pick a default during `testronaut --init`, store it in `testronaut-config.json`, and override it per run via the CLI.
Heads-up: some newer models (e.g., the GPT-5 family) may require specific account access or region availability. If you choose one your account can't use, Testronaut will warn you, and you may see 429/limit errors from the API.
Supported model families
All of the following are compatible with Testronaut’s tool calling & DOM agent:
- GPT-4o — Multimodal, strong tool use
- GPT-4o mini — Faster/cheaper 4o variant
- GPT-4.1 — General-purpose, long context
- GPT-4.1 mini — Speed/cost optimized 4.1
- o3 — Reasoning-oriented with native tool use
- o4-mini — Cost-effective reasoning with tool use
- GPT-5 family (e.g., `gpt-5`, `gpt-5-mini`, `gpt-5-nano`) — if available on your account. Testronaut supports selecting these; availability and limits may vary.
You can always add more choices as OpenAI releases them—Testronaut reads the model id you select and forwards it directly to the API.
How Testronaut decides which model to use

1. CLI override (highest precedence): if you pass `--model <id>` on the command line, that model is used for the entire run.
2. Environment override: if `TESTRONAUT_MODEL` is set, it overrides the config file.
3. Project config: `testronaut-config.json` stores `{ "provider": "openai", "model": "<id>" }` after init.
4. Fallback: if none of the above are present, Testronaut defaults to `gpt-4o`.
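The precedence order above can be sketched as a small resolver. This is a hypothetical helper for illustration, not Testronaut's actual source:

```javascript
// Sketch of the model-resolution order: CLI flag > env var > config > fallback.
function resolveModel(cliModel, env, config) {
  if (cliModel) return cliModel;                          // 1. --model <id>
  if (env.TESTRONAUT_MODEL) return env.TESTRONAUT_MODEL;  // 2. environment override
  if (config && config.model) return config.model;        // 3. testronaut-config.json
  return "gpt-4o";                                        // 4. default
}
```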
The chosen model is surfaced in:

- The CLI JSON and HTML reports (`results.llm.provider` & `results.llm.model`)
- The companion app (header and table pills)
Set the default model (config)

After running `npx testronaut --init`, you'll have `testronaut-config.json`. Edit it to set the provider/model:
```json
{
  "initialized": true,
  "provider": "openai",
  "model": "gpt-4o",
  "outputDir": "missions/mission_reports",
  "projectName": "your-project",
  "maxTurns": 20
}
```
Example model ids:

- `gpt-4o`
- `gpt-4o-mini`
- `gpt-4.1`
- `gpt-4.1-mini`
- `o3`
- `o4-mini`
- `gpt-5` / `gpt-5-mini` / `gpt-5-nano` (if available)
Override per run (CLI)

Use `--model` to override for a single run:

```bash
# Run all missions with GPT-4.1 mini
npx testronaut --model gpt-4.1-mini

# Run a specific mission with o3
npx testronaut downloadDocument.mission.js --model o3
```
This sets the model for the whole execution and shows up in the generated reports.
Override via environment

macOS/Linux:

```bash
export TESTRONAUT_MODEL=gpt-4o-mini
npx testronaut
```

Windows (PowerShell):

```powershell
$env:TESTRONAUT_MODEL="gpt-4o-mini"
npx testronaut
```
Cost / benefit guide (quick picks)
Exact pricing and limits change; treat this as a practical guide. Choose based on the mission’s needs (speed vs. depth vs. cost).
| Family/Model | Strengths | Typical Use | Trade-offs |
|---|---|---|---|
| GPT-4o | Strong tool use, robust reasoning, multimodal | Most agentic missions; reliable DOM/tool calling | Mid cost/latency |
| GPT-4o mini | Fast + inexpensive | Iteration, smoke tests, high-volume runs | Lower peak reasoning vs 4o |
| GPT-4.1 | Long context + high accuracy | Complex, stateful missions; larger DOM snapshots | Higher cost |
| GPT-4.1 mini | Good balance of speed/quality | General testing with moderate complexity | Slightly less capable than 4.1 |
| o3 | Advanced reasoning, strong tool decisions | Tricky flows requiring planning/multi-step tool orchestration | May be slower / more costly |
| o4-mini | Cost-effective reasoning | Budget-aware runs that still need planning | Less depth than o3 |
| GPT-5 family | Newest reasoning + tool use (where available) | Cutting-edge agentic flows; "thinking" modes if enabled | Availability/rate limits vary |
Recommendations

- Fast & cheap: `gpt-4o-mini`
- Best general agent: `gpt-4o`
- Big context / high accuracy: `gpt-4.1`
- Reasoning-heavy flows: `o3` (or `o4-mini` as a cheaper option)
- If you have access: try the `gpt-5` family; keep a warning in docs/UI.
Rate limits & dynamic backoff (FYI)

Testronaut ships with token-aware throttling:

- It estimates tokens per turn and backs off when nearing per-minute token limits.
- If a 429 occurs, it can learn updated limits from response headers (when provided).
- New/unsupported tokenizers are handled with sensible fallbacks, so you can safely pick future model ids.

If you select a model that your account can't use, expect 429s or capability errors; switch to `gpt-4o` or `gpt-4o-mini`.
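The retry side of that behavior looks roughly like the sketch below. This is an illustrative exponential-backoff loop, not Testronaut's actual implementation (which is also token-aware); the `err.status` shape is an assumption:

```javascript
// Illustrative retry-with-backoff for 429 responses: double the wait on each
// failed attempt, cap it at 30s, and rethrow anything that isn't a rate limit.
async function withBackoff(fn, maxRetries = 3, baseDelayMs = 1000) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (err.status !== 429 || attempt >= maxRetries) throw err;
      const waitMs = Math.min(baseDelayMs * 2 ** attempt, 30000);
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```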
Troubleshooting

"Invalid model: xyz"

- Ensure the model id is spelled exactly as the API expects.
- If it's a new family (e.g., GPT-5) and your tokenizer library doesn't recognize it, Testronaut still estimates tokens via fallbacks, and runs will continue.

"429: rate limit exceeded"

- Pick a smaller/faster model for bulk runs (e.g., `gpt-4o-mini`), or reduce concurrency.
- Try again later or contact OpenAI about quota.
- Consider setting `TESTRONAUT_MODEL` to a lighter model for large suites.
Examples

Set the default once, override ad-hoc:

```bash
# Configure the default model in testronaut-config.json
npx testronaut --init

# Temporary override to o3 for a tricky mission
npx testronaut checkoutFlow.mission.js --model o3
```

Project-wide switch via ENV (CI friendly):

```bash
TESTRONAUT_MODEL=gpt-4.1-mini npx testronaut
```
Reports

- The JSON & HTML reports show `llm.provider` and `llm.model`.
- The companion app displays these as header text and colorful pills in the table.