
Choosing a Model

Testronaut supports multiple OpenAI models for agentic (tool-using) workflows.
You can pick a default during testronaut --init, store it in testronaut-config.json, and override it per run via the CLI.

Heads-up: Some newer models (e.g., the GPT-5 family) may require specific account access or region availability. If you choose one your account can’t use, Testronaut will warn you, and you may see 429/rate-limit errors from the API.


Supported model families​

All of the following are compatible with Testronaut’s tool calling & DOM agent:

  • GPT-4o — Multimodal, strong tool use
  • GPT-4o mini — Faster/cheaper 4o variant
  • GPT-4.1 — General-purpose, long context
  • GPT-4.1 mini — Speed/cost optimized 4.1
  • o3 — Reasoning-oriented with native tool use
  • o4-mini — Cost-effective reasoning with tool use
  • GPT-5 family (e.g., gpt-5, gpt-5-mini, gpt-5-nano) — If available on your account. Testronaut supports selecting these; availability and limits may vary.

You can always add more choices as OpenAI releases them—Testronaut reads the model id you select and forwards it directly to the API.


How Testronaut decides which model to use​

  1. CLI override (highest precedence)
    If you pass --model <id> on the command line, that model is used for the entire run.

  2. Environment override
    If TESTRONAUT_MODEL is set, it overrides the config file.

  3. Project config
    testronaut-config.json stores { "provider": "openai", "model": "<id>" } after init.

  4. Fallback
    If none of the above are present, Testronaut defaults to gpt-4o.

The chosen model is surfaced in:

  • The CLI JSON and HTML reports (results.llm.provider & results.llm.model)
  • The companion app (header and table pills)
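The precedence order above can be pictured as a small sketch. This is illustrative only, assuming the four sources described in this section; resolveModel is not Testronaut’s actual internal function:

```javascript
// Sketch of the model-selection precedence (illustrative names only).
function resolveModel(cliModel, env, config) {
  return cliModel            // 1. --model <id> on the CLI (highest precedence)
    ?? env.TESTRONAUT_MODEL  // 2. environment override
    ?? config.model          // 3. testronaut-config.json
    ?? "gpt-4o";             // 4. built-in fallback
}

// The CLI flag wins even when everything else is set:
console.log(resolveModel("o3", { TESTRONAUT_MODEL: "gpt-4o-mini" }, { model: "gpt-4.1" }));
// → o3
```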

Set the default model (config)​

After running npx testronaut --init, you’ll have testronaut-config.json.
Edit it to set the provider/model:

testronaut-config.json
{
  "initialized": true,
  "provider": "openai",
  "model": "gpt-4o",
  "outputDir": "missions/mission_reports",
  "projectName": "your-project",
  "maxTurns": 20
}

Examples:

  • gpt-4o
  • gpt-4o-mini
  • gpt-4.1
  • gpt-4.1-mini
  • o3
  • o4-mini
  • gpt-5 / gpt-5-mini / gpt-5-nano (if available)

Override per run (CLI)

Use --model to override the model for a single run:

Run all missions with GPT-4.1 mini​

npx testronaut --model gpt-4.1-mini

Run a specific mission with o3​

npx testronaut downloadDocument.mission.js --model o3

This sets the model for the whole execution and shows up in the generated reports.

Override via environment

macOS/Linux​

export TESTRONAUT_MODEL=gpt-4o-mini
npx testronaut

Windows (PowerShell)​

$env:TESTRONAUT_MODEL="gpt-4o-mini"
npx testronaut

Cost / benefit guide (quick picks)

Exact pricing and limits change; treat this as a practical guide. Choose based on the mission’s needs (speed vs. depth vs. cost).

| Family/Model | Strengths | Typical use | Trade-offs |
| --- | --- | --- | --- |
| GPT-4o | Strong tool use, robust reasoning, multimodal | Most agentic missions; reliable DOM/tool calling | Mid cost/latency |
| GPT-4o mini | Fast + inexpensive | Iteration, smoke tests, high-volume runs | Lower peak reasoning vs 4o |
| GPT-4.1 | Long context + high accuracy | Complex, stateful missions; larger DOM snapshots | Higher cost |
| GPT-4.1 mini | Good balance of speed/quality | General testing with moderate complexity | Slightly less capable than 4.1 |
| o3 | Advanced reasoning, strong tool decisions | Tricky flows requiring planning/multi-step tool orchestration | May be slower/more costly |
| o4-mini | Cost-effective reasoning | Budget-aware runs that still need planning | Less depth than o3 |
| GPT-5 family | Newest reasoning + tool use (where available) | Cutting-edge agentic flows; “thinking” modes if enabled | Availability/rate limits vary |

Recommendations​

  • Fast & cheap: gpt-4o-mini
  • Best general agent: gpt-4o
  • Big context / high accuracy: gpt-4.1
  • Reasoning-heavy flows: o3 (or o4-mini as a cheaper option)
  • If you have access: try the gpt-5 family, keeping in mind that availability and limits vary

Rate limits & dynamic backoff (FYI)​

Testronaut ships with token-aware throttling:

  • It estimates tokens per turn and backs off when nearing per-minute token limits.
  • If a 429 occurs, it can learn updated limits from response headers (when provided).
  • New/unsupported tokenizers are handled with sensible fallbacks, so you can safely pick future model ids.

If you select a model that your account can’t use, expect 429s or capability errors; switch to gpt-4o or gpt-4o-mini.
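The header-informed backoff described above can be sketched as follows. This is a minimal illustration of the pattern (retry on 429, prefer the server’s Retry-After hint, else back off exponentially), not Testronaut’s actual implementation:

```javascript
// Retry a call on 429, honoring a Retry-After header when the server sends one.
async function withBackoff(callApi, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await callApi();
    } catch (err) {
      if (err.status !== 429 || attempt >= maxRetries) throw err;
      const hint = err.headers && err.headers["retry-after"];
      const waitMs = hint ? Number(hint) * 1000 : 2 ** attempt * 1000;
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}

// Demo: the first call is rate-limited (with a 0-second hint), the second succeeds.
let calls = 0;
const flaky = async () => {
  calls += 1;
  if (calls === 1) {
    const err = new Error("rate limit exceeded");
    err.status = 429;
    err.headers = { "retry-after": "0" };
    throw err;
  }
  return "ok";
};
withBackoff(flaky).then((result) => console.log(result, calls)); // → ok 2
```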

Troubleshooting​

“Invalid model: xyz”

  • Ensure the model id is spelled exactly as the API expects.
  • If it’s a new family (e.g., GPT-5) and your tokenizer library doesn’t recognize it, Testronaut still estimates tokens via fallbacks, so runs will continue.

“429: rate limit exceeded”

  • Pick a smaller/faster model for bulk runs (e.g., gpt-4o-mini), or reduce concurrency.
  • Try again later, or contact OpenAI about quota.
  • Consider setting TESTRONAUT_MODEL to a lighter model for large suites.

Examples

Set the default once, then override ad hoc:

Configure default to 4o in testronaut-config.json​

npx testronaut --init

Temporary override to o3 for a tricky mission​

npx testronaut checkoutFlow.mission.js --model o3

Project-wide switch via ENV (CI friendly):​

TESTRONAUT_MODEL=gpt-4.1-mini npx testronaut

Reports

  • The JSON and HTML reports show llm.provider and llm.model.
  • The companion app displays these as header text and colorful pills in the table.
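If you post-process reports, the fields look like this. The llm.provider / llm.model names come from the docs above; the surrounding report shape here is a hypothetical fragment for illustration:

```javascript
// Hypothetical JSON report fragment — only the llm fields are documented.
const results = {
  llm: { provider: "openai", model: "gpt-4o" },
};

console.log(`${results.llm.provider} / ${results.llm.model}`); // → openai / gpt-4o
```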