How to debug an AI browser-agent failure

When a browser agent fails, logs usually show tool calls and stack traces. They rarely show what the model saw, which page state changed, or where the first wrong assumption entered the run.

The failure pattern

The hard browser-agent bugs are often visual or stateful. The selector matched the wrong element. A modal changed the page. The model assumed the article layout was stable. The button existed but was disabled. By the time the exception appears, the browser state that explains it is gone.

That is why browser-agent debugging needs more than LLM call logs. You need a step timeline with screenshots, URLs, actions, model input, model output, status, and the first failing step.
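To make the idea of a step timeline concrete, here is a minimal sketch of one possible record shape and a helper that returns the first failing step. The `Step` class, its field names, and the `shop.example` URLs are assumptions made for illustration; they are not BrowserTrace's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """One entry in a hypothetical step timeline (not BrowserTrace's schema)."""
    index: int
    action: str
    url: str
    screenshot_path: Optional[str]
    model_input: str
    model_output: str
    status: str  # "ok" or "failed"


def first_failing_step(steps: list[Step]) -> Optional[Step]:
    """Return the earliest step whose status is not "ok", or None if all passed."""
    for step in sorted(steps, key=lambda s: s.index):
        if step.status != "ok":
            return step
    return None


# Illustrative data only: a three-step checkout run that breaks at step 1.
timeline = [
    Step(0, "open cart", "https://shop.example/cart", "0.png", "...", "click Checkout", "ok"),
    Step(1, "click Checkout", "https://shop.example/cart", "1.png", "...", "fill card form", "failed"),
    Step(2, "fill card form", "https://shop.example/cart", None, "...", "", "failed"),
]

failing = first_failing_step(timeline)
print(failing.index, failing.action)
```

Keeping the URL and screenshot on the same record as the model input and output is the point of the structure: the evidence for a step stays attached to the decision made at that step.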

Run a local trace

uvx --from "browsertrace[ui] @ git+https://github.com/aaronlab/browsertrace@v0.1.14" browsertrace doctor
uvx --from "browsertrace[ui] @ git+https://github.com/aaronlab/browsertrace@v0.1.14" browsertrace demo
uvx --from "browsertrace[ui] @ git+https://github.com/aaronlab/browsertrace@v0.1.14" browsertrace

Open http://127.0.0.1:3000, then click the failed checkout-agent run. You can also inspect the live exported trace without installing anything.

Inspect the first red step

  1. Does the screenshot match what the model thought it saw?
  2. Does the URL match the step's assumption?
  3. Did the action target the expected element?
  4. Did the model output mention a selector, label, or page state that was not true?
  5. Was the current step wrong, or did an earlier step set up the wrong state?

This usually finds the bug faster than reading a run log from the top, because it starts at the point where the browser state and the model's decision disagree.
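Questions 2 and 5 in the checklist can be partly mechanized. The sketch below assumes each recorded step carries both the URL the step expected and the URL the browser was actually on; the `expected_url_prefix` key, the `earliest_divergence` helper, and the sample data are hypothetical and not part of BrowserTrace.

```python
from typing import Optional


def earliest_divergence(steps: list[dict], failed_index: int) -> Optional[int]:
    """Return the index of the first step, up to and including the failed one,
    whose recorded URL does not start with the URL that step assumed."""
    for i, step in enumerate(steps[: failed_index + 1]):
        if not step["url"].startswith(step["expected_url_prefix"]):
            return i
    return None


# Illustrative data only: the click at step 1 landed on a login redirect.
steps = [
    {"action": "open cart",
     "expected_url_prefix": "https://shop.example/cart",
     "url": "https://shop.example/cart"},
    {"action": "click Checkout",
     "expected_url_prefix": "https://shop.example/checkout",
     "url": "https://shop.example/login?next=/checkout"},
    {"action": "fill card form",
     "expected_url_prefix": "https://shop.example/checkout",
     "url": "https://shop.example/login?next=/checkout"},
]

# The exception surfaced at step 2; the scan reports where the state first went wrong.
print(earliest_divergence(steps, failed_index=2))
```

A URL check like this only covers navigation mistakes. Disabled buttons, modals, and wrong-element matches still need the screenshot.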

Record your own agent

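from browsertrace import trace

# The call at the bottom omits `run`, so the @trace decorator appears to supply it.
@trace
def my_agent(run, query: str):
    # `...` is a placeholder in this example; pass your real screenshot here.
    run.step(action=f"search: {query}", screenshot=...)
    run.step(action="click first result", screenshot=...)

my_agent("browser agent debugging")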

For Playwright, use run.snapshot(page, action="...") to capture the current URL and screenshot without extra boilerplate.

Share a failure

browsertrace list
browsertrace export <run_id> --redact -o public.html
browsertrace export <run_id> --public -o public.html

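The export is a self-contained HTML file. The two export commands above are alternatives: both write to public.html, so running the second overwrites the first. Before posting a real trace publicly, use --public to omit prompt/model I/O, screenshots, and URLs. Use individual redaction flags when you want to keep some fields visible.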
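The difference between omitting everything sensitive and redacting selected fields can be sketched in a few lines. This illustrates the idea only and is not how BrowserTrace implements its export; `SENSITIVE_FIELDS`, `redact_step`, and the step dictionary are hypothetical.

```python
import copy

# Hypothetical field names that should not appear in a public trace.
SENSITIVE_FIELDS = ("model_input", "model_output", "screenshot", "url")


def redact_step(step: dict, fields=SENSITIVE_FIELDS) -> dict:
    """Return a copy of the step with the named fields replaced by a marker."""
    cleaned = copy.deepcopy(step)
    for name in fields:
        if name in cleaned:
            cleaned[name] = "[redacted]"
    return cleaned


# Illustrative data only.
step = {
    "action": "click Checkout",
    "url": "https://shop.example/cart?session=abc123",
    "model_input": "You are a checkout agent...",
    "model_output": "Click the button labelled Checkout",
    "screenshot": "1.png",
    "status": "failed",
}

fully_public = redact_step(step)                # hide every sensitive field
urls_only = redact_step(step, fields=("url",))  # hide URLs, keep model I/O visible
```

Working on a copy leaves the local trace intact, so you can still debug with the full data after sharing the redacted version.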

BrowserTrace is an MIT-licensed local flight recorder for Browser Use, Stagehand, Playwright + LLM scripts, Skyvern, and custom computer-use agents.
github.com/aaronlab/browsertrace