Stagehand & BrowserBase: Should AI Agents Adopt the act/extract/observe Pattern?
Stagehand is an open-source AI browser automation SDK from BrowserBase, with 12k+ GitHub stars and an MIT license. Built on Playwright, its core innovation is abstracting AI-driven web interaction into three semantically clear verbs. BrowserBase itself is a cloud browser infrastructure platform that provides remote Chrome instances, anti-detection, and concurrency scaling.
The Three Primitives
Stagehand's design philosophy is minimalist: browser interaction has only three semantic operations.
act(action): the AI interprets a natural-language instruction, locates the target element, and executes the action. Instead of writing CSS selectors, you say "click the login button" or "fill in the email address," and the LLM uses the page structure to find the element and act on it.
extract(instruction): the AI extracts structured data from the page and returns it as JSON. Instead of regex matching or DOM traversal, you tell the LLM "extract all product prices and names."
observe(): the AI analyzes the current page state and returns a list of executable actions. This is "intelligent perception": the Agent knows what it can do on the current page.
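One way to picture the contract these three verbs imply is as a small TypeScript interface. The sketch below is illustrative only: `SemanticBrowser`, `PageElement`, `CandidateAction`, `listActions`, `chooseAction`, and `makeStubBrowser` are hypothetical names, not Stagehand's actual API, and the keyword-overlap matcher merely stands in for the LLM so the example runs offline.

```typescript
// Hypothetical types for illustration; this is NOT Stagehand's real API.
interface PageElement {
  id: string; // handle assigned when the page is snapshotted
  role: "button" | "textbox" | "link";
  name: string; // accessible name, e.g. "Log in"
}

interface CandidateAction {
  elementId: string;
  method: "click" | "fill";
  description: string; // human-readable summary of the action
}

interface SemanticBrowser {
  act(instruction: string): Promise<CandidateAction>;
  extract(instruction: string): Promise<Record<string, unknown>>;
  observe(): Promise<CandidateAction[]>;
}

// observe(): map each interactive element to the action it affords.
function listActions(elements: PageElement[]): CandidateAction[] {
  return elements.map((el) => {
    const method = el.role === "textbox" ? "fill" : "click";
    return { elementId: el.id, method, description: `${method} ${el.role} "${el.name}"` };
  });
}

// act(): stand-in for the LLM. Picks the candidate whose description
// contains the most words from the instruction (crude substring match).
function chooseAction(instruction: string, candidates: CandidateAction[]): CandidateAction {
  const words = instruction.toLowerCase().split(/\W+/).filter(Boolean);
  let best: CandidateAction | undefined;
  let bestScore = 0;
  for (const candidate of candidates) {
    const text = candidate.description.toLowerCase();
    const score = words.filter((w) => text.includes(w)).length;
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  if (!best) throw new Error(`no element matches: ${instruction}`);
  return best;
}

// Offline stub that satisfies the interface without a browser or a model.
function makeStubBrowser(
  elements: PageElement[],
  pageData: Record<string, unknown>,
): SemanticBrowser {
  return {
    observe: async () => listActions(elements),
    act: async (instruction) => chooseAction(instruction, listActions(elements)),
    extract: async () => pageData, // a real version would ask the LLM to build this
  };
}
```

In a real implementation `chooseAction` would be an LLM call over the page snapshot. The sketch only shows that observe() yields the candidate set from which act() selects.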
Under the hood, Stagehand doesn't feed screenshots to the LLM. Instead, it serializes Playwright's DOM snapshot and ARIA tree and sends that text to the LLM. Text input of this kind generally costs fewer tokens than an image and lets the model refer to elements by identity rather than by pixel position. For large DOMs, it supports chunking.
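To make the idea of a text snapshot concrete, here is a minimal sketch of flattening an accessibility-style tree into numbered lines an LLM can reference. The `AxNode` shape, the `NOISE_ROLES` set, and the `[id] role "name"` line format are assumptions chosen for illustration; they are not Stagehand's internal format, and a real Playwright or CDP snapshot carries more fields.

```typescript
// Hypothetical, simplified node of an accessibility (ARIA) tree snapshot.
interface AxNode {
  role: string;
  name?: string;
  children?: AxNode[];
}

// Structural roles that tell the LLM nothing unless they carry a name.
const NOISE_ROLES = new Set(["generic", "none", "presentation"]);

// Depth-first walk that emits one indented line per meaningful node.
// Each kept node gets a sequential id the LLM can cite in its answer.
function serializeAxTree(root: AxNode): string[] {
  const lines: string[] = [];
  let nextId = 0;
  const visit = (node: AxNode, depth: number): void => {
    const keep = !NOISE_ROLES.has(node.role) || Boolean(node.name);
    if (keep) {
      const label = node.name ? ` "${node.name}"` : "";
      lines.push(`${"  ".repeat(depth)}[${nextId++}] ${node.role}${label}`);
    }
    // Children of a skipped wrapper are hoisted to the wrapper's depth.
    for (const child of node.children ?? []) {
      visit(child, keep ? depth + 1 : depth);
    }
  };
  visit(root, 0);
  return lines;
}
```

For a `WebArea` named "Shop" that contains an unnamed `generic` wrapper around a "Log in" button, followed by an "Email" textbox, this yields three lines: `[0] WebArea "Shop"`, then `[1] button "Log in"` and `[2] textbox "Email"`, each indented by two spaces. Dropping unnamed wrappers is one simple way to shrink the prompt; which roles count as noise is a design choice.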
Competitive Landscape
| Dimension | Stagehand | agent-browser | browser-use | Playwright MCP | AgentQL |
|---|---|---|---|---|---|
| Language | TS/Node | Rust CLI | Python | TS (MCP protocol) | Python/JS |
| AI interaction | act/extract/observe | LLM-driven CLI | LLM-driven | No AI layer | AI query language |
| Engine | Playwright | CDP | Playwright | Playwright | Playwright |
| Self-hosted | Yes (local Playwright) | Yes | Yes | Yes | Partial |
| Cloud dependency | Optional BrowserBase | None | None | None | Requires AgentQL API |
| Pricing | Open source + optional paid cloud | Free | Free | Free | Paid API |
Key finding: Our existing agent-browser (Rust CLI) and browser-use-setup (Python) both lack Stagehand's "AI semantic understanding → action execution" abstraction layer. They rely on LLMs to understand the overall task, but the execution layer still uses traditional element location.
BrowserBase Cloud: Paid but Bypassable
BrowserBase cloud is priced by browser session duration, with a limited free tier. Its core value is remotely hosted Chrome, anti-detection fingerprints, and concurrency scaling. Per our policy (paid API = auto SKIP), BrowserBase cloud is not applicable.
But the key point: Stagehand itself can run without BrowserBase cloud, using local Playwright directly. This means the three-primitive pattern can be borrowed at zero cost.
Borrowing Value Assessment
High Value: act/extract/observe design pattern
This is Stagehand's core innovation. It distills AI browser interaction from the vague "LLM understands task → tool executes" flow into three semantically clear verbs. Our existing skills all lack this "AI understanding → precise action" semantic layer.
Borrowable technical points:
- DOM snapshot + ARIA tree as LLM input: more efficient and cheaper than screenshots
- Chunking strategy for large DOMs: avoids token overflow
- Hybrid mode (AI semantic operations + Playwright/CDP precise operations): the native API is still available when precision is needed
- MCP server encapsulation — easy Agent invocation via standard protocol
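The chunking point in the list above can be sketched as a greedy packer over the lines of a serialized snapshot. This is a minimal illustration under stated assumptions: `chunkLines` is a hypothetical helper, and a character budget is used as a rough proxy for a token budget, since the actual limit depends on the local model's tokenizer.

```typescript
// Greedily pack snapshot lines into chunks of at most `maxChars` characters.
// Lines are never split, so a single line longer than the budget
// becomes its own oversized chunk.
function chunkLines(lines: string[], maxChars: number): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  const chunks: string[] = [];
  let current: string[] = [];
  let size = 0;
  for (const line of lines) {
    const cost = line.length + 1; // +1 for the newline that joins lines
    if (current.length > 0 && size + cost > maxChars) {
      chunks.push(current.join("\n"));
      current = [];
      size = 0;
    }
    current.push(line);
    size += cost;
  }
  if (current.length > 0) chunks.push(current.join("\n"));
  return chunks;
}
```

Each chunk can then be sent to the LLM in turn, with the per-chunk answers merged afterwards. Splitting on line boundaries keeps every element description intact, at the cost of losing some parent context when a subtree straddles two chunks.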
Parts NOT to import:
- BrowserBase cloud service (paid, policy SKIP)
- Stagehand npm package itself (we can build the pattern ourselves)
- Dependency chain on OpenAI/Anthropic APIs (we use local LLMs)
Conclusion and Recommendation
Stagehand's three-primitive pattern is worth borrowing but not directly adopting. Reasons:
- The core pattern is essentially DOM snapshot + LLM reasoning → CDP/Playwright execution, which is not complex to implement
- We already have CDP skill foundations (chrome-cdp-mcp-setup, agent-browser) and can build a semantic layer on top of them
- BrowserBase cloud is a paid service, auto-skipped per policy
- Directly depending on the Stagehand npm package introduces unnecessary LLM API dependency chains
Recommended approach: Build an "AI semantic operation" wrapper on top of existing chrome-cdp-mcp-setup or agent-browser skills, referencing Stagehand's three-primitive design. Implementation path: DOM snapshot → LLM analysis → CDP/Playwright execution. Zero extra cost, zero external dependencies, fully local.
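The implementation path above (DOM snapshot → LLM analysis → CDP/Playwright execution) might look roughly like the following. Everything here is a sketch under assumptions: `ActDeps`, `PlannedAction`, `parsePlannedAction`, and `semanticAct` are hypothetical names, the snapshot is assumed to be lines of the form `[id] role "name"`, and the three injected functions stand in for the existing CDP skill and a local LLM. The one design choice worth keeping in any real version is validating the model's reply before anything touches the browser.

```typescript
// Hypothetical action format the LLM is asked to return as JSON.
interface PlannedAction {
  elementId: number;
  method: "click" | "fill";
  text?: string; // required when method is "fill"
}

// Injected dependencies, so the pipeline stays independent of any one backend.
interface ActDeps {
  snapshot(): Promise<string[]>; // lines such as `[1] button "Log in"`
  askLlm(prompt: string): Promise<string>; // local model, returns JSON text
  execute(action: PlannedAction): Promise<void>; // CDP or Playwright call
}

// Parse the LLM reply and reject anything that does not name a known element.
function parsePlannedAction(reply: string, knownIds: Set<number>): PlannedAction {
  const data: unknown = JSON.parse(reply);
  if (typeof data !== "object" || data === null) {
    throw new Error("LLM reply is not a JSON object");
  }
  const { elementId, method, text } = data as Record<string, unknown>;
  if (typeof elementId !== "number" || !knownIds.has(elementId)) {
    throw new Error(`unknown element id: ${String(elementId)}`);
  }
  if (method === "click") return { elementId, method };
  if (method === "fill" && typeof text === "string") return { elementId, method, text };
  throw new Error(`unsupported or incomplete action: ${String(method)}`);
}

// Snapshot the page, ask the LLM for one action, validate it, then execute it.
async function semanticAct(instruction: string, deps: ActDeps): Promise<PlannedAction> {
  const lines = await deps.snapshot();
  const knownIds = new Set<number>();
  for (const line of lines) {
    const match = /^\s*\[(\d+)\]/.exec(line);
    if (match) knownIds.add(Number(match[1]));
  }
  const prompt = [
    "Page elements:",
    ...lines,
    `Instruction: ${instruction}`,
    'Reply with JSON only: {"elementId": number, "method": "click" | "fill", "text"?: string}',
  ].join("\n");
  const action = parsePlannedAction(await deps.askLlm(prompt), knownIds);
  await deps.execute(action);
  return action;
}
```

Because the backend is injected, the same `semanticAct` could sit on top of either chrome-cdp-mcp-setup or agent-browser, and it can be unit-tested with fake dependencies before any model or browser is wired in.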
Sources:
- Stagehand: https://github.com/browserbase/stagehand (MIT, 12k+ stars)
- BrowserBase SDK: https://github.com/browserbase/sdk (MIT)
- BrowserBase official: https://www.browserbase.com/
- agent-browser: Hermes Agent built-in skill
- browser-use: https://github.com/browser-use/browser-use (MIT)
- Playwright MCP: https://github.com/microsoft/playwright-mcp (Apache-2.0)
- AgentQL: https://github.com/tinyfish-io/agentql (partially paid)