gstack: Turn Claude Code Into a Virtual Engineering Team

Garry Tan, the President and CEO of Y Combinator, has been building products for twenty years. But right now, he says, he's shipping more than he ever has — three production services and over forty features in just sixty days, all while running YC full-time. The secret weapon behind this acceleration is gstack, an open source toolkit that turns Claude Code from a single-agent copilot into a full virtual engineering team.

What Is gstack?

gstack is a set of twenty-three opinionated slash-command skills — CEO, Eng Manager, Designer, QA Lead, Security Officer, Release Engineer, and more — executed inside Claude Code sessions. Each skill is a Markdown file with embedded workflows and decision heuristics. All of them are free, MIT-licensed, and install in about thirty seconds.

The numbers behind gstack are staggering: 87,201 GitHub stars as of April 2026, more than 12,800 forks, and 53 distinct SKILL.md files powering the system. Garry Tan claims that his normalized 2026 productivity runs at roughly 810× his 2013 pace (11,417 vs. 14 logical lines per day), measured across forty public and private repositories.

Garry's own framing is blunt:

"This is my open source software factory. I use it every day. I'm sharing it because these tools should be available to everyone."

The Skill Sprint: Think → Plan → Build → Review → Test → Ship → Reflect

gstack is not a bag of disconnected tools. It is a process — skills feed into each other in the order a sprint runs:

Phase	Skills	What happens
Think	`/office-hours`, `/plan-ceo-review`	The agent challenges your framing, refutes assumptions, and writes a design doc
Plan	`/plan-eng-review`, `/plan-design-review`, `/plan-devex-review`	Architecture diagrams, design audits (scored 0–10), and 20–45 developer-experience forcing questions
Build	`/autoplan` (auto-runs CEO → Design → Eng → DX reviews)	Implementation guided by approved plans; continuous checkpoint commits for crash recovery
Review	`/review`, `/codex`	Staff-engineer review finds production bugs; Codex provides a cross-model second opinion
Test	`/qa`, `/qa-only`, `/canary`, `/benchmark`	Real Chromium browser testing, post-deploy error monitoring, and before/after Core Web Vitals
Ship	`/ship`, `/land-and-deploy`	Syncs main, runs tests, opens PR, merges after CI, and verifies production health — one command
Reflect	`/retro`, `/document-release`, `/learn`	Weekly team-aware retrospectives, auto-updated docs, and cross-session learnings that compound

Every step knows what came before. /office-hours writes a design doc that /plan-ceo-review reads. /plan-eng-review writes a test plan that /qa picks up. Nothing falls through the cracks.
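
As a rough illustration of "every step knows what came before," the phase table can be written down as an ordered data structure. This is only a sketch transcribed from the table above: the `SPRINT` constant and the `phaseOf` and `upstreamOf` helpers are hypothetical names for this example, not part of gstack itself.

```typescript
// Sprint phases in the order the article lists them.
type Phase = "Think" | "Plan" | "Build" | "Review" | "Test" | "Ship" | "Reflect";

interface PhaseEntry {
  phase: Phase;
  skills: readonly string[];
}

// Transcribed from the phase table; illustrative only.
const SPRINT: ReadonlyArray<PhaseEntry> = [
  { phase: "Think", skills: ["/office-hours", "/plan-ceo-review"] },
  { phase: "Plan", skills: ["/plan-eng-review", "/plan-design-review", "/plan-devex-review"] },
  { phase: "Build", skills: ["/autoplan"] },
  { phase: "Review", skills: ["/review", "/codex"] },
  { phase: "Test", skills: ["/qa", "/qa-only", "/canary", "/benchmark"] },
  { phase: "Ship", skills: ["/ship", "/land-and-deploy"] },
  { phase: "Reflect", skills: ["/retro", "/document-release", "/learn"] },
];

// Returns the phase a skill belongs to, or undefined if it is not in the table.
function phaseOf(skill: string): Phase | undefined {
  for (const entry of SPRINT) {
    if (entry.skills.indexOf(skill) !== -1) return entry.phase;
  }
  return undefined;
}

// Returns every skill from phases strictly before the given skill's phase,
// i.e. the steps whose output a later skill could build on.
function upstreamOf(skill: string): string[] {
  const upstream: string[] = [];
  for (const entry of SPRINT) {
    if (entry.skills.indexOf(skill) !== -1) return upstream;
    upstream.push(...entry.skills);
  }
  return []; // skill not found in the table
}
```

With this structure, `upstreamOf("/qa")` includes `/plan-eng-review`, which matches the hand-off described above where the test plan written during planning is picked up by QA.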

The Browser Component

Several gstack skills depend on a real Chromium browser, so the toolkit ships the GStack Browser — a persistent, long-lived daemon that the CLI talks to over localhost HTTP.

The architecture is a short chain from a compiled CLI to a local server to headless Chromium:

Claude Code                    gstack
─────────                    ────────
Tool call: $B snapshot -i     CLI (compiled binary)
────────────────────────→     • reads state file
                               • POST /command to localhost

                               Server (Bun.serve)
                               • dispatches commands
                               • talks to Chromium via CDP

                               Chromium (headless)
                               • persistent tabs and cookies
                               • 30-minute idle timeout
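
To make the CLI step in the diagram concrete, here is a minimal sketch of how a client might turn a state file into a `POST /command` request. The `DaemonState` fields (`port`, `token`), the bearer-token header, and the JSON body shape are all assumptions for illustration; the article only says that the CLI reads a state file and posts to localhost.

```typescript
// Hypothetical shape of the daemon's state file; the real fields may differ.
interface DaemonState {
  port: number;  // assumed: port the local server listens on
  token: string; // assumed: token used to authenticate local calls
}

interface CommandRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the HTTP request for one browser command without sending it.
// The endpoint path comes from the diagram; everything else is assumed.
function buildCommandRequest(
  state: DaemonState,
  command: string,
  args: string[],
): CommandRequest {
  return {
    url: `http://localhost:${state.port}/command`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${state.token}`,
    },
    body: JSON.stringify({ command, args }),
  };
}
```

Keeping request construction separate from sending makes the thin-client idea easy to see: the CLI does almost no work, and all browser state lives in the long-running server.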

First call starts everything (~3 seconds). Every call after that: ~100–200 milliseconds. This sub-second latency makes an interactive QA session with twenty or more commands practical, whereas cold-starting Playwright per command would add forty-plus seconds of overhead.
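
The latency claim is simple arithmetic, sketched below. The first-call and warm-call figures come from the paragraph above; the per-command cold-start cost of 2 seconds is an assumption, back-derived from "forty-plus seconds" spread over twenty commands.

```typescript
const FIRST_CALL_MS = 3000;         // "~3 seconds" for the first call
const WARM_CALL_FAST_MS = 100;      // lower end of "~100-200 milliseconds"
const WARM_CALL_SLOW_MS = 200;      // upper end of "~100-200 milliseconds"
const ASSUMED_COLD_START_MS = 2000; // assumption: 40 s / 20 commands

// Total time for a session against the persistent daemon:
// one slow first call, then warm calls for the rest.
function persistentSessionMs(commands: number, warmCallMs: number): number {
  if (commands <= 0) return 0;
  return FIRST_CALL_MS + (commands - 1) * warmCallMs;
}

// Startup overhead alone if every command launched a fresh browser.
function coldStartOverheadMs(commands: number): number {
  return Math.max(0, commands) * ASSUMED_COLD_START_MS;
}

const bestCaseMs = persistentSessionMs(20, WARM_CALL_FAST_MS);  // 3000 + 19 * 100 = 4900
const worstCaseMs = persistentSessionMs(20, WARM_CALL_SLOW_MS); // 3000 + 19 * 200 = 6800
const coldOverheadMs = coldStartOverheadMs(20);                 // 20 * 2000 = 40000
```

Under these assumptions a twenty-command session costs about 5 to 7 seconds in total with the daemon, against 40 seconds of startup overhead alone when cold-starting per command.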

The browser is built with Bun and compiled into a single ~58MB binary with zero runtime dependencies. Native SQLite (used to read Chromium's cookie database for cookie decryption), native TypeScript, and Bun.serve() keep the stack lean. The bottleneck is always Chromium; the CLI and server are never the limiting factor.

AI System Integration: Beyond Claude Code

This is where gstack becomes more than a Claude Code power-up. The toolkit is designed to be agent-agnostic. Its setup script auto-detects installed AI coding agents and provisions skills accordingly:

AI Agent	Install flag	Skills destination
Claude Code	(default)	`~/.claude/skills/gstack-*/`
OpenAI Codex CLI	`--host codex`	`~/.codex/skills/gstack-*/`
OpenCode	`--host opencode`	`~/.config/opencode/skills/gstack-*/`
Cursor	`--host cursor`	`~/.cursor/skills/gstack-*/`
Factory Droid	`--host factory`	`~/.factory/skills/gstack-*/`
Kiro	`--host kiro`	`~/.kiro/skills/gstack-*/`
Hermes	`--host hermes`	`~/.hermes/skills/gstack-*/`
GBrain	`--host gbrain`	`~/.gbrain/skills/gstack-*/`

This means the twenty-three-specialist virtual team isn't locked to Anthropic's ecosystem. You can run /cso (Chief Security Officer — OWASP Top 10 + STRIDE threat modeling) through Codex. You can run /qa (real browser testing) through Hermes. Each agent sees the same Markdown skills, the same workflows, and the same decision heuristics.

Adding support for a new agent is exactly one TypeScript config file — zero code changes.
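
The host table above is essentially a lookup from install flag to skills directory. The sketch below shows that lookup in TypeScript; the `SKILL_ROOTS` map and `skillDestination` function are hypothetical names for this example, and gstack's real per-host config file may be shaped differently.

```typescript
// Skills roots keyed by the value passed to --host, transcribed from the table.
const SKILL_ROOTS: Record<string, string> = {
  codex: "~/.codex/skills",
  opencode: "~/.config/opencode/skills",
  cursor: "~/.cursor/skills",
  factory: "~/.factory/skills",
  kiro: "~/.kiro/skills",
  hermes: "~/.hermes/skills",
  gbrain: "~/.gbrain/skills",
};

// Claude Code is the default when no --host flag is given.
const DEFAULT_ROOT = "~/.claude/skills";

// Returns the install directory for one skill on one host.
// `skill` is the bare name, e.g. "qa" for the /qa skill.
function skillDestination(skill: string, host?: string): string {
  if (host === undefined) return `${DEFAULT_ROOT}/gstack-${skill}/`;
  if (!Object.prototype.hasOwnProperty.call(SKILL_ROOTS, host)) {
    throw new Error(`Unknown host flag: ${host}`);
  }
  return `${SKILL_ROOTS[host]}/gstack-${skill}/`;
}
```

In a design like this, supporting another agent means adding one more entry to the map, which is the spirit of the "one config file, zero code changes" claim.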

OpenClaw Integration

gstack works particularly well with OpenClaw (Peter Steinberger's 247k-star AI agent orchestration system). Four gstack methodology skills — office-hours, ceo-review, investigate, and retro — ship as native OpenClaw skills via ClawHub:

clawhub install gstack-openclaw-office-hours gstack-openclaw-ceo-review \
                gstack-openclaw-investigate gstack-openclaw-retro

These are conversational skills. Your OpenClaw agent runs them directly via chat — no Claude Code session required. For heavier work, OpenClaw dispatches Claude Code sessions with gstack preloaded, following routing rules that distinguish simple fixes from full-on feature builds.

The Learning Loop

gstack gets smarter on your codebase over time. The /learn skill manages what the toolkit has learned across sessions: patterns, pitfalls, preferences, and project-specific heuristics. These learnings compound session over session — the longer you use gstack on a project, the better it understands your architecture, your testing philosophy, and your taste.

Continuous checkpoint mode (opt-in) auto-commits "WIP:" snapshots as you go, so a crash or a context switch doesn't lose state. /ship filter-squashes all WIP commits before opening the PR so bisect stays clean.

The Builder Ethos

The most interesting part of gstack isn't the code — it's the builder philosophy injected automatically into every skill's preamble. Three principles stand out:

1. Boil the Lake. AI-assisted coding makes the marginal cost of completeness near-zero. When the full implementation costs minutes more than the shortcut, do the complete thing — 100% test coverage, all edge cases, every error path. "Ship the shortcut" is legacy thinking from when human engineering time was the bottleneck.

2. Search Before Building. Before building anything involving unfamiliar patterns, the agent stops and searches first. Three layers of knowledge: (1) tried-and-true patterns already in distribution, (2) new-and-popular blog posts and trends (inputs to thinking, not answers), and (3) first-principles observations — the most valuable of all, the things that are genuinely out of distribution.

3. Completeness Is Cheap. When evaluating "approach A (full, ~150 lines) vs. approach B (90%, ~80 lines)," always prefer A. The 70-line delta costs seconds with AI coding.

This ethos explains the compression ratios gstack users report: 100× for boilerplate and scaffolding, 50× for test writing, 30× for feature implementation, 20× for bug fixes with regression tests, and 3–5× for architecture and research tasks.

Security: Seven Layers of Prompt Injection Defense

A tool that reads hostile web pages and executes shell commands through an AI agent demands serious security. gstack's defense is layered, not single-point:

L1–L3 content security: Datamarking, hidden-element stripping, ARIA regex filtering, URL blocklists, and trust-boundary envelope wrapping on every page-content command and tool output.
L4 ML classifier: A 22MB BERT-small ONNX model (int8 quantized) bundled in-process, scanning every user message and tool output before the model sees it. Runs locally, no network calls.
L4b transcript classifier: A Claude Haiku pass that looks at full conversation shape, not just individual text snippets. Gated so most clean traffic skips the paid call.
L5 canary token: A random token injected into the system prompt at session start. Rolling-buffer detection catches the token if it ever appears in the model's output, tool arguments, or file writes — deterministic BLOCK.
L6 ensemble combiner: BLOCK requires agreement from two ML classifiers at >= WARN level, not a single confident hit. This mitigates false positives on legitimate instruction-writing traffic.
Physical port separation: When the browser daemon tunnels externally for /pair-agent, it runs two HTTP listeners — a local listener (full surface, never forwarded) and a tunnel listener (locked allowlist, scoped tokens). Port separation means a tunnel caller physically cannot reach /health, /cookie-picker, or token-minting endpoints.
Shell injection prevention: All browser registries are hardcoded. Database paths are constructed from constants, never user input. Keychain access uses explicit argument arrays, not shell string interpolation.
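
The rolling-buffer idea behind the L5 canary check can be sketched as follows. This is a generic illustration of the technique, not gstack's implementation: the `CanaryDetector` class is a hypothetical name, and the token is passed in by the caller rather than generated here.

```typescript
// Detects a canary token in streamed text, even when the token is
// split across two chunks, by keeping a small tail of earlier text.
class CanaryDetector {
  private readonly canary: string;
  private tail = "";
  private tripped = false;

  constructor(canary: string) {
    if (canary.length === 0) throw new Error("canary must be non-empty");
    this.canary = canary;
  }

  // Feeds one chunk of output. Returns true once the canary has been
  // seen anywhere in the stream; the result stays true afterwards.
  feed(chunk: string): boolean {
    if (this.tripped) return true;
    const combined = this.tail + chunk;
    if (combined.indexOf(this.canary) !== -1) {
      this.tripped = true;
      return true;
    }
    // Keep only the last (length - 1) characters: enough to catch a
    // token that straddles this chunk and the next one.
    const keep = this.canary.length - 1;
    this.tail = keep > 0 ? combined.slice(-keep) : "";
    return false;
  }
}
```

Because the check is a plain substring match over a bounded buffer, it is deterministic and cheap, which fits the "deterministic BLOCK" behavior described for this layer.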

This is not a checklist — it is an architecture designed around the assumption that the agent will encounter hostile input.
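
The L6 ensemble rule described above can be sketched in a few lines. The `Verdict` names other than WARN and BLOCK, and the choice to downgrade a single hit to WARN, are assumptions made for this example; the article only states that BLOCK needs two classifiers at WARN level or higher.

```typescript
// Per-classifier verdicts. "SAFE" is an assumed name for the clean case.
type Verdict = "SAFE" | "WARN" | "BLOCK";

const SEVERITY: Record<Verdict, number> = { SAFE: 0, WARN: 1, BLOCK: 2 };

// Combines classifier verdicts: BLOCK only when at least two
// classifiers report WARN or higher. A lone hit, however confident,
// is downgraded to WARN (an assumption for this sketch).
function combineVerdicts(verdicts: Verdict[]): Verdict {
  let atLeastWarn = 0;
  for (const verdict of verdicts) {
    if (SEVERITY[verdict] >= SEVERITY.WARN) atLeastWarn += 1;
  }
  if (atLeastWarn >= 2) return "BLOCK";
  if (atLeastWarn === 1) return "WARN";
  return "SAFE";
}
```

For example, `combineVerdicts(["BLOCK", "SAFE"])` yields `"WARN"` while `combineVerdicts(["WARN", "WARN"])` yields `"BLOCK"`, which is how requiring agreement trades a little sensitivity for fewer false positives on legitimate instruction-writing traffic.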

How to Use gstack in Your AI System

If you're integrating gstack into an existing AI workflow, here's the recommended path:

Install (thirty seconds): Clone the repo and run ./setup. For Hermes users, pass --host hermes.
Start with /office-hours: Describe what you're building. Let the agent reframe the problem before writing a single line of code.
Run /autoplan on any feature idea: This auto-runs the CEO → Design → Eng → DX reviews and surfaces only taste decisions for your approval.
For every branch with changes, run /review or /codex (cross-model second opinion from Codex).
For every staging URL, run /qa: Real browser testing catches the bugs static analysis misses.
For every production deploy, run /ship then /land-and-deploy: /ship syncs main, runs tests, and opens the PR; /land-and-deploy then takes it from approved PR to verified production in one command.
End every week with /retro: Team-aware analysis of velocity, test health, and growth opportunities.
Chain with GBrain: If you run /setup-gbrain (PGLite or Supabase, ~5 minutes), gstack stores project learnings across sessions, so it gets smarter on your codebase over time.

The key insight: gstack is a process, not a collection of tools. You can cherry-pick individual skills — /cso for security audits, /qa for browser testing — but the full acceleration comes from running the sprint end to end. Each skill feeds into the next. Nothing falls through the cracks because every step knows what came before.

What This Means

gstack represents a philosophy shift in how we think about AI coding tools. It doesn't treat the agent as a faster typist. It treats the agent as a team — with roles, responsibilities, review gates, and a shared process. The difference between "write me a feature" (copilot) and "run /autoplan, implement the plan, then run /ship" (team) is the difference between generating code and shipping software.

A single person with the right tooling can now move faster than a traditional team. Garry Tan reports being 810× more productive than in 2013, Andrej Karpathy says he hasn't typed a line of code since December 2025, and Peter Steinberger built OpenClaw (247k GitHub stars) essentially solo with AI agents. This is the era Garry describes as the Golden Age: the engineering barrier is gone; what remains is taste, judgment, and the willingness to do the complete thing.

gstack is free, MIT-licensed, and available on GitHub. Fork it. Improve it. Make it yours.