Galen Guan

Skill Graphs 2.0: From Dependency Chains to Atoms, Molecules, and Compounds

On April 23, 2026, Shiv Sakhuja — founder of Gooseworks AI and Y Combinator alum — published an X Article titled "Skill Graphs 2.0" (original tweet, X Article requires login). It quickly gained traction: 28 replies, 105 retweets, 903 likes, and 2,762 bookmarks. The premise resonated with anyone who has tried to build reliable AI agents: how do you compose skills so that complex workflows don't collapse under their own weight?

The original article presents a clean, three-tier mental model for skill composition. This post unpacks it in depth, adds context from the agent engineering community, and connects it to practical implementation.


The Problem: Skill Graphs Break at Scale

The first version of the skill graph idea was straightforward: create a graph of skills by linking dependent skills in markdown files, similar to how you might link notes in Obsidian. A skill encodes knowledge + process into a markdown file plus optional scripts that an agent can run repeatably.

This works beautifully in theory. When you try to encode larger processes or job functions, you naturally have skills that depend on other skills. A skill to draft a marketing email might depend on a graphic design skill. Link them up, and the agent follows the chain.

But two problems emerge as the graph grows:

1. Reliability Drops with Depth

If Skill A explicitly instructs the agent to call Skill B, it works — most of the time. But in a dense graph (think Wikipedia-scale link density), the dependency chains can get enormously deep. You can no longer predict what the agent will actually do. Each hop adds non-determinism.

A human operator with a specific intent is now confronted with too much non-determinism. They're handing off judgment to the agent — maybe too much.
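A back-of-the-envelope model makes the depth problem concrete. The sketch below assumes every hop succeeds independently with the same probability. Neither assumption comes from the original article, and the 95% figure is purely illustrative.

```python
def chain_reliability(per_hop_success: float, depth: int) -> float:
    """Probability that every hop in a chain of `depth` hand-offs succeeds.

    Assumes hops fail independently and share one success rate, which is
    an illustrative simplification rather than a measured property.
    """
    if not 0.0 <= per_hop_success <= 1.0:
        raise ValueError("per_hop_success must be between 0 and 1")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return per_hop_success ** depth


if __name__ == "__main__":
    # Hypothetical 95% per-hop success rate at increasing chain depths.
    for depth in (1, 3, 5, 10, 20):
        print(f"depth {depth:>2}: {chain_reliability(0.95, depth):.2f}")
```

Under these assumptions, a ten-hop chain of individually 95%-reliable hand-offs completes only about 60% of the time, which is one way to read "each hop adds non-determinism."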

2. Circular Dependencies

In a real graph, circular references are inevitable. Skill A calls Skill B, which calls Skill C, which calls Skill A. The agent loops, burns tokens, and produces garbage.
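Circular references can at least be caught before an agent ever runs. The sketch below is a standard depth-first search for cycles. It assumes the skill graph has already been parsed into a dictionary mapping each skill name to the skills it links to; that representation and the skill names are hypothetical, not something the article specifies.

```python
from typing import Dict, List, Optional

# Hypothetical representation: skill name -> names of skills it calls.
SkillGraph = Dict[str, List[str]]


def find_cycle(graph: SkillGraph) -> Optional[List[str]]:
    """Return one cycle as a list of names (first == last), or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {name: WHITE for name in graph}
    path: List[str] = []

    def visit(name: str) -> Optional[List[str]]:
        color[name] = GRAY
        path.append(name)
        for dep in graph.get(name, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so we closed a loop.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[name] = BLACK
        return None

    for name in list(graph):
        if color[name] == WHITE:
            found = visit(name)
            if found:
                return found
    return None


if __name__ == "__main__":
    looped = {"skill_a": ["skill_b"], "skill_b": ["skill_c"], "skill_c": ["skill_a"]}
    print(find_cycle(looped))
```

Running a check like this when skills are added or edited turns a runtime token burn into a reviewable error.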

These issues have been reported widely on Reddit and X by practitioners who tried skill graphs in production and hit reliability ceilings.


The Leap: Atoms, Molecules, and Compounds

Sakhuja's key insight is that not all skills are created equal. Skills operate at different abstraction levels, and treating them as peers in a flat graph is what causes the reliability collapse.

He proposes three tiers:

ATOMS — Atomic Skills

These are base-level, single-purpose building blocks. Narrow in scope. Primitives.

Examples:

  • Scrape LinkedIn profiles
  • Find a competitor's blog posts
  • Find a person on Apollo
  • Verify an email with Hunter
  • Check email deliverability
  • Research a topic
  • Review a pull request

Atoms should be super reliable — almost deterministic (or as close as you can get with an LLM). They typically don't call other skills at all. Think of them as functions that take input and produce output, with minimal branching logic.
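To illustrate the "function with input and output" framing, here is a minimal sketch of what an atom's script could look like. Everything in it is hypothetical: the function name, the result type, and especially the logic, which is only a syntax check standing in for a real deliverability lookup against a verification service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliverabilityResult:
    email: str
    deliverable: bool
    reason: str


def check_email_deliverability(email: str) -> DeliverabilityResult:
    """Hypothetical atom: one narrow job, typed input, typed output.

    This stand-in only validates syntax. A real atom would query a
    verification service; the point here is the shape, not the logic.
    """
    cleaned = email.strip().lower()
    local, sep, domain = cleaned.partition("@")
    if not sep or not local or "@" in domain:
        return DeliverabilityResult(cleaned, False, "malformed address")
    if "." not in domain or domain.startswith(".") or domain.endswith("."):
        return DeliverabilityResult(cleaned, False, "invalid domain")
    return DeliverabilityResult(cleaned, True, "passed syntax check")


if __name__ == "__main__":
    print(check_email_deliverability(" Ada@Example.com "))
```

Note what is absent: no calls to other skills and almost no branching, so the same input yields the same output every time.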

MOLECULES — Composed Workflows

Molecules solve larger, scoped problems by combining 2–10 atomic skills. They come in two flavors:

Flavor 1: Structured Workflows — A fixed chain of atoms with explicit instructions on when and how each atom is called.

Find leads using atom-1 and atom-2 → qualify them using atom-3 → enrich using atom-4 → add them to my spreadsheet with atom-5.

The composition logic is baked into the skill itself. The agent's runtime decision-making is minimized.
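A structured workflow of this kind can be sketched as an ordinary function that calls its atoms in a fixed order. The atoms below are stubs with made-up sample data, and all names are hypothetical; the original article describes the pattern in prose only.

```python
from typing import Dict, List

Lead = Dict[str, str]


def find_leads(query: str) -> List[Lead]:
    # Stub atom: returns fixed sample data instead of searching anything.
    return [
        {"name": "Ada", "company": "Acme", "title": "CTO"},
        {"name": "Bob", "company": "Initech", "title": "Intern"},
    ]


def qualify(leads: List[Lead]) -> List[Lead]:
    # Stub atom: keep only titles on a hypothetical target list.
    return [lead for lead in leads if lead["title"] in {"CTO", "VP Engineering"}]


def enrich(leads: List[Lead]) -> List[Lead]:
    # Stub atom: attach a placeholder email address to each lead.
    return [
        {**lead, "email": f"{lead['name'].lower()}@{lead['company'].lower()}.example"}
        for lead in leads
    ]


def save_to_sheet(leads: List[Lead], sheet: List[Lead]) -> int:
    # Stub atom: a plain list stands in for the spreadsheet.
    sheet.extend(leads)
    return len(leads)


def lead_generation_molecule(query: str, sheet: List[Lead]) -> int:
    """Fixed chain: find, then qualify, then enrich, then save."""
    leads = find_leads(query)
    qualified = qualify(leads)
    enriched = enrich(qualified)
    return save_to_sheet(enriched, sheet)


if __name__ == "__main__":
    sheet: List[Lead] = []
    print(lead_generation_molecule("CTOs at SaaS companies", sheet), sheet)
```

The order of steps lives in the molecule, so nothing about sequencing is left for the agent to decide at runtime.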

Flavor 2: Orchestrators. The skill knows about a small set of atoms (say, five), and the agent uses its judgment to compose them based on the prompt.

The agent has more autonomy here, but the skill still provides explicit guidance on when to use which atom.
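One way to picture an orchestrator is as a closed menu of atoms, each with a "use when" hint, plus a planner that picks from the menu. In the sketch below the planner is a simple keyword rule standing in for the agent's judgment. The atom names, hints, and planner interface are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, NamedTuple


class AtomSpec(NamedTuple):
    use_when: str             # guidance shown to the planner
    run: Callable[[str], str]  # stub implementation of the atom


# Hypothetical menu of atoms this molecule is allowed to use.
ATOMS: Dict[str, AtomSpec] = {
    "research_topic": AtomSpec(
        "the request asks for background on a subject",
        lambda request: f"notes on: {request}",
    ),
    "review_pull_request": AtomSpec(
        "the request references a pull request",
        lambda request: f"review of: {request}",
    ),
}

Planner = Callable[[str, Dict[str, str]], List[str]]


def orchestrate(request: str, plan: Planner) -> List[str]:
    """Let the planner choose atoms, but only from this molecule's menu."""
    guidance = {name: spec.use_when for name, spec in ATOMS.items()}
    chosen = plan(request, guidance)
    unknown = [name for name in chosen if name not in ATOMS]
    if unknown:
        raise ValueError(f"planner chose atoms outside this molecule: {unknown}")
    return [ATOMS[name].run(request) for name in chosen]


def keyword_planner(request: str, guidance: Dict[str, str]) -> List[str]:
    # Stand-in for the agent's judgment: a single keyword rule.
    if "pull request" in request.lower():
        return ["review_pull_request"]
    return ["research_topic"]


if __name__ == "__main__":
    print(orchestrate("Review pull request 12", keyword_planner))
```

The planner has freedom over which atoms to run, but the skill bounds that freedom: any choice outside the menu is rejected rather than improvised.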

In both cases, molecules push as much composition as possible into the skill definition, minimizing the agent's need to improvise at runtime. Molecules should also be very reliable — if their constituent atoms are reliable.

COMPOUNDS — High-Level Playbooks

Compounds are orchestrators that run multiple molecules. This is where you give the agent meaningful autonomy.

  • "Run the outbound sales playbook."
  • "Plan and build this feature, then review and QA it."

These are less deterministic by nature because there are so many judgment calls at every level. They're also the trickiest to get right — and Sakhuja explicitly acknowledges that a human probably needs to drive compounds, at least today.


The Leverage Argument: Brain RAM

This is where the framework gets really interesting. Each level offers roughly an order of magnitude more leverage than the one below. Compounds sit two levels above atoms, so driving compounds instead of atoms yields roughly 10 × 10 = 100× more work for the same cognitive effort.

The reasoning hinges on brain RAM — your ability to hold multiple tasks in working memory and context-switch effectively. This is the limiting resource, not the agent's capacity.

Consider: your brain can effectively context-switch between up to 5 agents in parallel. Now suppose:

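  • 1 compound orchestrates 10 molecules
  • 1 molecule orchestrates 10 atoms (reliably)

| What You Drive    | Agents | Molecular Tasks | Atomic Units of Work |
|-------------------|--------|-----------------|----------------------|
| 5 atomic tasks    | 5      | -               | 5                    |
| 5 molecular tasks | 5      | 5               | 50                   |
| 5 compound tasks  | 5      | 50              | 500                  |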

For the same brain RAM and time, work output varies massively depending on which level you operate at. Driving atomic work clogs up your RAM slots with low-leverage, nearly-deterministic tasks. Why sit in the driver's seat when the car has full self-driving?
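The arithmetic behind those figures is simple enough to check in a few lines. The constants below are the article's own illustrative numbers (5 parallel agents, 10 molecules per compound, 10 atoms per molecule); the function and variable names are hypothetical.

```python
AGENTS_IN_PARALLEL = 5        # tasks one person can drive at once
MOLECULES_PER_COMPOUND = 10
ATOMS_PER_MOLECULE = 10

# Atomic units of work produced by a single task at each level.
FAN_OUT = {
    "atom": 1,
    "molecule": ATOMS_PER_MOLECULE,
    "compound": MOLECULES_PER_COMPOUND * ATOMS_PER_MOLECULE,
}


def atomic_units(level: str, tasks: int = AGENTS_IN_PARALLEL) -> int:
    """Atomic units of work completed when driving `tasks` tasks at `level`."""
    return tasks * FAN_OUT[level]


if __name__ == "__main__":
    for level in ("atom", "molecule", "compound"):
        print(f"{level:>8}: {atomic_units(level)} atomic units")
```

The ratio between the compound and atom rows is 100, matching the "roughly 100×" claim, and it depends entirely on the assumed fan-out of 10 at each level.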

The analogy to organizational management is precise: a CTO of a 1,000-person company doesn't fix every bug. They trust ICs to do that work reliably. The CTO manages at the compound level.


Where This Still Breaks

Sakhuja is honest about the limits:

  1. Every atom has to be solid. A broken atom cascades failure upward through molecules and compounds.
  2. Molecules have to chain atoms dependably. The orchestration logic must be proven, not aspirational.
  3. The agent needs enough autonomy at the compound level to make real decisions — but not so much that it goes off the rails.
  4. Compounds spanning more than 8–10 molecules likely hit their own reliability ceiling.

He hasn't hit the upper bound yet: "I'm still driving molecules and compounds, and even that does not feel trivial to get right. But the goal is to keep moving up to higher levels for every workstream."

The big unsolved challenge: testing skills at every level takes a lot of time, and reliability/consistency is non-trivial to achieve. He speculates that an "autoresearch" type solution might help, but hasn't tried it yet.


Gooseworks' Implementation

In the Gooseworks skills library (108 skills and counting), they've mapped these tiers to their own terminology:

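| Concept Level | Gooseworks Name | Count | Examples                                                            |
|---------------|-----------------|-------|---------------------------------------------------------------------|
| Atoms         | Capabilities    | 51    | LinkedIn scraper, Apollo search, PR reviewer                        |
| Molecules     | Composites      | 52    | Structured lead enrichment pipeline, multi-source research workflow |
| Compounds     | Playbooks       | 5     | Outbound sales playbook, feature development lifecycle              |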

This naming alignment is useful for anyone building their own skills library — you can adopt either the chemistry metaphor (atoms/molecules/compounds) or the Gooseworks names (capabilities/composites/playbooks), but the underlying structure is the same. The full skills library is openly available on GitHub.


Connecting to the Broader Agent Architecture Landscape

Skill Graphs 2.0 isn't an island. It connects to several active threads in AI agent engineering:

Skill Composition vs. Chain-of-Thought

In traditional CoT prompting, the LLM decides at inference time which steps to take. Skill composition pre-bakes those decisions into the skill graph. The tradeoff: less flexibility, more reliability. For production workflows, this is usually the right trade.

Hierarchical Task Decomposition

The atoms/molecules/compounds pattern mirrors how human organizations function. A CEO sets strategy (compound), directors manage departments (molecule), ICs execute tasks (atom). The key insight is that each level should be independently verifiable — you shouldn't need to trace down to the atom level to validate a compound's output.

The Observation Problem

If compounds require a human driver, we're still in the "copilot" regime, not the "autopilot" one. The skill graph doesn't eliminate the need for human oversight; it just moves the oversight to a higher abstraction level. This aligns with the "Services: The New Software" thesis from Sequoia's Julien Bek: the path to trillion-dollar AI companies runs through services that still have a human in the loop, but at a much higher leverage point.


Practical Takeaways

If you're building an AI agent system, here's how to apply Skill Graphs 2.0:

  1. Start with atoms. Get them rock-solid. Test obsessively. If your atoms fail, everything above them fails.
  2. Write molecules as explicitly as possible. Favor structured workflows (Flavor 1) over open-ended orchestrators (Flavor 2) until you have high confidence in your atoms.
  3. Drive at the compound level. Your human attention is the scarce resource. Spend it on high-level orchestration, not on micromanaging atoms.
  4. Watch the depth. If a compound orchestrates more than 8–10 molecules, consider splitting it or adding intermediate checkpoints.
  5. Test at every level. Atom reliability enables molecule reliability enables compound reliability. Test the stack, not just the top.
  6. Name your levels. Whether you use atoms/molecules/compounds or capabilities/composites/playbooks, consistent naming makes the system thinkable.
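Several of these takeaways can be enforced mechanically. The sketch below lints a skill library against the tier rules: atoms call nothing, and molecules and compounds call at most 10 children. The limit of 10 follows the article's 2–10 and 8–10 ranges. The strict "only call a lower tier" rule, the `Skill` record, and the function name are assumptions layered on top for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

TIER_RANK = {"atom": 0, "molecule": 1, "compound": 2}
# Upper bounds on direct calls per tier (10 taken from the article's ranges).
MAX_CHILDREN = {"atom": 0, "molecule": 10, "compound": 10}


@dataclass
class Skill:
    name: str
    tier: str
    calls: List[str] = field(default_factory=list)


def lint_skills(skills: Dict[str, Skill]) -> List[str]:
    """Return human-readable problems; an empty list means the library passes."""
    problems: List[str] = []
    for skill in skills.values():
        limit = MAX_CHILDREN[skill.tier]
        if len(skill.calls) > limit:
            problems.append(
                f"{skill.name}: {len(skill.calls)} calls exceeds limit of {limit} for a {skill.tier}"
            )
        for child_name in skill.calls:
            child = skills.get(child_name)
            if child is None:
                problems.append(f"{skill.name}: calls unknown skill {child_name}")
            elif TIER_RANK[child.tier] >= TIER_RANK[skill.tier]:
                # Assumed rule: a skill may only call strictly lower tiers.
                problems.append(
                    f"{skill.name}: {skill.tier} may not call {child.tier} {child_name}"
                )
    return problems


if __name__ == "__main__":
    library = {
        "verify_email": Skill("verify_email", "atom"),
        "enrich_leads": Skill("enrich_leads", "molecule", ["verify_email"]),
        "outbound_playbook": Skill("outbound_playbook", "compound", ["enrich_leads"]),
    }
    print(lint_skills(library))
```

A side effect of the strict downward rule is that circular dependencies become impossible by construction, since every call moves to a lower tier.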

Conclusion

Skill Graphs 2.0 is a deceptively simple framework with deep implications. The shift from "link everything in a graph" to "stratify by abstraction level and reliability requirement" solves the real problem practitioners face: depth kills reliability, but flat lists kill leverage.

The three-tier model (atoms → molecules → compounds) gives you both: reliability at the base through near-deterministic atomic skills, and leverage at the top through compounds that free your brain RAM for higher-order work.

The unsolved frontier is automating the testing and validation of skills across all three levels — and until that's solved, the human remains in the loop. But the loop is now at 10,000 feet instead of 100.


Sources

  • Shiv Sakhuja, "Skill Graphs 2.0" — X Article (April 23, 2026). Original tweet (X Article content requires login)
  • Gooseworks AI — Shiv Sakhuja's company, building AI Coworkers with a skills-based approach
  • goose-skills repository — 108 skills organized as Capabilities (51) / Composites (52) / Playbooks (5)